[Study Notes] MS COCO dataset
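Preface:
These are purely personal study notes. If anything here infringes on your rights, please contact me.
COCO dataset: http://cocodataset.org/
The Microsoft COCO dataset, released by Microsoft in 2014, has become a standard benchmark for image captioning.
The original dataset consists of roughly 20 GB of images and roughly 500 MB of annotation files. The annotation files record the exact coordinates of every segmentation and bounding box (that is, each segmented object and its enclosing box), to two decimal places. For example, the annotation of one segmented object looks like this: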
{"segmentation": [[392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43, 372.23, 357.37, 372.23, 397.7, 383.48, 419.27, 407.87, 439.91, 427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77, 472.59, 266.38],
                  [429.44, 465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07, 571.07, 364.88, 546.69, 363.0]],
 "area": 28458.996150000003, "iscrowd": 0, "image_id": 503837,
 "bbox": [372.23, 266.38, 264.5, 208.23], "category_id": 4, "id": 151109}
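The bbox field of this example can be cross-checked against its polygon coordinates. The following is a minimal sketch, assuming the flat [x1, y1, x2, y2, ...] polygon layout seen above and the [x, y, width, height] box layout; the helper name bbox_from_polygons is made up for illustration.

```python
def bbox_from_polygons(polygons):
    """Return [x, y, width, height] of the box enclosing all polygons.

    Each polygon is a flat list [x1, y1, x2, y2, ...].
    """
    xs = [v for poly in polygons for v in poly[0::2]]
    ys = [v for poly in polygons for v in poly[1::2]]
    x_min, y_min = min(xs), min(ys)
    # Round to two decimals, the precision used in the annotation files.
    return [x_min, y_min,
            round(max(xs) - x_min, 2), round(max(ys) - y_min, 2)]


# The two polygons of the example annotation above.
segmentation = [
    [392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43,
     372.23, 357.37, 372.23, 397.7, 383.48, 419.27, 407.87, 439.91,
     427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77,
     472.59, 266.38],
    [429.44, 465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07,
     571.07, 364.88, 546.69, 363.0],
]

result = bbox_from_polygons(segmentation)
print(result)  # matches the "bbox" field: [372.23, 266.38, 264.5, 208.23]
```

The box is simply the min/max of all polygon vertices, which is why an object split into several polygons still has a single bbox.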
The dataset is aimed at scene understanding. Its images are mostly taken from complex everyday scenes, and the objects in them are localized with precise segmentations.
The dataset mainly addresses three problems: object detection, the contextual relationships between objects, and precise 2D localization of objects.
The 2014 release of the COCO dataset contains 82,783 training images, 40,504 validation images, and 40,775 test images, with about 270k segmented people and 886k segmented objects.
80 object categories (the number after # is the count per supercategory):
person #1: person
vehicle #8: bicycle, car, motorcycle, airplane, bus, train, truck, boat
outdoor #5: traffic light, fire hydrant, stop sign, parking meter, bench
animal #10: bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
accessory #5: backpack, umbrella, handbag, tie, suitcase
sports #10: frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket
kitchen #7: bottle, wine glass, cup, fork, knife, spoon, bowl
food #10: banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake
furniture #6: chair, couch, potted plant, bed, dining table, toilet
electronic #6: tv, laptop, mouse, remote, keyboard, cell phone
appliance #5: microwave, oven, toaster, sink, refrigerator
indoor #7: book, clock, vase, scissors, teddy bear, hair drier, toothbrush
20 semantic categories (these are the PASCAL VOC class names):
aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
The basic format of a COCO dataset file is as follows:
{
    "info" : info,
    "images" : [image],
    "annotations" : [annotation],
    "licenses" : [license],
}

info {
    "year" : int,
    "version" : str,
    "description" : str,
    "contributor" : str,
    "url" : str,
    "date_created" : datetime,
}

image {
    "id" : int,                  # image id
    "width" : int,               # image width
    "height" : int,              # image height
    "file_name" : str,           # image file name
    "license" : int,
    "flickr_url" : str,
    "coco_url" : str,            # image URL
    "date_captured" : datetime,  # time the image was captured
}

license {
    "id" : int,
    "name" : str,
    "url" : str,
}
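To make the top-level structure concrete, here is a minimal sketch that parses a tiny COCO-style document and indexes its images by id. The JSON string is a cut-down stand-in built from one image record that appears later in these notes, and the helper name index_images is made up for illustration.

```python
import json

# A cut-down, hypothetical COCO-style document with a single image record.
RAW = """
{
  "info": {"year": 2014, "version": "1.0"},
  "images": [
    {"id": 57870, "width": 640, "height": 480,
     "file_name": "COCO_train2014_000000057870.jpg", "license": 5}
  ],
  "annotations": [],
  "licenses": [{"id": 5, "name": "Attribution-ShareAlike License"}]
}
"""


def index_images(coco):
    """Map each image id to its image record for direct lookup."""
    return {image["id"]: image for image in coco["images"]}


coco = json.loads(RAW)
images_by_id = index_images(coco)
record = images_by_id[57870]
print(record["file_name"], record["width"], record["height"])
```

Annotations refer to images only through image_id, so a lookup table like this is usually the first thing to build after loading a file.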
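The annotation entry comes in three forms:
(1) Object instance annotations:
annotation {
    "id" : int,
    "image_id" : int,
    "category_id" : int,
    "segmentation" : RLE or [polygon],
    "area" : float,
    "bbox" : [x,y,width,height],
    "iscrowd" : 0 or 1,
}

categories [{
    "id" : int,
    "name" : str,
    "supercategory" : str,
}]
Here:
If the instance is a single object, then iscrowd=0 and segmentation is a polygon. A single object may still need several polygons, for example when it is occluded.
If the instance is a collection of objects, then iscrowd=1 and segmentation is RLE (run-length encoding). iscrowd=1 is used to annotate large groups of objects, such as a crowd of people.
Both the polygon form and the RLE form deserve a closer look.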
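For the RLE form, a small decoder makes the layout easier to see. This is a minimal pure-Python sketch, assuming the uncompressed RLE convention of the COCO format: counts alternates run lengths of 0s and 1s starting with 0, pixels are laid out column by column, and size is [height, width]. The function name and the toy mask are made up for illustration.

```python
def decode_uncompressed_rle(rle):
    """Decode {"counts": [...], "size": [height, width]} into a 2D 0/1 mask."""
    height, width = rle["size"]
    flat = []
    value = 0  # runs alternate 0, 1, 0, 1, ... and always start with 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    # Column-major layout: pixel (row r, column c) sits at flat[c * height + r].
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


# Toy 3x4 mask (hypothetical data, not taken from the dataset).
toy = {"counts": [2, 3, 4, 1, 2], "size": [3, 4]}
mask = decode_uncompressed_rle(toy)
for row in mask:
    print(row)

# The mask area is the number of foreground pixels.
area = sum(sum(row) for row in mask)
print("area:", area)
```

In practice the official pycocotools package handles both polygon and RLE masks; the sketch above is only meant to show what the counts array encodes.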
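(2) Keypoint annotations:
annotation {
    "keypoints" : [x1,y1,v1,...],
    "num_keypoints" : int,
    "[cloned]" : ...,
}

categories [{
    "keypoints" : [str],
    "skeleton" : [edge],
    "[cloned]" : ...,
}]
A keypoint annotation contains all the data of an object instance annotation (id, bbox, and so on) plus two extra fields.
"keypoints" is an array of length 3K, where K is the total number of keypoints defined for the category. Each keypoint has a location [x, y] and a visibility flag v.
If a keypoint is not labeled, its location is x=y=0 and v=0;
if a keypoint is labeled but not visible, v=1;
if a keypoint is labeled and visible, v=2. A keypoint is considered visible if it falls inside the object segment.
"num_keypoints" is the number of labeled keypoints (v>0) for the object. For instances covering many objects, such as crowds, or for small objects, num_keypoints=0.
For each category, the categories struct has two extra fields: "keypoints" and "skeleton".
"keypoints" is a length-K array of keypoint names;
"skeleton" defines the connectivity between keypoints as a list of keypoint edge pairs and is used for visualization.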
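The flat keypoints array is easier to work with once it is split into triplets. A minimal sketch, assuming the COCO visibility convention (v=0 not labeled, v=1 labeled but not visible, v=2 labeled and visible); the helper names and the sample values are made up for illustration.

```python
def split_keypoints(keypoints):
    """Split the flat [x1, y1, v1, ...] list into (x, y, v) triplets."""
    if len(keypoints) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(keypoints[i:i + 3]) for i in range(0, len(keypoints), 3)]


def count_labeled(keypoints):
    """Number of labeled keypoints (v > 0), i.e. what num_keypoints stores."""
    return sum(1 for _, _, v in split_keypoints(keypoints) if v > 0)


# Hypothetical annotation for a category with K = 4 keypoints.
names = ["nose", "left_eye", "right_eye", "left_ear"]
keypoints = [0, 0, 0, 120, 85, 2, 131, 90, 1, 0, 0, 0]

for name, (x, y, v) in zip(names, split_keypoints(keypoints)):
    print(name, (x, y), "v =", v)
print("num_keypoints:", count_labeled(keypoints))
```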
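(3) Image caption annotations:
annotation {
    "id" : int,
    "image_id" : int,
    "caption" : str,
}
Caption annotations store the image captions. Each caption describes the image given by image_id, and each image has at least 5 captions.
All three annotation types are stored as JSON files, and each type comes in a training and a validation split, so there are 6 JSON files in total.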
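Because every caption is a separate annotation that points at its image through image_id, a common first step is to group captions per image. A minimal sketch with made-up annotations; the helper name group_captions is hypothetical.

```python
from collections import defaultdict


def group_captions(annotations):
    """Collect all caption strings for each image_id."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Hypothetical caption annotations (real files hold at least 5 per image).
annotations = [
    {"id": 1, "image_id": 10, "caption": "A dog runs on the beach."},
    {"id": 2, "image_id": 11, "caption": "Two people ride bicycles."},
    {"id": 3, "image_id": 10, "caption": "A brown dog near the sea."},
]

captions = group_captions(annotations)
print(captions[10])
```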
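Contents of the COCO annotation files:
For example, the training-set file instances_train2014.json (the image list, polygon coordinates, and RLE counts in this excerpt are abridged):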
{"info": {"description": "This is stable 1.0 version of the 2014 MS COCO dataset.", "url": "http://mscoco.org", "version": "1.0", "year": 2014, "contributor": "Microsoft COCO group", "date_created": "2015-01-27 09:11:52.357475"},
"images": [
{"license": 5, "file_name": "COCO_train2014_000000057870.jpg", "coco_url": "http://mscoco.org/images/57870", "height": 480, "width": 640, "date_captured": "2013-11-14 16:28:13", "flickr_url": "http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg", "id": 57870},  # this id is the image_id
{"license": 5, "file_name": "COCO_train2014_000000384029.jpg", "coco_url": "http://mscoco.org/images/384029", "height": 429, "width": 640, "date_captured": "2013-11-14 16:29:45", "flickr_url": "http://farm3.staticflickr.com/2422/3577229611_3a3235458a_z.jpg", "id": 384029},
{"license": 1, "file_name": "COCO_train2014_000000222016.jpg", "coco_url": "http://mscoco.org/images/222016", "height": 640, "width": 480, "date_captured": "2013-11-14 16:37:59", "flickr_url": "http://farm2.staticflickr.com/1431/1118526611_09172475e5_z.jpg", "id": 222016},
{"license": 4, "file_name": "COCO_train2014_000000475546.jpg", "coco_url": "http://mscoco.org/images/475546", "height": 375, "width": 500, "date_captured": "2013-11-25 21:20:23", "flickr_url": "http://farm1.staticflickr.com/167/423175046_6cd9d0205a_z.jpg", "id": 475546}],
"licenses": [
{"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", "id": 1, "name": "Attribution-NonCommercial-ShareAlike License"},
{"url": "http://creativecommons.org/licenses/by-nc/2.0/", "id": 2, "name": "Attribution-NonCommercial License"},
{"url": "http://creativecommons.org/licenses/by-nc-nd/2.0/", "id": 3, "name": "Attribution-NonCommercial-NoDerivs License"},
{"url": "http://creativecommons.org/licenses/by/2.0/", "id": 4, "name": "Attribution License"},
{"url": "http://creativecommons.org/licenses/by-sa/2.0/", "id": 5, "name": "Attribution-ShareAlike License"},
{"url": "http://creativecommons.org/licenses/by-nd/2.0/", "id": 6, "name": "Attribution-NoDerivs License"},
{"url": "http://flickr.com/commons/usage/", "id": 7, "name": "No known copyright restrictions"},
{"url": "http://www.usa.gov/copyright.shtml", "id": 8, "name": "United States Government Work"}],
"annotations": [
{"segmentation": [[312.29, 562.89, 402.25, 232.61, 560.32, 300.72, 571.89]], "area": 54652.9556, "iscrowd": 0, "image_id": 480023, "bbox": [116.95, 305.86, 285.3, 266.03], "category_id": 58, "id": 86},  # this id is the annotation id; an image can have more than one annotation, so every annotation gets its own number
{"segmentation": [[252.46, 208.17, 267.96, 210.11, 208.45]], "area": 421.47274999999996, "iscrowd": 0, "image_id": 50518, "bbox": [245.54, 208.17, 40.14, 19.1], "category_id": 58, "id": 89},
{"segmentation": [[349.66, 143.56, 344.19, 131.38, 352.94, 139.19, 355.13, 139.97, 354.5, 144.34]], "area": 292.12984999999935, "iscrowd": 0, "image_id": 497261, "bbox": [343.72, 112.63, 17.66, 31.71], "category_id": 1, "id": 2232195},
{"segmentation": {"counts": [69901, 4, 21, 2, 470, 12, 468, 13, 467, 12, 468, 12, 468, 12, 469, 10, 471, 8, 474, 4, 73630], "size": [480, 640]}, "area": 2846, "iscrowd": 1, "image_id": 554752, "bbox": [145, 275, 341, 53], "category_id": 1, "id": 900100554752},
{"segmentation": {"counts": [70375, 8, 415, 12, 411, 391, 34, 391, 34, 391, 35, 149], "size": [425, 640]}, "area": 7298, "iscrowd": 1, "image_id": 350724, "bbox": [165, 216, 474, 152], "category_id": 62, "id": 906200350724},
{"segmentation": {"counts": [99015, 6, 352, 8, 349, 8, 75781], "size": [359, 640]}, "area": 6478, "iscrowd": 1, "image_id": 554743, "bbox": [275, 207, 153, 148], "category_id": 1, "id": 900100554743},
{"segmentation": {"counts": [97214, 1, 425, 4, 6531], "size": [427, 640]}, "area": 3489, "iscrowd": 1, "image_id": 95999, "bbox": [227, 260, 397, 82], "category_id": 1, "id": 900100095999}],
"categories": [
{"supercategory": "person", "id": 1, "name": "person"},  # 80 categories in total
{"supercategory": "vehicle", "id": 2, "name": "bicycle"}, {"supercategory": "vehicle", "id": 3,
"name": "car"}, {"supercategory": "vehicle", "id": 4, "name": "motorcycle"}, {"supercategory": "vehicle", "id": 5, "name": "airplane"}, {"supercategory": "vehicle", "id": 6, "name": "bus"}, {"supercategory": "vehicle", "id": 7, "name": "train"}, {"supercategory": "vehicle", "id": 8, "name": "truck"}, {"supercategory": "vehicle", "id": 9, "name": "boat"}, {"supercategory": "outdoor", "id": 10, "name": "traffic light"}, {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"}, {"supercategory": "outdoor", "id": 13, "name": "stop sign"}, {"supercategory": "outdoor", "id": 14, "name": "parking meter"}, {"supercategory": "outdoor", "id": 15, "name": "bench"}, {"supercategory": "animal", "id": 16, "name": "bird"}, {"supercategory": "animal", "id": 17, "name": "cat"}, {"supercategory": "animal", "id": 18, "name": "dog"}, {"supercategory": "animal", "id": 19, "name": "horse"}, {"supercategory": "animal", "id": 20, "name": "sheep"}, {"supercategory": "animal", "id": 21, "name": "cow"}, {"supercategory": "animal", "id": 22, "name": "elephant"}, {"supercategory": "animal", "id": 23, "name": "bear"}, {"supercategory": "animal", "id": 24, "name": "zebra"}, {"supercategory": "animal", "id": 25, "name": "giraffe"}, {"supercategory": "accessory", "id": 27, "name": "backpack"}, {"supercategory": "accessory", "id": 28, "name": "umbrella"}, {"supercategory": "accessory", "id": 31, "name": "handbag"}, {"supercategory": "accessory", "id": 32, "name": "tie"}, {"supercategory": "accessory", "id": 33, "name": "suitcase"}, {"supercategory": "sports", "id": 34, "name": "frisbee"}, {"supercategory": "sports", "id": 35, "name": "skis"}, {"supercategory": "sports", "id": 36, "name": "snowboard"}, {"supercategory": "sports", "id": 37, "name": "sports ball"}, {"supercategory": "sports", "id": 38, "name": "kite"}, {"supercategory": "sports", "id": 39, "name": "baseball bat"}, {"supercategory": "sports", "id": 40, "name": "baseball glove"}, {"supercategory": "sports", "id": 41, "name": 
"skateboard"}, {"supercategory": "sports", "id": 42, "name": "surfboard"}, {"supercategory": "sports", "id": 43, "name": "tennis racket"}, {"supercategory": "kitchen", "id": 44, "name": "bottle"}, {"supercategory": "kitchen", "id": 46, "name": "wine glass"}, {"supercategory": "kitchen", "id": 47, "name": "cup"}, {"supercategory": "kitchen", "id": 48, "name": "fork"}, {"supercategory": "kitchen", "id": 49, "name": "knife"}, {"supercategory": "kitchen", "id": 50, "name": "spoon"}, {"supercategory": "kitchen", "id": 51, "name": "bowl"}, {"supercategory": "food", "id": 52, "name": "banana"}, {"supercategory": "food", "id": 53, "name": "apple"}, {"supercategory": "food", "id": 54, "name": "sandwich"}, {"supercategory": "food", "id": 55, "name": "orange"}, {"supercategory": "food", "id": 56, "name": "broccoli"}, {"supercategory": "food", "id": 57, "name": "carrot"}, {"supercategory": "food", "id": 58, "name": "hot dog"}, {"supercategory": "food", "id": 59, "name": "pizza"}, {"supercategory": "food", "id": 60, "name": "donut"}, {"supercategory": "food", "id": 61, "name": "cake"}, {"supercategory": "furniture", "id": 62, "name": "chair"}, {"supercategory": "furniture", "id": 63, "name": "couch"}, {"supercategory": "furniture", "id": 64, "name": "potted plant"}, {"supercategory": "furniture", "id": 65, "name": "bed"}, {"supercategory": "furniture", "id": 67, "name": "dining table"}, {"supercategory": "furniture", "id": 70, "name": "toilet"}, {"supercategory": "electronic", "id": 72, "name": "tv"}, {"supercategory": "electronic", "id": 73, "name": "laptop"}, {"supercategory": "electronic", "id": 74, "name": "mouse"}, {"supercategory": "electronic", "id": 75, "name": "remote"}, {"supercategory": "electronic", "id": 76, "name": "keyboard"}, {"supercategory": "electronic", "id": 77, "name": "cell phone"}, {"supercategory": "appliance", "id": 78, "name": "microwave"}, {"supercategory": "appliance", "id": 79, "name": "oven"}, {"supercategory": "appliance", "id": 80, "name": 
"toaster"}, {"supercategory": "appliance", "id": 81, "name": "sink"}, {"supercategory": "appliance", "id": 82, "name": "refrigerator"}, {"supercategory": "indoor", "id": 84, "name": "book"}, {"supercategory": "indoor", "id": 85, "name": "clock"}, {"supercategory": "indoor", "id": 86, "name": "vase"}, {"supercategory": "indoor", "id": 87, "name": "scissors"}, {"supercategory": "indoor", "id": 88, "name": "teddy bear"}, {"supercategory": "indoor", "id": 89, "name": "hair drier"}, {"supercategory": "indoor", "id": 90, "name": "toothbrush"}]}
For example, the test-set file image_info_test2014.json, which carries no "annotations" field:
{"info": {"description": "This is stable 1.0 version of the 2014 MS COCO dataset.", "url": "http://mscoco.org", "version": "1.0", "year": 2014, "contributor": "Microsoft COCO group", "date_created": "2015-11-11 02:11:36.777541"}, "images": [{"license": 2, "file_name": "COCO_test2014_000000523573.jpg", "coco_url": "http://mscoco.org/images/523573", "height": 500, "width": 423, "date_captured": "2013-11-14 12:21:59", "id": 523573}, {"license": 2, "file_name": "COCO_test2014_000000347527.jpg", "coco_url": "http://mscoco.org/images/347527", "height": 480, "width": 640, "date_captured": "2013-11-14 15:12:02", "id": 347527}, {"license": 3, "file_name": "COCO_test2014_000000413171.jpg", "coco_url": "http://mscoco.org/images/413171", "height": 333, "width": 500, "date_captured": "2013-11-14 16:40:04", "id": 413171}, {"license": 3, "file_name": "COCO_test2014_000000102283.jpg", "coco_url": "http://mscoco.org/images/102283", "height": 640, "width": 425, "date_captured": "2013-11-14 16:45:58", "id": 102283}, {"license": 1, "file_name": "COCO_test2014_000000296903.jpg", "coco_url": "http://mscoco.org/images/296903", "height": 480, "width": 640, "date_captured": "2013-11-14 16:52:05", "id": 296903}, {"license": 2, "file_name": "COCO_test2014_000000540552.jpg", "coco_url": "http://mscoco.org/images/540552", "height": 500, "width": 333, "date_captured": "2013-11-14 17:01:31", "id": 540552}, {"license": 1, "file_name": "COCO_test2014_000000327534.jpg", "coco_url": "http://mscoco.org/images/327534", "height": 480, "width": 640, "date_captured": "2013-11-14 17:38:36", "id": 327534}, {"license": 3, "file_name": "COCO_test2014_000000155724.jpg", "coco_url": "http://mscoco.org/images/155724", "height": 640, "width": 480, "date_captured": "2013-11-25 21:24:27", "id": 155724}], "licenses": [{"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", "id": 1, "name": "Attribution-NonCommercial-ShareAlike License"}, {"url": "http://creativecommons.org/licenses/by-nc/2.0/", "id": 2, 
"name": "Attribution-NonCommercial License"}, {"url": "http://creativecommons.org/licenses/by-nc-nd/2.0/", "id": 3, "name": "Attribution-NonCommercial-NoDerivs License"}, {"url": "http://creativecommons.org/licenses/by/2.0/", "id": 4, "name": "Attribution License"}, {"url": "http://creativecommons.org/licenses/by-sa/2.0/", "id": 5, "name": "Attribution-ShareAlike License"}, {"url": "http://creativecommons.org/licenses/by-nd/2.0/", "id": 6, "name": "Attribution-NoDerivs License"}, {"url": "http://flickr.com/commons/usage/", "id": 7, "name": "No known copyright restrictions"}, {"url": "http://www.usa.gov/copyright.shtml", "id": 8, "name": "United States Government Work"}], "categories": [{"supercategory": "person", "id": 1, "name": "person"}, {"supercategory": "vehicle", "id": 2, "name": "bicycle"}, {"supercategory": "vehicle", "id": 3, "name": "car"}, {"supercategory": "vehicle", "id": 4, "name": "motorcycle"}, {"supercategory": "vehicle", "id": 5, "name": "airplane"}, {"supercategory": "vehicle", "id": 6, "name": "bus"}, {"supercategory": "vehicle", "id": 7, "name": "train"}, {"supercategory": "vehicle", "id": 8, "name": "truck"}, {"supercategory": "vehicle", "id": 9, "name": "boat"}, {"supercategory": "outdoor", "id": 10, "name": "traffic light"}, {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"}, {"supercategory": "outdoor", "id": 13, "name": "stop sign"}, {"supercategory": "outdoor", "id": 14, "name": "parking meter"}, {"supercategory": "outdoor", "id": 15, "name": "bench"}, {"supercategory": "animal", "id": 16, "name": "bird"}, {"supercategory": "animal", "id": 17, "name": "cat"}, {"supercategory": "animal", "id": 18, "name": "dog"}, {"supercategory": "animal", "id": 19, "name": "horse"}, {"supercategory": "animal", "id": 20, "name": "sheep"}, {"supercategory": "animal", "id": 21, "name": "cow"}, {"supercategory": "animal", "id": 22, "name": "elephant"}, {"supercategory": "animal", "id": 23, "name": "bear"}, {"supercategory": "animal", "id": 24, 
"name": "zebra"}, {"supercategory": "animal", "id": 25, "name": "giraffe"}, {"supercategory": "accessory", "id": 27, "name": "backpack"}, {"supercategory": "accessory", "id": 28, "name": "umbrella"}, {"supercategory": "accessory", "id": 31, "name": "handbag"}, {"supercategory": "accessory", "id": 32, "name": "tie"}, {"supercategory": "accessory", "id": 33, "name": "suitcase"}, {"supercategory": "sports", "id": 34, "name": "frisbee"}, {"supercategory": "sports", "id": 35, "name": "skis"}, {"supercategory": "sports", "id": 36, "name": "snowboard"}, {"supercategory": "sports", "id": 37, "name": "sports ball"}, {"supercategory": "sports", "id": 38, "name": "kite"}, {"supercategory": "sports", "id": 39, "name": "baseball bat"}, {"supercategory": "sports", "id": 40, "name": "baseball glove"}, {"supercategory": "sports", "id": 41, "name": "skateboard"}, {"supercategory": "sports", "id": 42, "name": "surfboard"}, {"supercategory": "sports", "id": 43, "name": "tennis racket"}, {"supercategory": "kitchen", "id": 44, "name": "bottle"}, {"supercategory": "kitchen", "id": 46, "name": "wine glass"}, {"supercategory": "kitchen", "id": 47, "name": "cup"}, {"supercategory": "kitchen", "id": 48, "name": "fork"}, {"supercategory": "kitchen", "id": 49, "name": "knife"}, {"supercategory": "kitchen", "id": 50, "name": "spoon"}, {"supercategory": "kitchen", "id": 51, "name": "bowl"}, {"supercategory": "food", "id": 52, "name": "banana"}, {"supercategory": "food", "id": 53, "name": "apple"}, {"supercategory": "food", "id": 54, "name": "sandwich"}, {"supercategory": "food", "id": 55, "name": "orange"}, {"supercategory": "food", "id": 56, "name": "broccoli"}, {"supercategory": "food", "id": 57, "name": "carrot"}, {"supercategory": "food", "id": 58, "name": "hot dog"}, {"supercategory": "food", "id": 59, "name": "pizza"}, {"supercategory": "food", "id": 60, "name": "donut"}, {"supercategory": "food", "id": 61, "name": "cake"}, {"supercategory": "furniture", "id": 62, "name": "chair"}, 
{"supercategory": "furniture", "id": 63, "name": "couch"}, {"supercategory": "furniture", "id": 64, "name": "potted plant"}, {"supercategory": "furniture", "id": 65, "name": "bed"}, {"supercategory": "furniture", "id": 67, "name": "dining table"}, {"supercategory": "furniture", "id": 70, "name": "toilet"}, {"supercategory": "electronic", "id": 72, "name": "tv"}, {"supercategory": "electronic", "id": 73, "name": "laptop"}, {"supercategory": "electronic", "id": 74, "name": "mouse"}, {"supercategory": "electronic", "id": 75, "name": "remote"}, {"supercategory": "electronic", "id": 76, "name": "keyboard"}, {"supercategory": "electronic", "id": 77, "name": "cell phone"}, {"supercategory": "appliance", "id": 78, "name": "microwave"}, {"supercategory": "appliance", "id": 79, "name": "oven"}, {"supercategory": "appliance", "id": 80, "name": "toaster"}, {"supercategory": "appliance", "id": 81, "name": "sink"}, {"supercategory": "appliance", "id": 82, "name": "refrigerator"}, {"supercategory": "indoor", "id": 84, "name": "book"}, {"supercategory": "indoor", "id": 85, "name": "clock"}, {"supercategory": "indoor", "id": 86, "name": "vase"}, {"supercategory": "indoor", "id": 87, "name": "scissors"}, {"supercategory": "indoor", "id": 88, "name": "teddy bear"}, {"supercategory": "indoor", "id": 89, "name": "hair drier"}, {"supercategory": "indoor", "id": 90, "name": "toothbrush"}]}
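One detail visible in the category listings above is that the category ids are not contiguous: for example 12 and 26 are skipped, and the last id is 90 although there are only 80 categories. Classifiers usually want labels 0..N-1, so a remapping table is handy. A minimal sketch using a handful of the categories above; the helper name build_category_index is made up for illustration.

```python
def build_category_index(categories):
    """Map sparse COCO category ids to contiguous indices 0..N-1."""
    ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(ids)}


# A small subset of the categories listed above.
categories = [
    {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"},
]

id_to_index = build_category_index(categories)
index_to_name = {id_to_index[cat["id"]]: cat["name"] for cat in categories}
print(id_to_index)
print(index_to_name[3])
```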
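COCO in cs231n
In assignment 3 of cs231n, the RNN_Captioning notebook works with a further preprocessed version of the COCO data.
The preprocessed data already ships with extracted image features (the file names point to VGG-16 fc7 activations). If you start from data that has not been preprocessed, you can extract the features with a CNN yourself.
The preprocessed COCO data includes the following files (names as they are loaded by coco_utils.py further down): coco2014_captions.h5, train2014_vgg16_fc7.h5, train2014_vgg16_fc7_pca.h5, val2014_vgg16_fc7.h5, val2014_vgg16_fc7_pca.h5, coco2014_vocab.json, train2014_urls.txt, val2014_urls.txt.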

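Of these, train2014_urls.txt and val2014_urls.txt store the image URLs of the training and validation sets, respectively, and coco2014_vocab.json stores the mapping between integer IDs and vocabulary words (each word corresponds to one integer ID).
Loading the data: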
import numpy as np
# load_coco_data is defined in coco_utils.py (listed further down)

# Load COCO data from disk; this returns a dictionary.
# We'll work with dimensionality-reduced features for this notebook, but feel
# free to experiment with the original features by changing the flag below.
data = load_coco_data(pca_features=True)

# Print out all the keys and values from the data dictionary
for k, v in data.items():
    if type(v) == np.ndarray:
        print(k, type(v), v.shape, v.dtype)
    else:
        print(k, type(v), len(v))
Output:
val_image_idxs <class 'numpy.ndarray'> (195954,) int32
word_to_idx <class 'dict'> 1004
val_urls <class 'numpy.ndarray'> (40504,) <U63
train_features <class 'numpy.ndarray'> (82783, 512) float32
idx_to_word <class 'list'> 1004
train_urls <class 'numpy.ndarray'> (82783,) <U63
val_captions <class 'numpy.ndarray'> (195954, 17) int32
train_image_idxs <class 'numpy.ndarray'> (400135,) int32
train_captions <class 'numpy.ndarray'> (400135, 17) int32
val_features <class 'numpy.ndarray'> (40504, 512) float32
The output lists what the data dictionary contains.
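val_captions and train_captions store the image captions, encoded as integer word IDs (one row of 17 IDs per caption).
train_image_idxs and val_image_idxs give, for each caption, the row of its image in the feature and URL arrays.
train_features and val_features store the image feature vectors.
idx_to_word stores the mapping from IDs to words, and word_to_idx the reverse mapping.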
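To see how a caption becomes a fixed-length row of integer IDs, here is a minimal sketch with a toy vocabulary. The tokens <NULL> and <END> appear in coco_utils.py; the <START> and <UNK> tokens, the seven-word vocabulary, and the helper encode_caption are assumptions made for illustration and are not taken from the preprocessed files.

```python
# Toy vocabulary (hypothetical); the real one has 1004 entries.
idx_to_word = ["<NULL>", "<START>", "<END>", "<UNK>", "a", "dog", "runs"]
word_to_idx = {word: idx for idx, word in enumerate(idx_to_word)}


def encode_caption(text, word_to_idx, max_len):
    """Turn a sentence into max_len integer IDs, padded with <NULL>."""
    words = text.lower().split()[:max_len - 2]  # leave room for <START>/<END>
    tokens = ["<START>"] + words + ["<END>"]
    unk = word_to_idx["<UNK>"]
    ids = [word_to_idx.get(token, unk) for token in tokens]
    return ids + [word_to_idx["<NULL>"]] * (max_len - len(ids))


encoded = encode_caption("A dog runs fast", word_to_idx, max_len=8)
print(encoded)
# Map the IDs back to tokens; "fast" is out of vocabulary and becomes <UNK>.
print([idx_to_word[i] for i in encoded])
```

The padding is why every caption row has the same length, and why decoding has to skip <NULL> and stop at <END>.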
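Visualizing the data:
import matplotlib.pyplot as plt
# sample_coco_minibatch and decode_captions are defined in coco_utils.py (listed
# further down); image_from_url is a helper supplied with the assignment code.

# Sample a minibatch and show the images and captions
batch_size = 3
captions, features, urls = sample_coco_minibatch(data, batch_size=batch_size)
for i, (caption, url) in enumerate(zip(captions, urls)):  # enumerate() yields the index and the value together
    plt.imshow(image_from_url(url))
    plt.axis('off')
    caption_str = decode_captions(caption, data['idx_to_word'])
    plt.title(caption_str)
    plt.show()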
## For more on enumerate(), see "Summary of Python enumerate usage".
Output:

The COCO data is handled by coco_utils.py, whose code is as follows:
from builtins import range
import os
import json
import numpy as np
import h5py

BASE_DIR = 'cs231n/datasets/coco_captioning'


# select which files will be loaded
def load_coco_data(base_dir=BASE_DIR, max_train=None, pca_features=True):
    data = {}
    caption_file = os.path.join(base_dir, 'coco2014_captions.h5')
    with h5py.File(caption_file, 'r') as f:
        for k, v in f.items():
            data[k] = np.asarray(v)

    if pca_features:
        train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7_pca.h5')
    else:
        train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7.h5')
    with h5py.File(train_feat_file, 'r') as f:
        data['train_features'] = np.asarray(f['features'])

    if pca_features:
        val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7_pca.h5')
    else:
        val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7.h5')
    with h5py.File(val_feat_file, 'r') as f:
        data['val_features'] = np.asarray(f['features'])

    dict_file = os.path.join(base_dir, 'coco2014_vocab.json')
    with open(dict_file, 'r') as f:
        dict_data = json.load(f)
        for k, v in dict_data.items():
            data[k] = v

    train_url_file = os.path.join(base_dir, 'train2014_urls.txt')
    with open(train_url_file, 'r') as f:
        train_urls = np.asarray([line.strip() for line in f])
    data['train_urls'] = train_urls

    val_url_file = os.path.join(base_dir, 'val2014_urls.txt')
    with open(val_url_file, 'r') as f:
        val_urls = np.asarray([line.strip() for line in f])
    data['val_urls'] = val_urls

    # Maybe subsample the training data
    if max_train is not None:
        num_train = data['train_captions'].shape[0]
        mask = np.random.randint(num_train, size=max_train)
        data['train_captions'] = data['train_captions'][mask]
        data['train_image_idxs'] = data['train_image_idxs'][mask]

    return data


# convert numpy arrays of integer IDs back into strings.
def decode_captions(captions, idx_to_word):
    singleton = False
    if captions.ndim == 1:
        singleton = True
        captions = captions[None]
    decoded = []
    N, T = captions.shape
    for i in range(N):
        words = []
        for t in range(T):
            word = idx_to_word[captions[i, t]]
            if word != '<NULL>':
                words.append(word)
            if word == '<END>':
                break
        decoded.append(' '.join(words))
    if singleton:
        decoded = decoded[0]
    return decoded


# sample a small minibatch of training data together with the matching
# image features and image URLs.
def sample_coco_minibatch(data, batch_size=100, split='train'):
    split_size = data['%s_captions' % split].shape[0]
    mask = np.random.choice(split_size, batch_size)
    captions = data['%s_captions' % split][mask]
    image_idxs = data['%s_image_idxs' % split][mask]
    image_features = data['%s_features' % split][image_idxs]
    urls = data['%s_urls' % split][image_idxs]
    return captions, image_features, urls
Working through this file is a good way to get familiar with file handling in Python.
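The key step in sample_coco_minibatch is the two-level lookup: captions are sampled first, and each caption's entry in image_idxs then selects the feature row and URL of its image. The following pure-Python sketch mirrors that logic on toy data; the function sample_minibatch and all values are made up for illustration, and plain lists stand in for the NumPy arrays.

```python
import random


def sample_minibatch(captions, image_idxs, features, urls, batch_size, rng):
    """Sample captions with replacement and fetch the matching image data."""
    picks = [rng.randrange(len(captions)) for _ in range(batch_size)]
    batch_captions = [captions[i] for i in picks]
    batch_image_idxs = [image_idxs[i] for i in picks]
    # Several captions can point at the same image row.
    batch_features = [features[j] for j in batch_image_idxs]
    batch_urls = [urls[j] for j in batch_image_idxs]
    return batch_captions, batch_features, batch_urls


# Toy data: 3 images, 5 captions; image_idxs[i] is the image row of caption i.
features = [[0.0], [1.0], [2.0]]
urls = ["url0", "url1", "url2"]
captions = ["cap A (img 0)", "cap B (img 0)", "cap C (img 1)",
            "cap D (img 2)", "cap E (img 2)"]
image_idxs = [0, 0, 1, 2, 2]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = sample_minibatch(captions, image_idxs, features, urls, 4, rng)
for caption, feature, url in zip(*batch):
    print(caption, feature, url)
```

This layout explains the shapes printed earlier: there are far more caption rows than feature rows because every image has several captions.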
