MS COCO segmentation encoding and storage (RLE & polygon)

2019-03-27 15:41:35

Tags: segmentation, 15, polygon, img, RLE, str, coco, id


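Understanding how segmentation is handled by reading the COCO dataset's API code

COCO is a dataset created by a team at Microsoft. With it we can train neural networks to perform detection, classification, segmentation, and captioning on images. See the official website for a full introduction.

  • Introduction to the annotation format
  • A brief introduction to how masks are stored
  • Analysis of the relevant code
  • An example

Introduction to the annotation format

// copied from the official website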
{
    "info": info,
    "images": [image],
    "annotations": [annotation],
    "licenses": [license],
}

info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}

image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}

license{
    "id": int,
    "name": str,
    "url": str,
}
----------

    Object Instance Annotations

    Each instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0 in which case polygons are used) or a collection of objects (iscrowd=1 in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names.

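    In other words: each instance annotation carries a category id and a segmentation mask. The format of the segmentation field depends on iscrowd. With iscrowd=0 (a single object) it is a list of polygons, and with iscrowd=1 (a collection of objects, such as a crowd of people) it is RLE, which is explained later. A single object may need several polygons, for example when it is occluded. Each object also has an enclosing bounding box whose coordinates are measured from the top-left corner of the image and are 0-indexed. Finally, the categories field maps each category id to its category name and supercategory name.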

    
    
    annotation{
        "id": int,
        "image_id": int,
        "category_id": int,
        "segmentation": RLE or [polygon],
        "area": float,
        "bbox": [x,y,width,height],
        "iscrowd": 0 or 1,
    }
    
    categories[{
        "id": int,
        "name": str,
        "supercategory": str,
    }]
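The "RLE or [polygon]" field above can be told apart by its Python type once the JSON is loaded. The following is a minimal sketch, not the official API: the helper name `segmentation_kind` and both sample annotations are made up for illustration, and the rule it applies (a list means polygons, a dict with a list of counts means uncompressed RLE, a dict with string counts means compressed RLE) is an assumption based on how the annotation files are commonly laid out.

```python
def segmentation_kind(ann):
    """Classify the segmentation field of one annotation dict (hypothetical helper)."""
    segm = ann["segmentation"]
    if isinstance(segm, list):
        # iscrowd=0: a list of polygons, each a flat [x1, y1, x2, y2, ...] list
        return "polygon"
    if isinstance(segm, dict) and "counts" in segm and "size" in segm:
        if isinstance(segm["counts"], list):
            # iscrowd=1: run lengths stored as plain integers
            return "uncompressed RLE"
        if isinstance(segm["counts"], (str, bytes)):
            # run lengths packed into a string
            return "compressed RLE"
    raise ValueError("unrecognized segmentation format")


# Made-up annotations that follow the schema above
single_object = {
    "id": 1, "image_id": 42, "category_id": 18, "iscrowd": 0,
    "segmentation": [[10.0, 10.0, 20.0, 10.0, 20.0, 20.0]],
    "area": 50.0, "bbox": [10.0, 10.0, 10.0, 10.0],
}
crowd = {
    "id": 2, "image_id": 42, "category_id": 1, "iscrowd": 1,
    "segmentation": {"counts": [2, 3, 1], "size": [2, 3]},
    "area": 3.0, "bbox": [1.0, 0.0, 2.0, 2.0],
}

print(segmentation_kind(single_object))
print(segmentation_kind(crowd))
```

Checking the type rather than the iscrowd flag keeps the helper usable on files where the two do not line up as expected.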
    
    
    

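    A brief introduction to how masks are stored

    As mentioned above, the COCO dataset stores masks in two ways: as polygons and as RLE. Polygons are easy to understand: they are just polygons. But what is RLE?

    Simply put, RLE (run-length encoding) is a compression method, and one of the most intuitive ones.

    For example, if M = [0,0,0,1,1,1,1,1,1,0,0], the RLE encoding of M is [3,6,2]. This is of course an encoding for binary data, which is what COCO uses. RLE in general goes well beyond this simple case; it is not the focus here, so look it up if you want more detail.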
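The example above can be reproduced with a few lines of Python. This is a minimal sketch of binary run-length encoding, not the COCO implementation (which is written in C); the function name `rle_encode` is hypothetical. It assumes the convention that the first count always refers to zeros, so a vector that starts with 1 gets a leading 0.

```python
def rle_encode(bits):
    """Return run lengths of a 0/1 sequence; the first run always counts zeros."""
    counts = []
    current = 0  # value of the run being counted, starting with zeros
    run = 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)  # close the finished run (may be 0 at the start)
            current = b
            run = 1
    counts.append(run)  # close the last run
    return counts


M = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(rle_encode(M))  # [3, 6, 2]
```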

    The comment in the code says:

    # RLE is a simple yet efficient format for storing binary masks. RLE
    # first divides a vector (or vectorized image) into a series of piecewise
    # constant regions and then for each piece simply stores the length of
    # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
    # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
    # (note that the odd counts are always the numbers of zeros). Instead of
    # storing the counts directly, additional compression is achieved with a
    # variable bitrate representation based on a common scheme called LEB128.

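    To explain: RLE splits a binary vector into a series of runs, each with a constant value, and stores only the length of each run. For example, M=[0 0 1 1 1 0 1] gives the RLE [2 3 1 1], and M=[1 1 1 1 1 1 0] gives [0 6 1]. Note that the counts in odd positions (the 1st, 3rd, and so on) are always numbers of zeros, which is why the second example starts with a 0. In addition, extra compression is achieved with a variable-bitrate representation based on a common scheme called LEB128.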
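Going the other way, a list of counts plus the mask size is enough to rebuild the mask. The sketch below is an illustration under stated assumptions rather than the library's code: `rle_decode` is a hypothetical name, `size` is taken to be [height, width], and the flattened vector is assumed to be laid out column by column (column-major), which is how the COCO mask API stores masks.

```python
def rle_decode(counts, size):
    """Expand uncompressed RLE counts into a height x width list of 0/1 rows."""
    height, width = size
    flat = []
    value = 0  # runs alternate between 0 and 1, starting with zeros
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("counts do not sum to height * width")
    # Column-major layout: pixel (row, col) sits at index col * height + row
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


print(rle_decode([2, 3, 1, 1], [1, 7]))  # [[0, 0, 1, 1, 1, 0, 1]]
print(rle_decode([1, 2, 3], [2, 3]))     # [[0, 1, 0], [1, 0, 0]]
```

The second call shows the effect of the column-major layout: the runs fill the first column top to bottom before moving to the next one.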

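    Analysis of the relevant code

    COCO is the API provided by the dataset's authors, more precisely a toolkit written in Python and C. The mask-related parts are written in C.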
    
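    # The code below comes from FastMaskRCNN

    1. Convert img and annotation to TFRecord
    import math
    from pycocotools.coco import COCO

    # Load the annotation file (annFile and split_name are set by the caller)
    coco = COCO(annFile)
    # Load the category information
    cats = coco.loadCats(coco.getCatIds())
    print('%s has %d images' % (split_name, len(coco.imgs)))
    # Collect the image records as (img_id, img_info) pairs
    imgs = [(img_id, coco.imgs[img_id]) for img_id in coco.imgs]
    # Compute the sharding: roughly 2500 images per shard.
    # max(1, ...) avoids a division by zero when there are fewer than 2500 images.
    num_shards = max(1, len(imgs) // 2500)
    num_per_shard = int(math.ceil(len(imgs) / float(num_shards)))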
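To make the shard arithmetic concrete, here is a small sketch that turns the two numbers into index ranges. The function `shard_ranges` and the way each shard is sliced (`start = shard_id * num_per_shard`, with the end clipped to the number of images) are assumptions for illustration; the excerpt above does not show how FastMaskRCNN actually iterates over the shards.

```python
import math


def shard_ranges(num_items, target_per_shard=2500):
    """Split num_items into contiguous [start, end) ranges (hypothetical helper)."""
    # At least one shard, even when there are fewer items than the target
    num_shards = max(1, num_items // target_per_shard)
    num_per_shard = int(math.ceil(num_items / float(num_shards)))
    ranges = []
    for shard_id in range(num_shards):
        start = shard_id * num_per_shard
        end = min((shard_id + 1) * num_per_shard, num_items)
        ranges.append((start, end))
    return ranges


print(shard_ranges(10, target_per_shard=4))  # [(0, 5), (5, 10)]
print(shard_ranges(3, target_per_shard=4))   # [(0, 3)]
```

Because the shard count is rounded down, each shard holds slightly more than the target number of items instead of leaving a small remainder shard at the end.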
    
    2. Get the mask and bbox information from COCO
    def _get_coco_masks(coco, img_id, height, width, img_name):
      """ get the masks for all the instances
      Note: some images are not annotated
      Return:
        gt_boxes, mx5 float32 array, each row is (x1, y1, x2, y2, class)
        masks, mxhxw uint8 array, one binary mask per instance
        mask, hxw uint8 array, class id per pixel (0 where nothing is annotated)
      """
      annIds = coco.getAnnIds(imgIds=[img_id], iscrowd=None)
      # assert  annIds is not None and annIds > 0, 'No annotation for %s' % str(img_id)
      anns = coco.loadAnns(annIds)
      coco.showAnns(anns)
      # assert len(anns) > 0, 'No annotation for %s' % str(img_id)
      masks = []
      classes = []
      bboxes = []
      mask = np.zeros((height, width), dtype=np.float32)
      segmentations = []
      for ann in anns:
        m = coco.annToMask(ann) # zero one mask
        assert m.shape[0] == height and m.shape[1] == width, \
                'image %s and ann %s dont match' % (img_id, ann)
        masks.append(m)
        cat_id = _cat_id_to_real_id(ann['category_id'])
        classes.append(cat_id)
        bboxes.append(ann['bbox'])
        m = m.astype(np.float32) * cat_id
        mask[m > 0] = m[m > 0]
    
      masks = np.asarray(masks)
      classes = np.asarray(classes)
      bboxes = np.asarray(bboxes)
      # to x1, y1, x2, y2
      if bboxes.shape[0] <= 0:
        bboxes = np.zeros([0, 4], dtype=np.float32)
        classes = np.zeros([0], dtype=np.float32)
        print ('None Annotations %s' % img_name)
        LOG('None Annotations %s' % img_name)
      bboxes[:, 2] = bboxes[:, 0] + bboxes[:, 2]
      bboxes[:, 3] = bboxes[:, 1] + bboxes[:, 3]
      gt_boxes = np.hstack((bboxes, classes[:, np.newaxis]))
      gt_boxes = gt_boxes.astype(np.float32)
      masks = masks.astype(np.uint8)
      mask = mask.astype(np.uint8)
      assert masks.shape[0] == gt_boxes.shape[0], 'Shape Error'
    
      return gt_boxes, masks, mask
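The two lines commented "to x1, y1, x2, y2" in the function above convert COCO's [x, y, width, height] boxes into corner coordinates. As a minimal standalone sketch of that conversion for a single box (the name `xywh_to_xyxy` is hypothetical and not part of FastMaskRCNN or the COCO API):

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    # The top-left corner stays the same; the second corner is offset by the size
    return [x, y, x + w, y + h]


print(xywh_to_xyxy([10, 20, 30, 40]))  # [10, 20, 40, 60]
```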

    An example

    # Imports are assumed: io is taken to be skimage.io, as in the pycocotools demo
    import numpy as np
    import skimage.io as io
    import matplotlib.pyplot as plt

    # get all images containing given categories, select one at random
    catIds = coco.getCatIds(catNms=['animal'])
    imgIds = coco.getImgIds(catIds=catIds)
    imgIds = coco.getImgIds(imgIds=[324139])
    img = coco.loadImgs(imgIds[np.random.randint(0, len(imgIds))])[0]

    print(imgIds)
    print(img['coco_url'])
    I = io.imread(img['coco_url'])
    plt.axis('off')
    plt.imshow(I)
    plt.show()  # Figure 1

    plt.imshow(I); plt.axis('off')
    annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
    anns = coco.loadAnns(annIds)
    print(len(anns))
    masks = []
    showonce = True
    for ann in anns:
        # Print the first polygon segmentation only, to keep the output short
        if type(ann['segmentation']) == list and showonce:
            print(ann['segmentation'])
            showonce = False
        # Print every non-polygon (RLE) segmentation
        if type(ann['segmentation']) != list:
            print(ann['segmentation'])
        m = coco.annToMask(ann)
        masks.append(m)
    print(len(masks))
    coco.showAnns(anns)  # Figure 2
    15
    [[151.06, 113.6, 168.95, 102.49, 182.53, 92.62, 193.64, 80.28, 203.51, 70.4, 208.45, 61.76, 206.6, 53.74, 209.68, 49.42, 220.17, 50.04, 220.79, 55.59, 222.64, 59.3, 222.64, 59.91, 227.58, 66.7, 228.2, 77.19, 228.2, 83.98, 228.2, 87.06, 228.2, 92.0, 227.58, 96.32, 220.79, 101.87, 213.39, 104.96, 205.36, 111.75, 202.9, 113.6, 201.04, 114.22, 202.9, 123.47, 200.43, 129.64, 200.43, 125.94, 198.58, 113.6, 190.55, 111.75, 181.91, 113.6, 168.95, 114.83, 168.95, 114.83, 162.17, 120.39, 157.23, 119.15, 146.74, 117.92, 142.42, 115.45]]
    {u'counts': [113441, 1, 423, 6, 427, 7, 3, 1, 422, 8, 3, 1, 421, 9, 2, 2, 420, 10, 2, 1, 421, 10, 1, 2, 419, 12, 1, 2, 418, 15, 419, 15, 418, 16, 418, 16, 418, 15, 419, 15, 418, 16, 417, 16, 418, 16, 418, 15, 419, 14, 419, 14, 419, 14, 420, 13, 420, 13, 421, 11, 422, 11, 423, 10, 423, 10, 424, 9, 425, 8, 426, 7, 427, 6, 427, 5, 429, 5, 429, 5, 429, 5, 429, 5, 428, 5, 429, 5, 428, 6, 428, 6, 428, 6, 428, 5, 429, 5, 429, 5, 408, 7, 14, 5, 407, 9, 13, 5, 406, 11, 11, 5, 407, 12, 9, 6, 406, 13, 9, 6, 404, 15, 8, 7, 402, 18, 7, 7, 401, 20, 6, 6, 401, 21, 6, 6, 400, 34, 399, 35, 399, 12, 1, 22, 398, 13, 3, 19, 399, 12, 6, 17, 399, 12, 8, 15, 399, 12, 10, 13, 400, 10, 13, 10, 401, 10, 15, 8, 400, 11, 17, 5, 401, 10, 20, 2, 402, 11, 423, 11, 423, 11, 423, 11, 423, 12, 422, 13, 422, 13, 421, 14, 421, 14, 420, 15, 420, 14, 420, 15, 420, 15, 419, 16, 419, 16, 418, 17, 418, 17, 418, 17, 418, 17, 417, 18, 4, 4, 409, 24, 411, 23, 412, 22, 412, 22, 411, 23, 411, 22, 412, 22, 412, 22, 413, 21, 413, 20, 414, 20, 414, 19, 416, 18, 416, 17, 417, 17, 417, 17, 417, 16, 419, 14, 420, 14, 421, 13, 422, 11, 424, 10, 425, 8, 428, 4, 112407, 8, 422, 16, 416, 19, 414, 21, 2, 6, 405, 30, 403, 32, 402, 33, 401, 34, 400, 35, 400, 34, 400, 34, 401, 33, 402, 32, 277], u'size': [434, 640]}
    15
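The polygon printed in the output above is a flat list of alternating x and y values, [x1, y1, x2, y2, ...]. The sketch below, with the hypothetical helpers `polygon_to_points` and `points_to_bbox`, pairs the values up and derives a tight [x, y, width, height] box from them. The box is computed from a single polygon for illustration only and is not guaranteed to equal the bbox stored in the annotation.

```python
def polygon_to_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] list into a list of (x, y) tuples."""
    if len(flat) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


def points_to_bbox(points):
    """Tight [x, y, width, height] box around the points (illustration only)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# First three vertices of the polygon printed above
print(polygon_to_points([151.06, 113.6, 168.95, 102.49, 182.53, 92.62]))
# A simple integer triangle to show the box computation
print(points_to_bbox([(1, 2), (4, 6), (2, 8)]))  # [1, 2, 3, 6]
```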


    Figure 1

    Figure 2

    Source: https://www.cnblogs.com/leebxo/p/10607955.html
