DeepLab-v3 (86.9 mIOU)
Paper: https://arxiv.org/pdf/1706.05587.pdf (Rethinking Atrous Convolution for Semantic Image Segmentation)
Explanatory article: https://blog.csdn.net/qq_14845119/article/details/102942576
Reference project: https://github.com/fregu856/deeplabv3
I. Model
(1) Atrous convolution
Same as in v2.
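For readers who have not seen v2, here is a minimal sketch of one-dimensional atrous convolution, following the paper's definition y[i] = Σ_k x[i + r·k] w[k]. The function name `atrous_conv1d` and the sample values are illustrative assumptions, and no padding is applied, so the output is shorter than the input.

```python
def atrous_conv1d(x, w, rate):
    """1D atrous convolution without padding: y[i] = sum_k x[i + rate*k] * w[k]."""
    span = rate * (len(w) - 1)  # distance between the first and last tap
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]
dense = atrous_conv1d(x, w, rate=1)    # taps at i, i+1, i+2
dilated = atrous_conv1d(x, w, rate=2)  # taps at i, i+2, i+4
```

With rate=2 the same three weights cover a window of five inputs, so the receptive field grows without adding parameters.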
(2) Going deeper
(3) ASPP with BN (batch normalization)
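The ASPP module in v3 improves on the v2 version in several ways.
As the figure above shows, the larger the rate, the fewer kernel weights fall on valid (non-padded) positions of the feature map. In the extreme case where the rate reaches the feature map size, only the center weight of the 3×3 atrous kernel is effective, so the kernel behaves like a 1×1 convolution. To address this, the authors changed ASPP as follows:
The part enclosed by the yellow bracket in the figure above is the improved ASPP. The input feature map goes through five parallel branches: ① a 1×1 convolution; ② a 3×3 atrous convolution with rate=6; ③ a 3×3 atrous convolution with rate=12; ④ a 3×3 atrous convolution with rate=18; ⑤ global average pooling followed by bilinear upsampling. The five outputs have the same spatial size; they are concatenated along the channel dimension and then passed through another 1×1 convolution.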
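The claim that large rates leave few effective weights, which is what motivates the image-level pooling branch, can be checked numerically. The sketch below centers a 3×3 atrous kernel on every position of a square feature map and counts how many taps land inside the map rather than on zero padding. The function name `valid_weight_fraction` and the 65×65 map size are assumptions chosen for illustration.

```python
def valid_weight_fraction(size, rate, kernel=3):
    """Average fraction of kernel taps that land inside a size x size map.

    The kernel is centred on every position in turn; taps that fall on
    zero padding are counted as invalid.
    """
    half = kernel // 2
    offsets = [rate * k for k in range(-half, half + 1)]
    # The count is separable: valid 2D taps = valid rows * valid columns.
    per_axis = [
        sum(1 for o in offsets if 0 <= p + o < size)
        for p in range(size)
    ]
    total = sum(r * c for r in per_axis for c in per_axis)
    return total / (size * size * kernel * kernel)


# Hypothetical 65x65 feature map, swept over the ASPP rates and the extreme case.
fractions = {rate: valid_weight_fraction(65, rate) for rate in (1, 6, 12, 18, 65)}
```

Once the rate reaches the map size, the fraction bottoms out at 1/9: only the center tap of the 3×3 kernel still sees real data.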
The following example produces a score map at 1/16 of the input resolution:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, num_classes):
        super(ASPP, self).__init__()
        # Branch 1: 1x1 convolution
        self.conv_1x1_1 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_1 = nn.BatchNorm2d(256)
        # Branches 2-4: 3x3 atrous convolutions; padding == dilation keeps the spatial size
        self.conv_3x3_1 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=6, dilation=6)
        self.bn_conv_3x3_1 = nn.BatchNorm2d(256)
        self.conv_3x3_2 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=12, dilation=12)
        self.bn_conv_3x3_2 = nn.BatchNorm2d(256)
        self.conv_3x3_3 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=18, dilation=18)
        self.bn_conv_3x3_3 = nn.BatchNorm2d(256)
        # Branch 5: global average pooling followed by a 1x1 convolution
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv_1x1_2 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_2 = nn.BatchNorm2d(256)
        # Fuse the concatenated branches, then map to per-class scores
        self.conv_1x1_3 = nn.Conv2d(1280, 256, kernel_size=1) # (1280 = 5*256)
        self.bn_conv_1x1_3 = nn.BatchNorm2d(256)
        self.conv_1x1_4 = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, feature_map):
        # (feature_map has shape (batch_size, 512, h/16, w/16))
        feature_map_h = feature_map.size()[2] # (== h/16)
        feature_map_w = feature_map.size()[3] # (== w/16)
        out_1x1 = F.relu(self.bn_conv_1x1_1(self.conv_1x1_1(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_1 = F.relu(self.bn_conv_3x3_1(self.conv_3x3_1(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_2 = F.relu(self.bn_conv_3x3_2(self.conv_3x3_2(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_3 = F.relu(self.bn_conv_3x3_3(self.conv_3x3_3(feature_map))) # (shape: (batch_size, 256, h/16, w/16))
        out_img = self.avg_pool(feature_map) # (shape: (batch_size, 512, 1, 1))
        # Note: in training mode, BatchNorm on a 1x1 map needs batch_size > 1
        out_img = F.relu(self.bn_conv_1x1_2(self.conv_1x1_2(out_img))) # (shape: (batch_size, 256, 1, 1))
        # F.upsample is deprecated; F.interpolate is its replacement
        out_img = F.interpolate(out_img, size=(feature_map_h, feature_map_w), mode="bilinear", align_corners=False) # (shape: (batch_size, 256, h/16, w/16))
        out = torch.cat([out_1x1, out_3x3_1, out_3x3_2, out_3x3_3, out_img], 1) # (shape: (batch_size, 1280, h/16, w/16))
        out = F.relu(self.bn_conv_1x1_3(self.conv_1x1_3(out))) # (shape: (batch_size, 256, h/16, w/16))
        out = self.conv_1x1_4(out) # (shape: (batch_size, num_classes, h/16, w/16))
        return out
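The shape comments in the code above can be checked by hand with the standard convolution output-size formula. The sketch below is plain Python with no PyTorch dependency; `conv_out_size` is a helper name introduced here, and the 32-pixel feature-map side is a hypothetical value (a 512-pixel input at 1/16 resolution).

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one axis (floor formula)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


n = 32  # hypothetical feature-map side: a 512-pixel input at 1/16 resolution

# 1x1 branch, then the three 3x3 atrous branches with padding == dilation.
branch_sizes = [conv_out_size(n, kernel=1)] + [
    conv_out_size(n, kernel=3, padding=rate, dilation=rate)
    for rate in (6, 12, 18)
]

# Four convolution branches plus the pooled branch, 256 channels each.
concat_channels = 5 * 256
```

Setting padding equal to the dilation rate cancels the enlarged kernel extent for a 3×3 kernel, which is why all branches can be concatenated without cropping.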
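After ASPP, the score map is upsampled back to the input resolution with bilinear interpolation, which gives the final segmentation map. In v3 the authors do not apply CRF post-processing to the score map.
The overall pipeline is simple and breaks down into three steps: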
def forward(self, x):
    # (x has shape (batch_size, 3, h, w))
    h, w = x.size()[2], x.size()[3]
    feature_map = self.resnet(x) # (shape: (batch_size, 512, h/16, w/16))
    output = self.aspp(feature_map) # (shape: (batch_size, num_classes, h/16, w/16))
    output = F.interpolate(output, size=(h, w), mode="bilinear", align_corners=False) # (shape: (batch_size, num_classes, h, w))
    return output
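To make the upsampling step in the pipeline above concrete, here is a small pure-Python sketch of bilinear interpolation on a single 2D channel. It assumes the half-pixel-center convention, which may not match the interpolation settings of the reference project exactly; the names `_source_index` and `bilinear_upsample` are introduced only for this illustration.

```python
import math


def _source_index(dst, in_size, out_size):
    """Map an output index to (lower index, upper index, blend weight)."""
    src = max((dst + 0.5) * in_size / out_size - 0.5, 0.0)
    lo = min(int(math.floor(src)), in_size - 1)
    hi = min(lo + 1, in_size - 1)
    return lo, hi, src - lo


def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of numbers to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        y0, y1, fy = _source_index(y, in_h, out_h)
        row = []
        for x in range(out_w):
            x0, x1, fx = _source_index(x, in_w, out_w)
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


up = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 4, 4)
pooled = bilinear_upsample([[7.0]], 3, 3)  # a 1x1 map, as in the pooling branch
```

Upsampling a 1×1 map simply repeats its single value at every position, which is how the image-level branch in ASPP broadcasts global context across the feature map.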
II. Experiments
The model reaches up to 86.9% mIOU (on the PASCAL VOC 2012 test set, per the paper).
Source: https://www.cnblogs.com/biandekeren-blog/p/15371984.html