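  • Image classification: identify what is in the image, i.e. know what the image shows
  • Object detection: identify what is in the image and where it is (located with a **bounding box**)
  • Semantic segmentation: identify what is in the image and where it is (located pixel by pixel)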
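
To make the difference between the three tasks concrete, here is a small sketch of what each one outputs for the same toy image. The 4x4 grid, the label values and the helper `box_from_mask` are made up for illustration and are not taken from any library.

```python
# Toy 4x4 "image" containing one object (class 1) on background (class 0).
# All values below are illustrative assumptions.

# Image classification: one label for the whole image.
class_label = 1

# Object detection: a label plus a bounding box (x_min, y_min, x_max, y_max).
detection = {"label": 1, "box": (1, 1, 2, 2)}

# Semantic segmentation: one label per pixel.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def box_from_mask(mask, label):
    """Tightest box (inclusive coordinates) around all pixels equal to label."""
    ys = [y for y, row in enumerate(mask) for v in row if v == label]
    xs = [x for row in mask for x, v in enumerate(row) if v == label]
    return (min(xs), min(ys), max(xs), max(ys))


# A pixel mask carries at least as much information as a box:
# the box can be recovered from the mask, but not the other way round.
recovered_box = box_from_mask(mask, 1)
```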






1. Standard semantic segmentation


2. Instance-aware semantic segmentation
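
As a rough illustration of the two variants: a standard semantic mask only stores a class per pixel, while an instance-aware mask also separates individual objects of the same class. The two tiny masks below are invented for this sketch.

```python
# Two objects of the same class (class 1) separated by background (0).
# Both masks are illustrative assumptions.

# Standard semantic segmentation: every object pixel gets the class id.
semantic_mask = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Instance-aware segmentation: each object gets its own id (0 = background).
instance_mask = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]


def foreground_ids(mask):
    """Sorted list of non-background ids that occur in the mask."""
    return sorted({v for row in mask for v in row if v != 0})


n_classes_seen = len(foreground_ids(semantic_mask))    # one class
n_instances_seen = len(foreground_ids(instance_mask))  # two objects
```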









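This is somewhat similar to a GAN. First comes feature extraction: a convolutional neural network turns an image into feature maps (like a GAN's discriminator). Then the feature maps are brought back toward the original resolution by deconvolution (like a GAN's generator). If a GAN is G+D, semantic segmentation is roughly **"D+G"**, and there is no notion of fake_data versus real_data.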
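
The shrink-then-grow shape of such a network can be sketched with plain arithmetic. The example assumes four stride-2 convolutions (3x3 kernel, padding 1) in the encoder and three 2x upsampling steps in the decoder, which matches the listing later in this post; the function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_sizes(size, encoder_strides, n_up):
    """Spatial size after every encoder stage and every 2x upsampling step."""
    sizes = [size]
    for stride in encoder_strides:   # encoder: each stride-2 conv halves the size
        size = conv_out(size, stride=stride)
        sizes.append(size)
    for _ in range(n_up):            # decoder: each upsampling doubles the size
        size *= 2
        sizes.append(size)
    return sizes


sizes = trace_sizes(416, encoder_strides=[2, 2, 2, 2], n_up=3)
print(sizes)
```

With these assumptions the output ends at half the input resolution, because the decoder upsamples one time fewer than the encoder downsamples.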





from keras.models import *
from keras.layers import *
import keras.backend as K
import keras

IMAGE_ORDERING = 'channels_last'

def relu6(x):
    """ReLU capped at 6, built on `K.relu`.

        With default values, `K.relu` returns element-wise `max(x, 0)`.

        # Arguments of `K.relu`
            x: A tensor or variable.
            alpha: A scalar, slope of the negative section (default=`0.`),
                i.e. the part of the curve left of the origin.
            max_value: Saturation threshold; outputs are clipped to it.

        # Returns
            A tensor.
    """
    return K.relu(x, max_value=6)

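# _conv_block is the first, ordinary convolution block of MobileNet:
# ZeroPadding -> Conv2D -> BN -> ReLU6. Unlike a ResNet block, it has no
# residual (skip) connection; residual structures reduce information loss
# during training, but none is used here.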
def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):

    channel_axis = 1 if IMAGE_ORDERING == 'channels_first' else -1
    filters = int(filters * alpha)

    # From the ZeroPadding2D documentation:
    #     Zero-padding layer for 2D input (e.g. picture).
    #     This layer can add rows and columns of zeros
    #     at the top, bottom, left and right side of an image tensor.
    # _conv_block uses zero-padding to enlarge the spatial size: the layer
    # works on 2D data such as images and adds zeros on every side,
    # which makes the image "fatter".

    # zeropadding-Conv2D-BN
    x = ZeroPadding2D(padding=(1, 1), name='conv1_pad', data_format=IMAGE_ORDERING)(inputs)
    x = Conv2D(filters, kernel, data_format=IMAGE_ORDERING, padding='valid', use_bias=False, strides=strides,name='conv1')(x)
    # BN keeps the input distribution of each layer stable while a deep network is trained
    x = BatchNormalization(axis=channel_axis, name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)

def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,depth_multiplier=1, strides=(1, 1), block_id=1):

    channel_axis = 1 if IMAGE_ORDERING == 'channels_first' else -1
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    x = ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING, name='conv_pad_%d' % block_id)(inputs)
    x = DepthwiseConv2D((3, 3), data_format=IMAGE_ORDERING, padding='valid', depth_multiplier=depth_multiplier, strides=strides, use_bias=False, name='conv_dw_%d' % block_id)(x)
    x = BatchNormalization(axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)

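    # 1x1 pointwise convolution; bias-free, matching the parameter counts
    # in the model summary (e.g. conv_pw_1: 32 * 64 = 2048)
    x = Conv2D(pointwise_conv_filters, (1, 1), data_format=IMAGE_ORDERING,
               padding='same', use_bias=False, strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)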
    x = BatchNormalization(axis=channel_axis, name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)

# The blocks built above are just preparation for assembling MobileNet.
# get_mobilenet_encoder is the backbone part of the architecture diagram.
def get_mobilenet_encoder( input_height=224 ,  input_width=224 , pretrained='imagenet' ):

    alpha = 1.0
    depth_multiplier = 1
    dropout = 1e-3

    img_input = Input(shape=(input_height,input_width , 3 ))

    x = _conv_block(img_input, 32, alpha, strides=(2, 2))
    x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)
    f1 = x

    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, strides=(2, 2), block_id=2)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)
    f2 = x

    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, strides=(2, 2), block_id=4)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)
    f3 = x

    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, strides=(2, 2), block_id=6)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)
    f4 = x

    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)
    f5 = x

    # Returning f1, f2, f3, f4, f5 gives the decoder several feature levels to choose from
    return img_input, [f1, f2, f3, f4, f5]

# The encoder is now complete: each input image has become a set of feature maps from which the decoder can generate a "class map"
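
Why the encoder uses depthwise-separable blocks can be seen by counting weights. The sketch below assumes bias-free convolutions, as in the blocks above; the helper names are made up for this example.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of an ordinary k x k convolution without bias."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3, depth_multiplier=1):
    """Weights of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise, pointwise


# First depthwise block of the encoder: 32 input channels, 64 output channels.
dw, pw = separable_conv_params(32, 64)
full = standard_conv_params(32, 64)
print(dw, pw, dw + pw, full)
```

The two numbers for the separable block agree with the `conv_dw_1` and `conv_pw_1` rows of the model summary printed further down, and their sum is roughly an eighth of the weights an ordinary 3x3 convolution would need.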


from keras.models import *
from keras.layers import *
from encoder import get_mobilenet_encoder

IMAGE_ORDERING = 'channels_last'

# An assert statement does nothing when its condition is true and the
# program continues normally; when the condition is false, it raises
# an AssertionError.
def segnet_decoder(f, n_classes, n_up=3):

    assert n_up >= 2

    o = f
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
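    # One UpSampling2D step: h and w become 1/8 of the input size
    # 52,52,512
    # From the UpSampling2D documentation:
    #     Upsampling layer for 2D inputs.
    #     Repeats the rows and columns of the data
    #     by size[0] and size[1] respectively.
    # Upsampling is similar to zero-padding in that it also makes the
    # feature map "fatter", i.e. it enlarges the image. Common forms:
    # predefined interpolation, deconvolution, and sub-pixel convolution.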
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    # One more UpSampling2D step: h and w become 1/4 of the input size
    # 104,104,256
    for _ in range(n_up - 2):
        o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
        o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
        o = (Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
        o = (BatchNormalization())(o)

    # One more UpSampling2D step: h and w become 1/2 of the input size
    # 208,208,128
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    # The output is now h_input/2, w_input/2, n_classes
    o = Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(o)

    return o

def _segnet(n_classes, encoder, input_height=416, input_width=608, encoder_level=3):
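    # Run the input through the backbone (encoder)
    img_input, levels = encoder(input_height=input_height, input_width=input_width)

    # Take the feature map after h and w have been halved four times
    feat = levels[encoder_level]

    # Feed the features into the SegNet decoder
    o = segnet_decoder(feat, n_classes, n_up=3)

    # Reshape to (pixels, n_classes) so that softmax runs over the class axis
    o = Reshape((int(input_height / 2) * int(input_width / 2), -1))(o)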
    o = Softmax()(o)
    model = Model(img_input, o)

    return model

def mobilenet_segnet(n_classes, input_height=224, input_width=224, encoder_level=3):
    model = _segnet(n_classes, get_mobilenet_encoder, input_height=input_height, input_width=input_width,
                    encoder_level=encoder_level)
    model.model_name = "mobilenet_segnet"
    return model

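# How the three functions above relate: the first builds the basic
# structure of the model and is used by the second; the second
# post-processes the result; the third only sets the model name.
# The first function is the core.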
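
What `UpSampling2D((2, 2))` does with its default nearest-neighbour interpolation can be imitated on a nested list. This is only a pure-Python sketch of the "repeat rows and columns" behaviour; `upsample_nearest` is a hypothetical helper, not a Keras function.

```python
def upsample_nearest(grid, size=(2, 2)):
    """Repeat every row size[0] times and every column size[1] times."""
    row_factor, col_factor = size
    out = []
    for row in grid:
        widened = [v for v in row for _ in range(col_factor)]
        out.extend(list(widened) for _ in range(row_factor))
    return out


small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small)
for row in big:
    print(row)
```

No new information is created by this step, which is why each upsampling in the decoder is followed by a convolution that can refine the enlarged map.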


from decoder import mobilenet_segnet

model = mobilenet_segnet(2, input_height=416, input_width=416)
# Print the layer-by-layer structure shown below
model.summary()


Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 416, 416, 3)       0         
conv1_pad (ZeroPadding2D)    (None, 418, 418, 3)       0         
conv1 (Conv2D)               (None, 208, 208, 32)      864       
conv1_bn (BatchNormalization (None, 208, 208, 32)      128       
conv1_relu (Activation)      (None, 208, 208, 32)      0         
conv_pad_1 (ZeroPadding2D)   (None, 210, 210, 32)      0         
conv_dw_1 (DepthwiseConv2D)  (None, 208, 208, 32)      288       
conv_dw_1_bn (BatchNormaliza (None, 208, 208, 32)      128       
conv_dw_1_relu (Activation)  (None, 208, 208, 32)      0         
conv_pw_1 (Conv2D)           (None, 208, 208, 64)      2048      
conv_pw_1_bn (BatchNormaliza (None, 208, 208, 64)      256       
conv_pw_1_relu (Activation)  (None, 208, 208, 64)      0         
conv_pad_2 (ZeroPadding2D)   (None, 210, 210, 64)      0         
conv_dw_2 (DepthwiseConv2D)  (None, 104, 104, 64)      576       
conv_dw_2_bn (BatchNormaliza (None, 104, 104, 64)      256       
conv_dw_2_relu (Activation)  (None, 104, 104, 64)      0         
conv_pw_2 (Conv2D)           (None, 104, 104, 128)     8192      
conv_pw_2_bn (BatchNormaliza (None, 104, 104, 128)     512       
conv_pw_2_relu (Activation)  (None, 104, 104, 128)     0         
conv_pad_3 (ZeroPadding2D)   (None, 106, 106, 128)     0         
conv_dw_3 (DepthwiseConv2D)  (None, 104, 104, 128)     1152      
conv_dw_3_bn (BatchNormaliza (None, 104, 104, 128)     512       
conv_dw_3_relu (Activation)  (None, 104, 104, 128)     0         
conv_pw_3 (Conv2D)           (None, 104, 104, 128)     16384     
conv_pw_3_bn (BatchNormaliza (None, 104, 104, 128)     512       
conv_pw_3_relu (Activation)  (None, 104, 104, 128)     0         
conv_pad_4 (ZeroPadding2D)   (None, 106, 106, 128)     0         
conv_dw_4 (DepthwiseConv2D)  (None, 52, 52, 128)       1152      
conv_dw_4_bn (BatchNormaliza (None, 52, 52, 128)       512       
conv_dw_4_relu (Activation)  (None, 52, 52, 128)       0         
conv_pw_4 (Conv2D)           (None, 52, 52, 256)       32768     
conv_pw_4_bn (BatchNormaliza (None, 52, 52, 256)       1024      
conv_pw_4_relu (Activation)  (None, 52, 52, 256)       0         
conv_pad_5 (ZeroPadding2D)   (None, 54, 54, 256)       0         
conv_dw_5 (DepthwiseConv2D)  (None, 52, 52, 256)       2304      
conv_dw_5_bn (BatchNormaliza (None, 52, 52, 256)       1024      
conv_dw_5_relu (Activation)  (None, 52, 52, 256)       0         
conv_pw_5 (Conv2D)           (None, 52, 52, 256)       65536     
conv_pw_5_bn (BatchNormaliza (None, 52, 52, 256)       1024      
conv_pw_5_relu (Activation)  (None, 52, 52, 256)       0         
conv_pad_6 (ZeroPadding2D)   (None, 54, 54, 256)       0         
conv_dw_6 (DepthwiseConv2D)  (None, 26, 26, 256)       2304      
conv_dw_6_bn (BatchNormaliza (None, 26, 26, 256)       1024      
conv_dw_6_relu (Activation)  (None, 26, 26, 256)       0         
conv_pw_6 (Conv2D)           (None, 26, 26, 512)       131072    
conv_pw_6_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_pw_6_relu (Activation)  (None, 26, 26, 512)       0         
conv_pad_7 (ZeroPadding2D)   (None, 28, 28, 512)       0         
conv_dw_7 (DepthwiseConv2D)  (None, 26, 26, 512)       4608      
conv_dw_7_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_dw_7_relu (Activation)  (None, 26, 26, 512)       0         
conv_pw_7 (Conv2D)           (None, 26, 26, 512)       262144    
conv_pw_7_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_pw_7_relu (Activation)  (None, 26, 26, 512)       0         
conv_pad_8 (ZeroPadding2D)   (None, 28, 28, 512)       0         
conv_dw_8 (DepthwiseConv2D)  (None, 26, 26, 512)       4608      
conv_dw_8_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_dw_8_relu (Activation)  (None, 26, 26, 512)       0         
conv_pw_8 (Conv2D)           (None, 26, 26, 512)       262144    
conv_pw_8_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_pw_8_relu (Activation)  (None, 26, 26, 512)       0         
conv_pad_9 (ZeroPadding2D)   (None, 28, 28, 512)       0         
conv_dw_9 (DepthwiseConv2D)  (None, 26, 26, 512)       4608      
conv_dw_9_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_dw_9_relu (Activation)  (None, 26, 26, 512)       0         
conv_pw_9 (Conv2D)           (None, 26, 26, 512)       262144    
conv_pw_9_bn (BatchNormaliza (None, 26, 26, 512)       2048      
conv_pw_9_relu (Activation)  (None, 26, 26, 512)       0         
conv_pad_10 (ZeroPadding2D)  (None, 28, 28, 512)       0         
conv_dw_10 (DepthwiseConv2D) (None, 26, 26, 512)       4608      
conv_dw_10_bn (BatchNormaliz (None, 26, 26, 512)       2048      
conv_dw_10_relu (Activation) (None, 26, 26, 512)       0         
conv_pw_10 (Conv2D)          (None, 26, 26, 512)       262144    
conv_pw_10_bn (BatchNormaliz (None, 26, 26, 512)       2048      
conv_pw_10_relu (Activation) (None, 26, 26, 512)       0         
conv_pad_11 (ZeroPadding2D)  (None, 28, 28, 512)       0         
conv_dw_11 (DepthwiseConv2D) (None, 26, 26, 512)       4608      
conv_dw_11_bn (BatchNormaliz (None, 26, 26, 512)       2048      
conv_dw_11_relu (Activation) (None, 26, 26, 512)       0         
conv_pw_11 (Conv2D)          (None, 26, 26, 512)       262144    
conv_pw_11_bn (BatchNormaliz (None, 26, 26, 512)       2048      
conv_pw_11_relu (Activation) (None, 26, 26, 512)       0         
zero_padding2d_1 (ZeroPaddin (None, 28, 28, 512)       0         
conv2d_1 (Conv2D)            (None, 26, 26, 512)       2359808   
batch_normalization_1 (Batch (None, 26, 26, 512)       2048      
up_sampling2d_1 (UpSampling2 (None, 52, 52, 512)       0         
zero_padding2d_2 (ZeroPaddin (None, 54, 54, 512)       0         
conv2d_2 (Conv2D)            (None, 52, 52, 256)       1179904   
batch_normalization_2 (Batch (None, 52, 52, 256)       1024      
up_sampling2d_2 (UpSampling2 (None, 104, 104, 256)     0         
zero_padding2d_3 (ZeroPaddin (None, 106, 106, 256)     0         
conv2d_3 (Conv2D)            (None, 104, 104, 128)     295040    
batch_normalization_3 (Batch (None, 104, 104, 128)     512       
up_sampling2d_3 (UpSampling2 (None, 208, 208, 128)     0         
zero_padding2d_4 (ZeroPaddin (None, 210, 210, 128)     0         
conv2d_4 (Conv2D)            (None, 208, 208, 64)      73792     
batch_normalization_4 (Batch (None, 208, 208, 64)      256       
conv2d_5 (Conv2D)            (None, 208, 208, 2)       1154      
reshape_1 (Reshape)          (None, 43264, 2)          0         
softmax_1 (Softmax)          (None, 43264, 2)          0         
Total params: 5,541,378
Trainable params: 5,524,738
Non-trainable params: 16,640

Process finished with exit code 0
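
The last two rows of the summary have shape (None, 43264, 2), where 43264 = 208 x 208 pixels and 2 is the number of classes. A prediction therefore has to be turned back into a 2D label map. The sketch below shows one way to do this on a tiny made-up prediction; `decode_prediction` is a hypothetical helper and the probabilities are invented.

```python
def decode_prediction(probs, height, width):
    """Turn per-pixel class probabilities (flat list) into a 2D label map."""
    assert len(probs) == height * width
    # argmax over the class axis for every pixel
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    # undo the flattening done by the Reshape layer (row-major order)
    return [labels[r * width:(r + 1) * width] for r in range(height)]


# Toy 2x2 "image" with 2 classes; each inner list sums to 1 like softmax output.
toy_probs = [[0.9, 0.1], [0.2, 0.8],
             [0.6, 0.4], [0.3, 0.7]]
label_map = decode_prediction(toy_probs, height=2, width=2)
print(label_map)
```

For the real model the same idea would apply with height = width = 208.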




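When I first met the ReLU function, my understanding was simply: it works, it works, just use it. As for why we need ReLU, or any activation function at all, I never found a clear explanation even after reading many articles. After writing this kind of code many times I understand it a little better (though not by much). It is actually fairly simple. Suppose we connect the layers of a network plainly, one after another, e.g.: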

x1_1 = Conv2D(........)(inputs)
x2_1 = MaxPooling2D(......)(x1_1)

x1_2 = Conv2D(........)(x2_1)
x2_2 = MaxPooling2D(......)(x1_2)


$$x_{i-1}=kx_i$$

Residual structure: $y=x+H(x)$, where $x$ is the input of a layer (or an intermediate result) and $H(x)$ is the output of that layer.
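
The point about purely linear layers can be checked with scalars. In this sketch every "layer" just multiplies by a constant k, as in the relation above; the values of k are arbitrary and the function names are made up.

```python
def relu(x):
    return max(0.0, x)


def stack_linear(x, ks):
    """Apply the layers x -> k * x one after another, no activation."""
    for k in ks:
        x = k * x
    return x


def stack_with_relu(x, ks):
    """Same layers, but with a ReLU after each one."""
    for k in ks:
        x = relu(k * x)
    return x


ks = [2.0, 3.0, 0.5]
collapsed_k = 2.0 * 3.0 * 0.5   # the whole linear stack equals one layer with this k

linear_pos = stack_linear(1.0, ks)
linear_neg = stack_linear(-1.0, ks)
relu_pos = stack_with_relu(1.0, ks)
relu_neg = stack_with_relu(-1.0, ks)
print(linear_pos, linear_neg, relu_pos, relu_neg)
```

Without an activation, three layers behave exactly like one layer with k = 3. With ReLU in between, the outputs for 1 and -1 can no longer be produced by any single k, so the stack is doing something a single linear layer cannot.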




$$f(x)=\frac{1}{1+e^{-x}}$$

  1. Simple derivative
    $$f'(x)=f(x)(1-f(x))$$

  2. Differentiable everywhere on its domain

  3. Not a "pseudo-nonlinear" function (one for which $f(x)\approx x$)

  4. Saturating activation function

    $$\lim_{x\to\infty} f(x)=1\\ \lim_{x\to-\infty}f(x)=0$$

  5. Monotonic function


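  1. The activation is expensive to compute (it involves an exponential)

  2. The output is not symmetric about the origin (not zero-centered), which makes weight updates less efficient. It also means the next layer receives inputs with a non-zero mean, and as the network gets deeper this shifts the original data distribution (which is what BN is needed for)

  3. From the plot, the derivative lies in (0, 0.25], which is very small. During backpropagation each layer multiplies the gradient by such a small value, so if the network is too deep the gradient "vanishes" and the parameters of the shallow layers can no longer be updated.
    $$f'(x)\to0$$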
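
The size of the problem can be estimated numerically. The sketch below evaluates the derivative formula given above and multiplies its largest possible value over 10 layers; the depth of 10 is an arbitrary choice for illustration, and weights are ignored.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """f'(x) = f(x) * (1 - f(x))"""
    s = sigmoid(x)
    return s * (1.0 - s)


max_grad = sigmoid_grad(0.0)        # the derivative peaks at x = 0
saturated_grad = sigmoid_grad(10.0) # far from 0 the derivative is almost 0
best_case_10_layers = max_grad ** 10

print(max_grad, saturated_grad, best_case_10_layers)
```

Even in the best case, where every layer sits exactly at the peak of the derivative, the factor contributed by ten sigmoid layers is below one millionth.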



$$f(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\\ f'(x)=1-(f(x))^2$$

  1. Saturating activation function
  2. Not a "pseudo-nonlinear" activation function
  3. Monotonic function
  4. The domain is $(-\infty,+\infty)$ and the output lies in $(-1,1)$


  1. Expensive to compute
  2. Does not fundamentally solve the vanishing gradient problem
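
Both properties, the zero-centered output and the gradient that still vanishes when the function saturates, can be checked with the standard library. This is only a numeric sketch of the two formulas above; the sample points are arbitrary.

```python
import math


def tanh_grad(x):
    """f'(x) = 1 - f(x)^2 with f = tanh"""
    return 1.0 - math.tanh(x) ** 2


# Output is symmetric about the origin (zero-centered), unlike the sigmoid.
symmetric = abs(math.tanh(-1.5) + math.tanh(1.5)) < 1e-12

grad_at_zero = tanh_grad(0.0)   # largest possible derivative
grad_at_five = tanh_grad(5.0)   # saturated region

print(symmetric, grad_at_zero, grad_at_five)
```

The peak derivative is larger than the sigmoid's, but in the saturated region it still shrinks toward zero, which is why tanh only eases the vanishing gradient problem rather than removing it.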




$$f(x)=\max(0,x)$$

  1. Nonlinear function (linear on one side)
  2. Very cheap to compute
  3. Not a "pseudo-nonlinear" function
  4. Monotonic on the right-hand side


  1. Can cause neurons to "die" (I have not looked into this yet)

Variant: the Leaky ReLU function
$$f(x)=\max(\alpha x,x)$$
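
Written out in code, the two functions and their gradients look as follows. The value alpha = 0.01 is a common choice but an assumption here, and the formula max(alpha * x, x) only gives the intended shape for 0 < alpha < 1.

```python
def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    return max(alpha * x, x)


def relu_grad(x):
    """Derivative of ReLU (taking 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU (taking alpha at x = 0)."""
    return 1.0 if x > 0 else alpha


for x in (-2.0, 3.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs ReLU passes back a gradient of exactly 0, so a unit whose input stays negative stops receiving updates; Leaky ReLU keeps a small slope there.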


Softmax maps a set of inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1.


$$S_i=\frac{e^{x_i}}{\sum_j e^{x_j}}$$
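
A minimal pure-Python sketch of this normalization is shown below. Subtracting the maximum before exponentiating is a standard trick to avoid overflow and does not change the result; the input values are arbitrary.

```python
import math


def softmax(xs):
    """Map a list of real numbers to positive values that sum to 1."""
    m = max(xs)                            # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
uniform = softmax([0.0, 0.0])
large = softmax([1000.0, 1000.0])          # would overflow without the shift
print(probs, sum(probs))
```

In the segmentation model above, this operation is applied along the class axis, separately for every pixel.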


