PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION

2021-09-22 21:03:48 Views: 231 Source: Internet

Tags: GANS sched PROGRESSIVE lod res minibatch IMPROVED tick kimg

Contents

Preface
I. PGGAN
II. Steps
- 1. Network structure
- 2. Training process
Summary

Preface

Motivation: Generating high-resolution images is difficult because higher resolution makes it easier to tell the generated images apart from the training images, which drastically amplifies the gradient problem. The main idea of PGGAN, in the words of the paper: "The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses."


I. PGGAN


II. Steps

1. Network structure

Network structure of the generator G:

def G_paper(
    latents_in,                         # First input: Latent vectors [minibatch, latent_size].
    labels_in,                          # Second input: Labels [minibatch, label_size].
    num_channels        = 1,            # Number of output color channels. Overridden based on dataset.
    resolution          = 32,           # Output resolution. Overridden based on dataset.
    label_size          = 0,            # Dimensionality of the labels, 0 if no labels. Overridden based on dataset.
    fmap_base           = 8192,         # Overall multiplier for the number of feature maps.
    fmap_decay          = 1.0,          # log2 feature map reduction when doubling the resolution.
    fmap_max            = 512,          # Maximum number of feature maps in any layer.
    latent_size         = None,         # Dimensionality of the latent vectors. None = min(fmap_base, fmap_max).
    normalize_latents   = True,         # Normalize latent vectors before feeding them to the network?
    use_wscale          = True,         # Enable equalized learning rate?
    use_pixelnorm       = True,         # Enable pixelwise feature vector normalization?
    pixelnorm_epsilon   = 1e-8,         # Constant epsilon for pixelwise feature vector normalization.
    use_leakyrelu       = True,         # True = leaky ReLU, False = ReLU.
    dtype               = 'float32',    # Data type to use for activations and outputs.
    fused_scale         = True,         # True = use fused upscale2d + conv2d, False = separate upscale2d layers.
    structure           = None,         # 'linear' = human-readable, 'recursive' = efficient, None = select automatically.
    is_template_graph   = False,        # True = template graph constructed by the Network class, False = actual evaluation.
    **kwargs):                          # Ignore unrecognized keyword args.
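
The fmap_base, fmap_decay and fmap_max parameters in the signature above control how many feature maps each stage gets, through the nf(...) helper that the block code calls. A minimal sketch, assuming nf follows the definition used in the official implementation (it is not shown in this post):

```python
def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Number of feature maps at a given stage (assumed definition of nf)."""
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)

# block(x, res) uses nf(res - 1), so list the channel count for res = 2..10.
channels = {2 ** res: nf(res - 1) for res in range(2, 11)}
print(channels)
```

Under these assumptions the resolutions 4x4 to 32x32 all use 512 feature maps, and the count halves with every doubling after that.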

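The generator G is built mainly from block(x, res) and torgb(x, res).
At the start of training the image resolution is 4x4, and the network structure at that point is: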

def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        if res == 2: # 4x4
            if normalize_latents: x = pixel_norm(x, epsilon=pixelnorm_epsilon)
            with tf.variable_scope('Dense'):
                x = dense(x, fmaps=nf(res-1)*16, gain=np.sqrt(2)/4, use_wscale=use_wscale) # override gain to match the original Theano implementation
                x = tf.reshape(x, [-1, nf(res-1), 4, 4])
                x = PN(act(apply_bias(x)))
            with tf.variable_scope('Conv'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
        ......
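
PN in the block above is the pixelwise feature vector normalization enabled by use_pixelnorm. A NumPy sketch of the operation, assuming NCHW layout and the formulation x / sqrt(mean(x^2 over channels) + epsilon); the standalone function here is illustrative and is not the post's TensorFlow code:

```python
import numpy as np

def pixel_norm(x, epsilon=1e-8):
    """Scale each pixel's feature vector (axis 1 in NCHW) to unit mean square."""
    return x / np.sqrt(np.mean(np.square(x), axis=1, keepdims=True) + epsilon)

x = np.random.RandomState(0).randn(2, 512, 4, 4).astype(np.float32)
y = pixel_norm(x)
mean_sq = np.mean(np.square(y), axis=1)  # one value per pixel, shape (2, 4, 4)
print(mean_sq.min(), mean_sq.max())
```

After normalization the mean squared activation across channels is close to 1 at every pixel, whatever the scale of the input.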

When the image resolution is 8x8 or higher (for example, 8x8):

def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        ......
        else: # 8x8 and up
            if fused_scale:
                with tf.variable_scope('Conv0_up'):
                    x = PN(act(apply_bias(upscale2d_conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            else:
                x = upscale2d(x)
                with tf.variable_scope('Conv0'):
                    x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            # Conv1 follows the upscaling convolution in both branches.
            with tf.variable_scope('Conv1'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
        ......
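
In the non-fused branch, upscale2d doubles the spatial resolution before the convolution. A NumPy sketch, assuming nearest-neighbour upsampling on NCHW arrays (the helper's real implementation is not shown in this post):

```python
import numpy as np

def upscale2d(x, factor=2):
    """Nearest-neighbour upsampling of an NCHW array by an integer factor."""
    if factor == 1:
        return x
    return np.repeat(np.repeat(x, factor, axis=2), factor, axis=3)

x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
y = upscale2d(x)
print(y[0, 0])
```

Each input pixel becomes a factor x factor block, so a 2x2 map turns into a 4x4 map without introducing new values.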

The torgb layer is structured as follows (assuming the generated image resolution is 8x8 at this point):

def torgb(x, res): # res = 2..resolution_log2
    lod = resolution_log2 - res
    with tf.variable_scope('ToRGB_lod%d' % lod):
        return apply_bias(conv2d(x, fmaps=num_channels, kernel=1, gain=1, use_wscale=use_wscale))
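
torgb is a 1x1 convolution, that is, a per-pixel linear map from the feature channels to num_channels. A NumPy sketch of that idea; the runtime weight scaling stands in for use_wscale (equalized learning rate) and assumes the scale gain / sqrt(fan_in), which this post does not spell out:

```python
import numpy as np

def torgb(x, weight, bias, gain=1.0):
    """1x1 convolution on NCHW input; weight has shape (in_channels, out_channels)."""
    fan_in = weight.shape[0]           # 1x1 kernel: fan-in equals in_channels
    wscale = gain / np.sqrt(fan_in)    # assumed equalized-learning-rate scale
    y = np.einsum('nchw,co->nohw', x, weight * wscale)
    return y + bias.reshape(1, -1, 1, 1)

rng = np.random.RandomState(0)
x = rng.randn(2, 512, 8, 8)
w = rng.randn(512, 3)
rgb = torgb(x, w, np.zeros(3))
print(rgb.shape)
```

The spatial size is unchanged and only the channel dimension shrinks, here from 512 to 3.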

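The main body of the network is implemented recursively. Progressive growing is realized by changing lod_in, which is fed from the training schedule (sched.lod, computed by TrainingSchedule and assigned to the lod variable of the networks during training).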

graph TD
subgraph ide0 ["cur_nimg = 2100k, lod_in = 6.5"]
A(Input: Nx512)
A-->B(Dense)
subgraph ide01 ["block: res=2, lod=8"]
B-->D(Reshape)
D-->F(Conv2d)
end
F-->G(Nx512x4x4)
G-->H(upscale2d_conv2d)
subgraph ide02 ["block: res=3, lod=7"]
H-->J(Conv2d)
end
J-->K(Nx512x8x8)
K-->L(upscale2d_conv2d)
subgraph ide03 ["block: res=4, lod=6"]
L-->M(Conv2d)
end
M-->N(Nx512x16x16)
N-->O(Conv2d)
subgraph ide04 ["torgb: res=4, lod=6"]
O-->P("Output: Nx3x16x16")
end
end
style A fill:#FFFAFA, stroke:#FFFAFA

subgraph ide1 ["cur_nimg = 1500k, lod_in = 7"]
a(Input: Nx512)
a-->b(Dense)
subgraph ide11 ["block: res=2, lod=8"]
b-->d(Reshape)
d-->f(Conv2d)
end
f-->g(Nx512x4x4)
g-->h(upscale2d_conv2d)
subgraph ide12 ["block: res=3, lod=7"]
h-->i(Conv2d)
end
i-->j(Nx512x8x8)
end
style a fill:#FFFAFA, stroke:#FFFAFA
style g fill:#FFFAFA, stroke:#FFFAFA
style j fill:#FFFAFA, stroke:#FFFAFA

    if structure == 'recursive':
        def grow(x, res, lod):
            y = block(x, res)
            img = lambda: upscale2d(torgb(y, res), 2**lod)
            if res > 2: img = cset(img, (lod_in > lod), lambda: upscale2d(lerp(torgb(y, res), upscale2d(torgb(x, res - 1)), lod_in - lod), 2**lod))
            if lod > 0: img = cset(img, (lod_in < lod), lambda: grow(y, res + 1, lod - 1))
            return img()
        images_out = grow(combo_in, 2, resolution_log2 - 2)
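
The recursion above can be read as a rule for which toRGB outputs contribute to the image at a given lod_in. A pure-Python sketch of that rule; mix_weights is a hypothetical helper, and lerp(a, b, t) is assumed to be a + (b - a) * t:

```python
def mix_weights(lod_in, resolution_log2):
    """Return {resolution: blend weight} of the toRGB outputs used at lod_in."""
    def grow(res, lod):
        # The last cset wins: descend to the next block while lod_in < lod.
        if lod > 0 and lod_in < lod:
            return grow(res + 1, lod - 1)
        # Fade-in: blend this block's toRGB with the upscaled previous one.
        if res > 2 and lod_in > lod:
            t = lod_in - lod
            return {2 ** res: 1.0 - t, 2 ** (res - 1): t}
        # Exactly on an integer lod: only this block's toRGB is used.
        return {2 ** res: 1.0}
    return grow(2, resolution_log2 - 2)

print(mix_weights(7.0, 10))   # {8: 1.0}
print(mix_weights(6.5, 10))   # {16: 0.5, 8: 0.5}
```

At lod_in = 6.5 the 16x16 block is half faded in: its toRGB output and the upscaled 8x8 toRGB output each contribute 50%.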

2. Training process

PGGAN training is implemented mainly by train_progressive_gan() in train.py.

# config.py
desc        = 'pgan'      
train       = EasyDict(func='train.train_progressive_gan')  # Options for main training func.
sched       = EasyDict()                                    # Options for train.TrainingSchedule.
desc += '-preset-v2-1gpu'
num_gpus = 1
sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)
train.total_kimg = 12000
desc += '-fp32'
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
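
EasyDict lets config.py write options as attributes and later pass them on as keyword arguments (for example **config.sched). A minimal sketch, assuming EasyDict is a dict subclass with attribute access; the class in the repository may differ in detail:

```python
class EasyDict(dict):
    """dict whose keys can also be read and written as attributes."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

sched = EasyDict()
sched.minibatch_base = 4
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)  # separate copy with the same entries

def show_options(**kwargs):
    """Hypothetical consumer: returns the option names it was called with."""
    return sorted(kwargs)

names = show_options(**sched)
print(names)
```

Because the object is still a dict, ** unpacking hands every attribute set in config.py to the callee as a keyword argument.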

First, load the dataset:

# config.py
desc += '-celebahq'
dataset = EasyDict(tfrecord_dir='celebahq/XXX')
train.mirror_augment = True

# train.py
training_set = dataset.load_dataset(data_dir=config.data_dir, verbose=True, **config.dataset)

Next, create the networks:

# config.py
desc        = 'pgan'      
G           = EasyDict(func='networks.G_paper')             # Options for generator network.
D           = EasyDict(func='networks.D_paper')             # Options for discriminator network.

# train.py
print('Constructing networks...')
with tf.device('/gpu:0'):
    G = tfutil.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.G)
    D = tfutil.Network('D', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.D)
    Gs = G.clone('Gs')
    Gs_update_op = Gs.setup_as_moving_average_of(G, beta=G_smoothing)
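
Gs keeps a moving average of the generator weights. A sketch of such an update, assuming the usual exponential form Gs = beta * Gs + (1 - beta) * G; the value of G_smoothing is not given in this post, so 0.999 below is only an illustrative choice:

```python
def ema_update(avg_weights, weights, beta=0.999):
    """One exponential-moving-average step over two equally long weight lists."""
    return [beta * a + (1.0 - beta) * w for a, w in zip(avg_weights, weights)]

gs = [0.0, 1.0]   # toy "Gs" weights
g = [1.0, 1.0]    # toy "G" weights, held fixed here
for _ in range(1000):
    gs = ema_update(gs, g)
print(gs)
```

With beta close to 1 the averaged copy follows G slowly, which smooths out step-to-step fluctuations in the generator weights.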

Next, build the TensorFlow graph:
1. Model inputs (placeholders, since this is TF1-style code):

print('Building TensorFlow graph...')
with tf.name_scope('Inputs'):
     lod_in = tf.placeholder(tf.float32, name='lod_in', shape=[]) 
     lrate_in = tf.placeholder(tf.float32, name='lrate_in', shape=[])
     minibatch_in = tf.placeholder(tf.int32, name='minibatch_in', shape=[])
     minibatch_split = minibatch_in // config.num_gpus
     reals, labels   = training_set.get_minibatch_tf()
     reals_split     = tf.split(reals, config.num_gpus)
     labels_split    = tf.split(labels, config.num_gpus)

2. Set up the optimizers:

# config.py
G_opt       = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for generator optimizer.
D_opt       = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for discriminator optimizer.

G_opt = tfutil.Optimizer(name='TrainG', learning_rate=lrate_in, **config.G_opt)
D_opt = tfutil.Optimizer(name='TrainD', learning_rate=lrate_in, **config.D_opt)

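Here, for each GPU, the lod variables of G and D are assigned from lod_in and the real images are preprocessed (an excerpt from the per-GPU setup; with num_gpus = 1 as in the preset above, gpu is 0):

with tf.name_scope('GPU%d' % gpu), tf.device('/gpu:%d' % gpu):
    G_gpu = G
    D_gpu = D
    lod_assign_ops = [tf.assign(G_gpu.find_var('lod'), lod_in),
                      tf.assign(D_gpu.find_var('lod'), lod_in)]
    reals_gpu = process_reals(reals_split[gpu], lod_in, mirror_augment, training_set.dynamic_range, drange_net)
    labels_gpu = labels_split[gpu]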

3. Set up the loss functions of G and D:

# config.py
G_loss      = EasyDict(func='loss.G_wgan_acgan')            # Options for generator loss.
D_loss      = EasyDict(func='loss.D_wgangp_acgan')          # Options for discriminator loss.

with tf.name_scope('G_loss'), tf.control_dependencies(lod_assign_ops):
    G_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_split, **config.G_loss)
with tf.name_scope('D_loss'), tf.control_dependencies(lod_assign_ops):
    D_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=D_opt, training_set=training_set, minibatch_size=minibatch_split, reals=reals_gpu, labels=labels_gpu, **config.D_loss)

4. Register the gradients for backpropagation; the weights are then updated through tf.train.Optimizer.apply_gradients:

G_opt.register_gradients(tf.reduce_mean(G_loss), G_gpu.trainables)
D_opt.register_gradients(tf.reduce_mean(D_loss), D_gpu.trainables)
G_train_op = G_opt.apply_updates()
D_train_op = D_opt.apply_updates()

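Now start training. total_kimg is the total number of images to train on (in thousands), cur_nimg is the number of images trained on so far, cur_tick is the index of the current tick, tick_start_nimg is the value of cur_nimg at the start of the current tick,
resume_kimg is the image count (in thousands) from which training resumes, and prev_lod is the lod used in the previous iteration.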

print('Training...')
cur_nimg = int(resume_kimg * 1000)
cur_tick = 0
tick_start_nimg = cur_nimg
tick_start_time = time.time()
train_start_time = tick_start_time - resume_time
prev_lod = -1.0

while cur_nimg < total_kimg * 1000:
    # Choose training parameters and configure training ops.
    sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
    training_set.configure(sched.minibatch, sched.lod)
    # Compare prev_lod with sched.lod to detect whether a new layer has been introduced.
    # If so, reset the optimizers' internal state (e.g. Adam moments); prev_lod is then updated.
    if reset_opt_for_new_lod:
        if np.floor(sched.lod) != np.floor(prev_lod) or np.ceil(sched.lod) != np.ceil(prev_lod):
            G_opt.reset_optimizer_state()
            D_opt.reset_optimizer_state()
    prev_lod = sched.lod

    # Run training ops.
    # For every D_repeats updates of the discriminator D (default 1), update the generator G once; cur_nimg advances with each D update.
    for repeat in range(minibatch_repeats):
        for _ in range(D_repeats):
            tfutil.run([D_train_op, Gs_update_op], {lod_in: sched.lod, lrate_in: sched.D_lrate, minibatch_in: sched.minibatch})
            cur_nimg += sched.minibatch
        tfutil.run([G_train_op], {lod_in: sched.lod, lrate_in: sched.G_lrate, minibatch_in: sched.minibatch})
        # Assuming tfutil.run simply forwards to the default session, the call above is equivalent to:
        # tf.get_default_session().run([G_train_op], {lod_in: sched.lod, lrate_in: sched.G_lrate, minibatch_in: sched.minibatch})

    # Perform maintenance tasks once per tick.
    done = (cur_nimg >= total_kimg * 1000)
    if cur_nimg >= tick_start_nimg + sched.tick_kimg * 1000 or done:
        cur_tick += 1
        cur_time = time.time()
        tick_kimg = (cur_nimg - tick_start_nimg) / 1000.0
        tick_start_nimg = cur_nimg
        tick_time = cur_time - tick_start_time
        total_time = cur_time - train_start_time
        maintenance_time = tick_start_time - maintenance_start_time
        maintenance_start_time = cur_time

        print('tick %-5d kimg %-8.1f lod %-5.2f minibatch %-4d time %-12s sec/tick %-7.1f sec/kimg %-7.2f maintenance %.1f' % (
            tfutil.autosummary('Progress/tick', cur_tick),
            tfutil.autosummary('Progress/kimg', cur_nimg / 1000.0),
            tfutil.autosummary('Progress/lod', sched.lod),
            tfutil.autosummary('Progress/minibatch', sched.minibatch),
            misc.format_time(tfutil.autosummary('Timing/total_sec', total_time)),
            tfutil.autosummary('Timing/sec_per_tick', tick_time),
            tfutil.autosummary('Timing/sec_per_kimg', tick_time / tick_kimg),
            tfutil.autosummary('Timing/maintenance_sec', maintenance_time)))
        tfutil.autosummary('Timing/total_hours', total_time / (60.0 * 60.0))
        tfutil.autosummary('Timing/total_days', total_time / (24.0 * 60.0 * 60.0))
        tfutil.save_summaries(summary_log, cur_nimg)

        # Save snapshots.
        if cur_tick % image_snapshot_ticks == 0 or done:
            grid_fakes = Gs.run(grid_latents, grid_labels, minibatch_size=sched.minibatch//config.num_gpus)
            misc.save_image_grid(grid_fakes, os.path.join(result_subdir, 'fakes%06d.png' % (cur_nimg // 1000)), drange=drange_net, grid_size=grid_size)
        if cur_tick % network_snapshot_ticks == 0 or done:
            misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-snapshot-%06d.pkl' % (cur_nimg // 1000)))
        # Record start time of the next tick.
        tick_start_time = time.time()
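
The optimizer-reset test in the loop fires whenever lod crosses an integer boundary, that is, at the start and at the end of each fade-in. A small sketch of that condition in isolation; needs_reset is a hypothetical helper, and math.floor / math.ceil stand in for the NumPy calls:

```python
import math

def needs_reset(prev_lod, lod):
    """True when floor or ceil of lod changed between two iterations."""
    return (math.floor(lod) != math.floor(prev_lod)
            or math.ceil(lod) != math.ceil(prev_lod))

lods = [8.0, 8.0, 7.9, 7.5, 7.0, 7.0]   # a made-up lod trajectory
prev = -1.0                              # same initial value as prev_lod in the loop
resets = []
for lod in lods:
    resets.append(needs_reset(prev, lod))
    prev = lod
print(resets)   # [True, False, True, False, True, False]
```

The first iteration always resets because prev_lod starts at -1.0; after that, resets happen when the fade-in begins (8.0 to 7.9) and when it completes (7.5 to 7.0), but not in between.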

When training is finished, save the model weights and close the log file:

misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-final.pkl'))
summary_log.close()
open(os.path.join(result_subdir, '_training-done.txt'), 'wt').close()


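In train.py, the TrainingSchedule class implements progressive growing during training through its parameters. lod_initial_resolution=4 sets the starting resolution; lod_training_kimg=600 and lod_transition_kimg=600 mean that each resolution is trained on 600k images and the next resolution is then faded in over another 600k images, so the resolution doubles every 1200k images. The generator G and the discriminator D use the base learning rates unless a resolution-specific value is given in the dicts read from the config file.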

# train.py
def train_progressive_gan():
    ...
    sched = TrainingSchedule(total_kimg * 1000, training_set, **config.sched)
    sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
    ...

# config.py
desc += '-preset-v2-1gpu'
num_gpus = 1
sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)
train.total_kimg = 12000
desc += '-fp32'
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}

Likewise, tick_kimg is determined by a dict.

class TrainingSchedule:
    def __init__(
        self,
        cur_nimg,
        training_set,
        lod_initial_resolution  = 4,        # Image resolution used at the beginning.
        lod_training_kimg       = 600,      # Thousands of real images to show before doubling the resolution.
        lod_transition_kimg     = 600,      # Thousands of real images to show when fading in new layers.
        minibatch_base          = 16,       # Maximum minibatch size, divided evenly among GPUs.
        minibatch_dict          = {},       # Resolution-specific overrides.
        max_minibatch_per_gpu   = {},       # Resolution-specific maximum minibatch size per GPU.
        G_lrate_base            = 0.001,    # Learning rate for the generator.
        G_lrate_dict            = {},       # Resolution-specific overrides.
        D_lrate_base            = 0.001,    # Learning rate for the discriminator.
        D_lrate_dict            = {},       # Resolution-specific overrides.
        tick_kimg_base          = 160,      # Default interval of progress snapshots.
        tick_kimg_dict          = {4: 160, 8:140, 16:120, 32:100, 64:80, 128:60, 256:40, 512:20, 1024:10} # Resolution-specific overrides.
    ):

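Suppose the model has been trained on 1500k images so far (cur_nimg = 1500 × 1000). Then kimg = 1500 is the number of images seen so far, in thousands; phase_dur is the number of images (in thousands) that make up one phase, by default 600 + 600 = 1200; phase_idx = floor(1500 / 1200) = 1 is the index of the current phase (training resolution 8×8); and phase_kimg = 1500 - 1 × 1200 = 300 is the number of images (in thousands) already trained within this phase.

        self.kimg = cur_nimg / 1000.0
        phase_dur = lod_training_kimg + lod_transition_kimg
        phase_idx = int(np.floor(self.kimg / phase_dur)) if phase_dur > 0 else 0
        phase_kimg = self.kimg - phase_idx * phase_dur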

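Next, lod is computed, starting from lod = 10 (final output resolution 1024 = 2^10).
First, subtract 2 (initial training resolution 4×4 = 2^2): lod = 8.
Second, subtract the number of completed phases (phase_idx = 1): lod = 7.
Third, subtract the fraction of the transition that has been completed in this phase. (1) At cur_nimg = 1500k, phase_kimg = 300 is below 600, so the transition has not started and lod = 7. (2) If cur_nimg = 2100k instead, phase_kimg = 900, so training at 8×8 is complete and the transition to 16×16 is under way; lod is reduced by (900k - 600k) / 600k = 0.5, giving lod = 6.5.
Finally, the training resolution is 2^(10 - floor(lod)): (1) still 8×8; (2) already 16×16.

        self.lod = training_set.resolution_log2
        self.lod -= np.floor(np.log2(lod_initial_resolution))
        self.lod -= phase_idx
        if lod_transition_kimg > 0:
            self.lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
        self.lod = max(self.lod, 0.0)
        self.resolution = 2 ** (training_set.resolution_log2 - int(np.floor(self.lod)))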
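
The lod computation can be checked against the two worked cases (1500k and 2100k images) with a standalone function. This is a sketch that mirrors the excerpt above; lod_schedule is a hypothetical name, resolution_log2 = 10 is assumed for a 1024×1024 dataset, and the standard math module replaces NumPy:

```python
import math

def lod_schedule(cur_nimg, resolution_log2=10, lod_initial_resolution=4,
                 lod_training_kimg=600, lod_transition_kimg=600):
    """Return (lod, resolution) after cur_nimg training images."""
    kimg = cur_nimg / 1000.0
    phase_dur = lod_training_kimg + lod_transition_kimg
    phase_idx = int(math.floor(kimg / phase_dur)) if phase_dur > 0 else 0
    phase_kimg = kimg - phase_idx * phase_dur

    lod = float(resolution_log2)
    lod -= math.floor(math.log2(lod_initial_resolution))
    lod -= phase_idx
    if lod_transition_kimg > 0:
        lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
    lod = max(lod, 0.0)
    resolution = 2 ** (resolution_log2 - int(math.floor(lod)))
    return lod, resolution

print(lod_schedule(1500 * 1000))   # (7.0, 8)
print(lod_schedule(2100 * 1000))   # (6.5, 16)
```

The reported resolution switches to 16 as soon as the fade-in starts, because it is derived from floor(lod) rather than from lod itself.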

The current resolution determines the minibatch size and the learning rates of the generator G and the discriminator D; the minibatch size also depends on the settings in config.py.
Suppose resolution = 128 and 2 GPUs: minibatch = 16 (the batch must split evenly across the GPUs).
Suppose resolution = 256 and 4 GPUs: minibatch = 8 (again split evenly across the GPUs).
The base learning rate of G and D is 0.001; at 1024×1024 it is 0.0015.

# config.py
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)

        # Minibatch size.
        self.minibatch = minibatch_dict.get(self.resolution, minibatch_base)
        self.minibatch -= self.minibatch % config.num_gpus
        if self.resolution in max_minibatch_per_gpu:
            self.minibatch = min(self.minibatch, max_minibatch_per_gpu[self.resolution] * config.num_gpus)
        # Learning rates and tick interval.
        self.G_lrate = G_lrate_dict.get(self.resolution, G_lrate_base)
        self.D_lrate = D_lrate_dict.get(self.resolution, D_lrate_base)
        self.tick_kimg = tick_kimg_dict.get(self.resolution, tick_kimg_base)
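
The two minibatch cases and the learning-rate override can be reproduced with a standalone function. This is a sketch that copies the lookup logic of the excerpt; schedule_params is a hypothetical name, and the dict values are the ones from the config.py preset shown in this post:

```python
MINIBATCH_DICT = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
MAX_MINIBATCH_PER_GPU = {256: 16, 512: 8, 1024: 4}
LRATE_DICT = {1024: 0.0015}

def schedule_params(resolution, num_gpus, minibatch_base=4, lrate_base=0.001):
    """Return (minibatch, learning rate) for a resolution and GPU count."""
    minibatch = MINIBATCH_DICT.get(resolution, minibatch_base)
    minibatch -= minibatch % num_gpus              # make it divisible by the GPU count
    if resolution in MAX_MINIBATCH_PER_GPU:
        minibatch = min(minibatch, MAX_MINIBATCH_PER_GPU[resolution] * num_gpus)
    lrate = LRATE_DICT.get(resolution, lrate_base)
    return minibatch, lrate

print(schedule_params(128, 2))    # (16, 0.001)
print(schedule_params(256, 4))    # (8, 0.001)
print(schedule_params(1024, 1))   # (4, 0.0015)
```

At 1024×1024 there is no entry in minibatch_dict, so the size falls back to minibatch_base = 4, which also matches the per-GPU cap of 4 for a single GPU.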

Summary


Source: https://blog.csdn.net/weixin_43066973/article/details/120358682
