Attention-Based LSTM Time Series Prediction with Keras

2022-10-30 13:39:48 | Source: Internet

Attention over time steps

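First, git clone the project to your local machine and set up the required environment. The author used tensorflow 1.6.0 and Keras 2.0.2. In the project folder, the two scripts we mainly need are attention_lstm.py and attention_utils.py.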

The function that generates the data in the project is:

import numpy as np

def get_data_recurrent(n, time_steps, input_dim, attention_column=10):
    """
    Data generation. x is purely random except that its first value equals the target y.
    In practice, the network should learn that the target = x[attention_column].
    Therefore, most of its attention should be focused on the value addressed by attention_column.
    :param n: the number of samples to retrieve.
    :param time_steps: the number of time steps of your series.
    :param input_dim: the number of dimensions of each element in the series.
    :param attention_column: the column linked to the target. Everything else is purely random.
    :return: x: model inputs, y: model targets
    """
    x = np.random.standard_normal(size=(n, time_steps, input_dim))
    y = np.random.randint(low=0, high=2, size=(n, 1))
    x[:, attention_column, :] = np.tile(y[:], (1, input_dim))
    return x, y

The defaults are n = 30000, input_dim = 2, time_steps = 20. The generated data has the following shapes:

x: 30000 x 20 x 2
y: 30000 x 1

Both dimensions of x at the 11th time step (index 10) are equal to y; the values at every other time step are random.
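As a quick sanity check, the generator can be run on a smaller sample to confirm where the signal sits. This is a minimal sketch: the function body is copied from the listing above, while the sample size of 1000 and the seed are arbitrary choices made only for illustration.

```python
import numpy as np

def get_data_recurrent(n, time_steps, input_dim, attention_column=10):
    # Same logic as the project's generator shown above.
    x = np.random.standard_normal(size=(n, time_steps, input_dim))
    y = np.random.randint(low=0, high=2, size=(n, 1))
    x[:, attention_column, :] = np.tile(y[:], (1, input_dim))
    return x, y

np.random.seed(0)  # assumption: seeded only so the sketch is reproducible
x, y = get_data_recurrent(n=1000, time_steps=20, input_dim=2)

print(x.shape, y.shape)

# Index 10 is the 11th time step; both of its features should equal y.
signal_matches = bool(np.all(x[:, 10, :] == y))
print(signal_matches)
```

Every other time step holds standard normal noise, so only index 10 carries information about the target.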

Run the attention_lstm.py script directly. In the resulting network structure, the attention mechanism is applied after the LSTM layer.

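At the end, the script plots a summary figure. It shows that the attention weights are concentrated mainly on the 11th time step, which indicates that the attention mechanism works as intended.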
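To make the idea of attention weights over time steps concrete, here is a rough NumPy sketch of one way such weights can be computed and summarized. It is not the project's code: the random array h stands in for LSTM outputs, the score weights w and b are hypothetical, and the layout (a dense map followed by a softmax over the time axis) is an assumption made for illustration.

```python
import numpy as np

def softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.RandomState(0)
batch, time_steps, units = 4, 20, 32
h = rng.standard_normal((batch, time_steps, units))  # stand-in for LSTM outputs

# Hypothetical parameters of a dense map acting on the time axis.
w = rng.standard_normal((time_steps, time_steps)) * 0.1
b = np.zeros(time_steps)

h_t = np.transpose(h, (0, 2, 1))      # (batch, units, time_steps)
a = softmax(h_t @ w + b, axis=-1)     # weights sum to 1 over time steps
a = np.transpose(a, (0, 2, 1))        # back to (batch, time_steps, units)
weighted = h * a                      # element-wise re-weighting of the sequence

# Summary used for a bar plot: average over units, then over samples.
per_step = a.mean(axis=2).mean(axis=0)  # shape (time_steps,)
print(per_step.shape)
```

In a trained model, a peak in per_step at index 10 would correspond to the concentration on the 11th time step described above; with the random weights used here, no such peak is expected.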

Attention over input dimensions

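The example above applies attention to the time steps, deciding which time step has the largest influence on the result. But what if we want to apply attention to the input dimensions instead? For example, when several dimensions are used to predict a one-dimensional target, we may want attention to decide which input dimensions matter most for the prediction.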

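The simpler approach is to transpose the input data so that the time_steps and input_dim axes are swapped, and then run the script again. This works because the code is set up to apply attention to the second dimension of the input.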
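Swapping the two axes calls for a transpose rather than a plain reshape: both produce the same shape, but a reshape re-reads the values in memory order and mixes time steps with features. A small sketch with made-up sizes shows the difference.

```python
import numpy as np

n, time_steps, input_dim = 2, 3, 4
x = np.arange(n * time_steps * input_dim).reshape(n, time_steps, input_dim)

# Correct: swap axes 1 and 2, so swapped[i, d, t] == x[i, t, d].
swapped = np.transpose(x, (0, 2, 1))

# Same shape, but the values land in different positions.
reshaped = x.reshape(n, input_dim, time_steps)

same_values = bool(swapped[0, 1, 2] == x[0, 2, 1])
reshape_differs = not np.array_equal(swapped, reshaped)
print(swapped.shape, reshaped.shape, same_values, reshape_differs)
```

After the swap, the model's TIME_STEPS and input dimension settings would need to be exchanged as well so that they match the new layout.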

The more advanced approach is to rewrite the attention_3d_block function yourself:

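def attention_3d_block(inputs):
    # inputs.shape = (batch_size, time_steps, input_dim)
    time_steps = int(inputs.shape[1])
    input_dim = int(inputs.shape[2])
    a = inputs
    # a = Permute((2, 1))(inputs)
    # a = Reshape((input_dim, TIME_STEPS))(a)  # this line is not useful. It's just to know which dimension is what.
    # Dense acts on the last axis, so the softmax is taken over input_dim.
    a = Dense(input_dim, activation='softmax')(a)
    if SINGLE_ATTENTION_VECTOR:
        # Average the weights over time steps, then repeat them along the time
        # axis so the shape is (batch_size, time_steps, input_dim) again.
        a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a)
        a = RepeatVector(time_steps)(a)
    # Permute((1, 2)) keeps the axis order; it is kept so the layer has a name.
    a_probs = Permute((1, 2), name='attention_vec')(a)
    # a_probs = a
    output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
    return output_attention_mul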
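The arithmetic behind this block can be traced in plain NumPy. The sketch below is only an illustration under stated assumptions: kernel and bias are hypothetical stand-ins for the parameters of the Dense layer, the sizes are made up, and nothing here is trained.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.RandomState(0)
batch, time_steps, input_dim = 3, 6, 10
inputs = rng.standard_normal((batch, time_steps, input_dim))

# Hypothetical Dense(input_dim) parameters.
kernel = rng.standard_normal((input_dim, input_dim)) * 0.1
bias = np.zeros(input_dim)

# A dense layer with softmax on a 3D tensor works on the last axis,
# so each time step gets its own weights over the input dimensions.
a_probs = softmax(inputs @ kernel + bias, axis=-1)  # (batch, time_steps, input_dim)
output_attention_mul = inputs * a_probs             # element-wise product

# Shared-vector variant: average over time, then repeat along the time axis.
shared = a_probs.mean(axis=1)                               # (batch, input_dim)
shared = np.repeat(shared[:, None, :], time_steps, axis=1)  # (batch, time_steps, input_dim)
print(a_probs.shape, shared.shape)
```

Because the weights are normalized over the last axis, they compete across input dimensions rather than across time steps, which is the change this rewrite is after.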

Next, add a new data-generating function to the attention_utils.py script:

def get_data_recurrent2(n, time_steps, input_dim, attention_dim=5):
    """
    Assume input_dim = 10 and time_steps = 6.
    Generates x of shape n x 6 x 10, where the 6th dimension (index 5)
    at every time step equals y.
    """
    x = np.random.standard_normal(size=(n, time_steps, input_dim))
    y = np.random.randint(low=0, high=2, size=(n, 1))
    # Copy the target into the chosen input dimension at every time step.
    x[:, :, attention_dim] = np.tile(y[:], (1, time_steps))

    return x, y

Try generating one sample with get_data_recurrent2(1,6,10)
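The call can be checked with a short script. This is a minimal sketch that repeats the function from the listing above so it runs on its own; the seed is an arbitrary choice, and no particular random values are implied.

```python
import numpy as np

def get_data_recurrent2(n, time_steps, input_dim, attention_dim=5):
    # Same logic as the function defined above.
    x = np.random.standard_normal(size=(n, time_steps, input_dim))
    y = np.random.randint(low=0, high=2, size=(n, 1))
    x[:, :, attention_dim] = np.tile(y[:], (1, time_steps))
    return x, y

np.random.seed(0)  # assumption: seeded only so the sketch is reproducible
x, y = get_data_recurrent2(1, 6, 10)

print(x.shape, y.shape)

# Column index 5 (the 6th dimension) should repeat y at all 6 time steps.
column_matches = bool(np.all(x[0, :, 5] == y[0, 0]))
print(column_matches)
```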

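Then we slightly modify the main function and train again. After ten iterations, the result shows that the weight of the 6th dimension is comparatively large. We can also plot a summary of the attention over time steps by changing the following line: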

attention_vector = np.mean(get_activations(m, testing_inputs_1, print_shape_only=False, layer_name='attention_vec')[0], axis=2).squeeze()

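The resulting plot shows that the attention is the same for every time step. (In fact, in the first example, where attention is applied to the time steps, a summary plot of the attention over the input dimensions is uniform in the same way.)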
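One way to see why the per-time-step summary comes out flat: if the weights are a softmax over the input dimensions, they sum to 1 at every time step, so their mean over axis=2 is always 1/input_dim. The sketch below uses a synthetic attention tensor (not activations from the trained model) in which dimension index 5 is favoured by construction, and it assumes SINGLE_ATTENTION_VECTOR is off.

```python
import numpy as np

time_steps, input_dim = 6, 10

# Synthetic logits: index 5 is favoured, more strongly at later time steps.
logits = np.zeros((1, time_steps, input_dim))
logits[0, :, 5] = 1.0 + 0.5 * np.arange(time_steps)

# Softmax over the last axis, as a dense softmax layer would apply it.
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
attention = e / e.sum(axis=-1, keepdims=True)

per_step = np.mean(attention, axis=2).squeeze()  # summary over time steps
per_dim = np.mean(attention, axis=1).squeeze()   # summary over input dimensions

print(per_step)
print(per_dim.argmax())
```

Even though the weight on index 5 grows with the time step, per_step stays constant, while per_dim still picks out index 5. The same normalization argument, with the axes exchanged, accounts for the remark in parentheses about the first example.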
