[PARL Reinforcement Learning] Implementing Sarsa and Q-learning

2021-03-21 13:05:49  Source: Internet

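Sarsa and Q-learning are both tabular reinforcement learning methods built on the MDP four-tuple <S, A, P, R>, where S is the state, A is the action, R is the reward, and P is the state-transition probability.

Both methods learn by interacting with the environment, so we use the P function and the R function to describe the environment.

The Q-table records, for every action taken in every state, the expected future reward.

Once trained, the Q-table is used to guide the agent's actions.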
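
As a minimal sketch of what such a table looks like, the snippet below stores a small Q-table as a nested list (rows are states, columns are actions) and reads off the greedy action for a state. The table size, its values, and the helper name `greedy_action` are illustrative assumptions and do not come from the original code, which uses a NumPy array instead.

```python
# Hypothetical 3-state, 2-action Q-table; rows are states, columns are actions.
q_table = [
    [0.0, 0.5],   # state 0: action 1 currently looks better
    [0.2, 0.1],   # state 1: action 0 currently looks better
    [0.0, 0.0],   # state 2: nothing learned yet
]

def greedy_action(q_table, state):
    """Return the index of the action with the largest Q value in `state`."""
    row = q_table[state]
    return row.index(max(row))  # ties resolve to the lowest index in this sketch

print(greedy_action(q_table, 0))
print(greedy_action(q_table, 1))
```

Note that this sketch breaks ties by taking the first maximum, whereas the agents later in the article pick randomly among tied actions.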

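1. Introduction to Sarsa

Sarsa stands for state-action-reward-state'-action'. Its goal is to learn the value Q of a particular action in a particular state, and in doing so to build and refine a Q-table with states as rows and actions as columns. The table is updated from the rewards obtained by interacting with the environment.

The five letters of "SARSA" combine the current S (state), A (action), and R (reward) with the next step's S' (state) and A' (action). In other words, we need to know not only the current S, A, and R but also the next S' and A'.

In the Sarsa algorithm, the agent's update target is

R(S1) + γ*Q(S1,A)

Which action A appears here depends entirely on the action the agent actually chooses. With 90% probability the agent picks the action with the largest Q value (A2), and with 10% probability it picks a random action.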
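
The 90%/10% selection rule can be sketched as follows. This is a hedged illustration rather than the article's agent: the function name `epsilon_greedy`, the Q values, and the fixed seed are assumptions. It mirrors the structure of the `sample()` method shown later, where the random branch draws uniformly over all actions.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick the greedy action with probability 1 - epsilon, else a uniformly random one."""
    if rng.random() < 1.0 - epsilon:
        return q_row.index(max(q_row))
    return rng.randrange(len(q_row))

rng = random.Random(0)           # fixed seed so the run is repeatable
q_row = [0.1, 0.7, 0.3, 0.2]     # hypothetical Q values for one state
counts = [0] * len(q_row)
for _ in range(10000):
    counts[epsilon_greedy(q_row, 0.1, rng)] += 1

# Action 1 is greedy. It can also be drawn in the random branch, so its
# share is about 0.9 + 0.1 / 4 = 0.925 rather than exactly 0.9.
print([c / 10000 for c in counts])
```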

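The Sarsa algorithm, that is, the update rule for the Q-table, is therefore as follows, where α is the learning rate and γ is the discount factor:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$$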
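
A single Sarsa update can be traced by hand with a few lines of Python. The sketch below assumes a nested-list table and a hypothetical helper named `sarsa_update`; the numbers are made up for illustration. It follows the same rule as the `learn()` method shown later: move Q(s, a) a fraction `lr` of the way toward `r + gamma * Q(s', a')`, or toward `r` alone when the episode has ended.

```python
def sarsa_update(q, s, a, r, s_next, a_next, done, lr=0.1, gamma=0.9):
    """Apply one Sarsa update to the nested-list table `q` in place."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += lr * (target - q[s][a])
    return q[s][a]

q = [[0.0, 0.0], [0.5, 1.0]]   # hypothetical 2-state, 2-action table
# In state 0 take action 1, receive reward 0, land in state 1, and the
# policy actually picks action 0 there (not the greedy action 1).
new_value = sarsa_update(q, s=0, a=1, r=0.0, s_next=1, a_next=0, done=False)
print(new_value)  # about 0.045, i.e. 0.1 * (0 + 0.9 * 0.5 - 0)
```

Because the update uses Q[1][0] = 0.5 and not the larger Q[1][1] = 1.0, the value learned for (state 0, action 1) reflects what the agent really did next.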

To explore the environment better during training, Sarsa uses an ε-greedy policy, which outputs a random action with a certain probability.

2. Implementing Sarsa

Import libraries

import gym
import numpy as np
import time

Sarsa agent implementation

class SarsaAgent(object):
    def __init__(self, obs_n, act_n, learning_rate=0.01, gamma=0.9, e_greed=0.1):
        self.act_n = act_n      # number of available actions
        self.lr = learning_rate # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = e_greed  # probability of choosing a random action
        self.Q = np.zeros((obs_n, act_n))

    # Sample an action for the given observation, with exploration
    def sample(self, obs):
        if np.random.uniform(0, 1) < (1.0 - self.epsilon): # choose by Q value from the table
            action = self.predict(obs)
        else:
            action = np.random.choice(self.act_n) # explore: pick a random action
        return action

    # Predict the greedy action for the given observation
    def predict(self, obs):
        Q_list = self.Q[obs, :]
        maxQ = np.max(Q_list)
        action_list = np.where(Q_list == maxQ)[0]  # several actions may share maxQ
        action = np.random.choice(action_list)
        return action

    # Learning step, i.e. the Q-table update
    def learn(self, obs, action, reward, next_obs, next_action, done):
        """ on-policy
            obs: observation before the interaction, s_t
            action: action chosen for this interaction, a_t
            reward: reward r received for this action
            next_obs: observation after the interaction, s_t+1
            next_action: action chosen for next_obs from the current Q-table, a_t+1
            done: whether the episode has ended
        """
        predict_Q = self.Q[obs, action]
        if done:
            target_Q = reward # no next state
        else:
            target_Q = reward + self.gamma * self.Q[next_obs, next_action] # Sarsa
        self.Q[obs, action] += self.lr * (target_Q - predict_Q) # correct Q

    # Save the Q-table to a file
    def save(self):
        npy_file = './q_table.npy'
        np.save(npy_file, self.Q)
        print(npy_file + ' saved.')

    # Load the Q-table from a file
    def restore(self, npy_file='./q_table.npy'):
        self.Q = np.load(npy_file)
        print(npy_file + ' loaded.')

Training

def run_episode(env, agent, render=False):
    total_steps = 0 # number of steps taken in this episode
    total_reward = 0

    obs = env.reset() # reset the environment and start a new episode
    action = agent.sample(obs) # choose an action according to the algorithm

    while True:
        next_obs, reward, done, _ = env.step(action) # interact with the environment once
        next_action = agent.sample(next_obs) # choose the next action according to the algorithm
        # Train the Sarsa algorithm
        agent.learn(obs, action, reward, next_obs, next_action, done)

        action = next_action
        obs = next_obs  # carry the observation forward to the next step
        total_reward += reward
        total_steps += 1 # count the step
        if render:
            env.render() # render a new frame
        if done:
            break
    return total_reward, total_steps

Test program

def test_episode(env, agent):
    total_reward = 0
    obs = env.reset()
    while True:
        action = agent.predict(obs) # greedy
        next_obs, reward, done, _ = env.step(action)
        total_reward += reward
        obs = next_obs
        time.sleep(0.5)
        env.render()
        if done:
            break
    return total_reward

Main program

# Create the maze environment with gym; is_slippery=False makes it easier
env = gym.make("FrozenLake-v0", is_slippery=False)  # 0 left, 1 down, 2 right, 3 up

# Create an agent instance and pass in the hyperparameters
agent = SarsaAgent(
        obs_n=env.observation_space.n,
        act_n=env.action_space.n,
        learning_rate=0.1,
        gamma=0.9,
        e_greed=0.1)


# Train for 500 episodes and print the score of each episode
for episode in range(500):
    ep_reward, ep_steps = run_episode(env, agent, False)
    print('Episode %s: steps = %s , reward = %.1f' % (episode, ep_steps, ep_reward))

# Training is finished; check how well the algorithm performs
test_reward = test_episode(env, agent)
print('test reward = %.1f' % (test_reward))
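
The same training loop can be tried without gym installed. The sketch below is an assumption-laden stand-in, not the article's program: `Corridor` is a hypothetical 4-cell environment that imitates the `reset()`/`step()` interface used above, and `sample` and `train_sarsa` are illustrative names that re-implement the ε-greedy choice and the Sarsa update with plain Python lists.

```python
import random

class Corridor:
    """Hypothetical 4-cell corridor: start in cell 0, reward 1.0 on reaching cell 3."""
    n_states, n_actions = 4, 2   # actions: 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, self.pos + move)   # bumping the left wall keeps the agent in cell 0
        done = self.pos == self.n_states - 1
        return self.pos, (1.0 if done else 0.0), done, {}

def sample(q, obs, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking, mirroring the agent's sample()."""
    if rng.random() < 1.0 - epsilon:
        best = max(q[obs])
        return rng.choice([a for a, v in enumerate(q[obs]) if v == best])
    return rng.randrange(len(q[obs]))

def train_sarsa(episodes=500, lr=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        obs = env.reset()
        action = sample(q, obs, epsilon, rng)
        done = False
        while not done:
            next_obs, reward, done, _ = env.step(action)
            next_action = sample(q, next_obs, epsilon, rng)
            # Sarsa target: bootstrap from the action actually chosen next
            target = reward if done else reward + gamma * q[next_obs][next_action]
            q[obs][action] += lr * (target - q[obs][action])
            obs, action = next_obs, next_action
    return q

q = train_sarsa()
for state, row in enumerate(q):
    print(state, [round(v, 3) for v in row])
```

Every episode in this toy corridor ends with a move to the right from cell 2, so the entry for (cell 2, right) is pulled toward the terminal reward 1.0 once per episode, while the row for the terminal cell is never updated.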

Results

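3. Introduction to Q-learning

  • Q-learning also stores Q values (state-action values) in a Q-table. Its decision step is the same as Sarsa's: it uses ε-greedy to add exploration.
  • Q-learning differs from Sarsa in how the Q-table is updated.
    • Sarsa is on-policy: it chooses the next action first and then updates with it.
    • Q-learning is off-policy: the learn() update does not need the next action actually taken (next_action) and instead assumes that the next action is the one with the maximum Q value.
  • The Q-learning update rule, with learning rate α and discount factor γ, is:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$

As a result, only the update rule differs slightly during learning; everything else is the same.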
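
To make the difference concrete, the sketch below computes both update targets from the same hypothetical Q-table. The helper names `sarsa_target` and `q_learning_target` and all numbers are assumptions for illustration.

```python
def sarsa_target(q, r, s_next, a_next, gamma=0.9):
    """Sarsa bootstraps from the action actually chosen in the next state."""
    return r + gamma * q[s_next][a_next]

def q_learning_target(q, r, s_next, gamma=0.9):
    """Q-learning bootstraps from the best action in the next state."""
    return r + gamma * max(q[s_next])

q = [[0.0, 0.0], [2.0, -5.0]]    # hypothetical Q values
# Suppose exploration picks the poor action 1 in next state 1.
print(sarsa_target(q, r=-1.0, s_next=1, a_next=1))   # uses Q[1][1] = -5.0
print(q_learning_target(q, r=-1.0, s_next=1))        # uses max(Q[1]) = 2.0
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps.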

4. Implementing Q-learning

Import libraries

import gym
import numpy as np
import time

Q-learning agent implementation

class QLearningAgent(object):
    def __init__(self, obs_n, act_n, learning_rate=0.01, gamma=0.9, e_greed=0.1):
        self.act_n = act_n      # number of available actions
        self.lr = learning_rate # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = e_greed  # probability of choosing a random action
        self.Q = np.zeros((obs_n, act_n))

    # Sample an action for the given observation, with exploration
    def sample(self, obs):
        if np.random.uniform(0, 1) < (1.0 - self.epsilon): # choose by Q value from the table
            action = self.predict(obs)
        else:
            action = np.random.choice(self.act_n) # explore: pick a random action
        return action

    # Predict the greedy action for the given observation
    def predict(self, obs):
        Q_list = self.Q[obs, :]
        maxQ = np.max(Q_list)
        action_list = np.where(Q_list == maxQ)[0]  # several actions may share maxQ
        action = np.random.choice(action_list)
        return action

    # Learning step, i.e. the Q-table update
    def learn(self, obs, action, reward, next_obs, done):
        """ off-policy
            obs: observation before the interaction, s_t
            action: action chosen for this interaction, a_t
            reward: reward r received for this action
            next_obs: observation after the interaction, s_t+1
            done: whether the episode has ended
        """
        predict_Q = self.Q[obs, action]
        if done:
            target_Q = reward # no next state
        else:
            target_Q = reward + self.gamma * np.max(self.Q[next_obs, :]) # Q-learning
        self.Q[obs, action] += self.lr * (target_Q - predict_Q) # correct Q

    # Save the Q-table to a file
    def save(self):
        npy_file = './q_table.npy'
        np.save(npy_file, self.Q)
        print(npy_file + ' saved.')

    # Load the Q-table from a file
    def restore(self, npy_file='./q_table.npy'):
        self.Q = np.load(npy_file)
        print(npy_file + ' loaded.')

Training

def run_episode(env, agent, render=False):
    total_steps = 0 # number of steps taken in this episode
    total_reward = 0

    obs = env.reset() # reset the environment and start a new episode

    while True:
        action = agent.sample(obs) # choose an action according to the algorithm
        next_obs, reward, done, _ = env.step(action) # interact with the environment once
        # Train the Q-learning algorithm
        agent.learn(obs, action, reward, next_obs, done)

        obs = next_obs  # carry the observation forward to the next step
        total_reward += reward
        total_steps += 1 # count the step
        if render:
            env.render() # render a new frame
        if done:
            break
    return total_reward, total_steps

Test program

def test_episode(env, agent):
    total_reward = 0
    obs = env.reset()
    while True:
        action = agent.predict(obs) # greedy
        next_obs, reward, done, _ = env.step(action)
        total_reward += reward
        obs = next_obs
        # time.sleep(0.5)
        # env.render()
        if done:
            break
    return total_reward

Main program

# Create the cliff-walking environment with gym
env = gym.make("CliffWalking-v0")  # 0 up, 1 right, 2 down, 3 left

# Create an agent instance and pass in the hyperparameters
agent = QLearningAgent(
    obs_n=env.observation_space.n,
    act_n=env.action_space.n,
    learning_rate=0.1,
    gamma=0.9,
    e_greed=0.1)


# Train for 500 episodes and print the score of each episode
for episode in range(500):
    ep_reward, ep_steps = run_episode(env, agent, False)
    print('Episode %s: steps = %s , reward = %.1f' % (episode, ep_steps, ep_reward))

# Training is finished; check how well the algorithm performs
test_reward = test_episode(env, agent)
print('test reward = %.1f' % (test_reward))

Results

Episode 0: steps = 14 , reward = 0.0
Episode 1: steps = 8 , reward = 0.0
Episode 2: steps = 27 , reward = 0.0
Episode 3: steps = 7 , reward = 0.0
Episode 4: steps = 9 , reward = 0.0
Episode 5: steps = 11 , reward = 0.0
Episode 6: steps = 7 , reward = 0.0
Episode 7: steps = 6 , reward = 0.0
Episode 8: steps = 7 , reward = 0.0
Episode 9: steps = 6 , reward = 0.0
Episode 10: steps = 11 , reward = 0.0
Episode 11: steps = 2 , reward = 0.0
Episode 12: steps = 16 , reward = 0.0
Episode 13: steps = 7 , reward = 1.0
Episode 14: steps = 13 , reward = 0.0
Episode 15: steps = 6 , reward = 0.0
Episode 16: steps = 12 , reward = 0.0
Episode 17: steps = 4 , reward = 0.0
Episode 18: steps = 21 , reward = 0.0
Episode 19: steps = 15 , reward = 0.0
Episode 20: steps = 2 , reward = 0.0
Episode 21: steps = 16 , reward = 0.0
Episode 22: steps = 4 , reward = 0.0
Episode 23: steps = 10 , reward = 0.0
Episode 24: steps = 11 , reward = 1.0
Episode 25: steps = 10 , reward = 0.0
Episode 26: steps = 6 , reward = 1.0
Episode 27: steps = 17 , reward = 0.0
Episode 28: steps = 5 , reward = 0.0
Episode 29: steps = 6 , reward = 0.0
Episode 30: steps = 31 , reward = 0.0
Episode 31: steps = 8 , reward = 0.0
Episode 32: steps = 9 , reward = 0.0
Episode 33: steps = 4 , reward = 0.0
Episode 34: steps = 16 , reward = 1.0
Episode 35: steps = 6 , reward = 0.0
Episode 36: steps = 11 , reward = 0.0
Episode 37: steps = 8 , reward = 0.0
Episode 38: steps = 12 , reward = 0.0
Episode 39: steps = 6 , reward = 1.0
Episode 40: steps = 6 , reward = 0.0
Episode 41: steps = 10 , reward = 0.0
Episode 42: steps = 6 , reward = 0.0
Episode 43: steps = 3 , reward = 0.0
Episode 44: steps = 9 , reward = 0.0
Episode 45: steps = 11 , reward = 1.0
Episode 46: steps = 7 , reward = 1.0
Episode 47: steps = 8 , reward = 1.0
Episode 48: steps = 8 , reward = 1.0
Episode 49: steps = 7 , reward = 1.0
Episode 50: steps = 6 , reward = 1.0
Episode 51: steps = 6 , reward = 1.0
Episode 52: steps = 6 , reward = 1.0
Episode 53: steps = 6 , reward = 1.0
Episode 54: steps = 6 , reward = 1.0
Episode 55: steps = 6 , reward = 1.0
Episode 56: steps = 4 , reward = 0.0
Episode 57: steps = 5 , reward = 0.0
Episode 58: steps = 6 , reward = 1.0
Episode 59: steps = 5 , reward = 0.0
Episode 60: steps = 6 , reward = 1.0
Episode 61: steps = 6 , reward = 1.0
Episode 62: steps = 6 , reward = 1.0
Episode 63: steps = 6 , reward = 1.0
Episode 64: steps = 9 , reward = 1.0
Episode 65: steps = 6 , reward = 1.0
Episode 66: steps = 7 , reward = 1.0
Episode 67: steps = 6 , reward = 1.0
Episode 68: steps = 10 , reward = 1.0
Episode 69: steps = 7 , reward = 1.0
Episode 70: steps = 8 , reward = 1.0
Episode 71: steps = 5 , reward = 0.0
Episode 72: steps = 6 , reward = 1.0
Episode 73: steps = 6 , reward = 1.0
Episode 74: steps = 6 , reward = 1.0
Episode 75: steps = 6 , reward = 1.0
Episode 76: steps = 2 , reward = 0.0
Episode 77: steps = 6 , reward = 1.0
Episode 78: steps = 6 , reward = 1.0
Episode 79: steps = 6 , reward = 1.0
Episode 80: steps = 6 , reward = 1.0
Episode 81: steps = 6 , reward = 1.0
Episode 82: steps = 8 , reward = 1.0
Episode 83: steps = 8 , reward = 1.0
Episode 84: steps = 6 , reward = 1.0
Episode 85: steps = 6 , reward = 1.0
Episode 86: steps = 6 , reward = 1.0
Episode 87: steps = 5 , reward = 0.0
Episode 88: steps = 7 , reward = 1.0
Episode 89: steps = 6 , reward = 1.0
Episode 90: steps = 6 , reward = 1.0
Episode 91: steps = 7 , reward = 1.0
Episode 92: steps = 6 , reward = 1.0
Episode 93: steps = 6 , reward = 1.0
Episode 94: steps = 6 , reward = 1.0
Episode 95: steps = 6 , reward = 1.0
Episode 96: steps = 6 , reward = 1.0
Episode 97: steps = 7 , reward = 1.0
Episode 98: steps = 3 , reward = 0.0
Episode 99: steps = 6 , reward = 1.0
Episode 100: steps = 6 , reward = 1.0
Episode 101: steps = 6 , reward = 1.0
Episode 102: steps = 6 , reward = 1.0
Episode 103: steps = 7 , reward = 1.0
Episode 104: steps = 6 , reward = 1.0
Episode 105: steps = 8 , reward = 1.0
Episode 106: steps = 6 , reward = 1.0
Episode 107: steps = 6 , reward = 1.0
Episode 108: steps = 6 , reward = 1.0
Episode 109: steps = 6 , reward = 1.0
Episode 110: steps = 6 , reward = 1.0
Episode 111: steps = 6 , reward = 1.0
Episode 112: steps = 8 , reward = 0.0
Episode 113: steps = 8 , reward = 1.0
Episode 114: steps = 4 , reward = 0.0
Episode 115: steps = 6 , reward = 1.0
Episode 116: steps = 5 , reward = 0.0
Episode 117: steps = 6 , reward = 1.0
Episode 118: steps = 6 , reward = 1.0
Episode 119: steps = 6 , reward = 1.0
Episode 120: steps = 7 , reward = 1.0
Episode 121: steps = 6 , reward = 1.0
Episode 122: steps = 4 , reward = 0.0
Episode 123: steps = 3 , reward = 0.0
Episode 124: steps = 6 , reward = 1.0
Episode 125: steps = 6 , reward = 1.0
Episode 126: steps = 6 , reward = 1.0
Episode 127: steps = 6 , reward = 1.0
Episode 128: steps = 7 , reward = 1.0
Episode 129: steps = 8 , reward = 1.0
Episode 130: steps = 6 , reward = 1.0
Episode 131: steps = 6 , reward = 1.0
Episode 132: steps = 6 , reward = 1.0
Episode 133: steps = 6 , reward = 1.0
Episode 134: steps = 6 , reward = 1.0
Episode 135: steps = 3 , reward = 0.0
Episode 136: steps = 6 , reward = 1.0
Episode 137: steps = 6 , reward = 0.0
Episode 138: steps = 13 , reward = 1.0
Episode 139: steps = 6 , reward = 1.0
Episode 140: steps = 12 , reward = 1.0
Episode 141: steps = 6 , reward = 1.0
Episode 142: steps = 6 , reward = 1.0
Episode 143: steps = 6 , reward = 1.0
Episode 144: steps = 6 , reward = 1.0
Episode 145: steps = 6 , reward = 1.0
Episode 146: steps = 5 , reward = 0.0
Episode 147: steps = 9 , reward = 1.0
Episode 148: steps = 6 , reward = 1.0
Episode 149: steps = 8 , reward = 1.0
Episode 150: steps = 6 , reward = 1.0
Episode 151: steps = 10 , reward = 1.0
Episode 152: steps = 6 , reward = 1.0
Episode 153: steps = 6 , reward = 1.0
Episode 154: steps = 6 , reward = 1.0
Episode 155: steps = 4 , reward = 0.0
Episode 156: steps = 6 , reward = 1.0
Episode 157: steps = 5 , reward = 0.0
Episode 158: steps = 6 , reward = 1.0
Episode 159: steps = 6 , reward = 1.0
Episode 160: steps = 8 , reward = 1.0
Episode 161: steps = 6 , reward = 1.0
Episode 162: steps = 6 , reward = 1.0
Episode 163: steps = 6 , reward = 1.0
Episode 164: steps = 6 , reward = 1.0
Episode 165: steps = 4 , reward = 0.0
Episode 166: steps = 5 , reward = 0.0
Episode 167: steps = 6 , reward = 1.0
Episode 168: steps = 3 , reward = 0.0
Episode 169: steps = 6 , reward = 1.0
Episode 170: steps = 3 , reward = 0.0
Episode 171: steps = 6 , reward = 1.0
Episode 172: steps = 5 , reward = 0.0
Episode 173: steps = 6 , reward = 1.0
Episode 174: steps = 7 , reward = 1.0
Episode 175: steps = 6 , reward = 1.0
Episode 176: steps = 6 , reward = 1.0
Episode 177: steps = 8 , reward = 1.0
Episode 178: steps = 6 , reward = 1.0
Episode 179: steps = 7 , reward = 1.0
Episode 180: steps = 6 , reward = 1.0
Episode 181: steps = 7 , reward = 1.0
Episode 182: steps = 6 , reward = 1.0
Episode 183: steps = 6 , reward = 1.0
Episode 184: steps = 6 , reward = 1.0
Episode 185: steps = 6 , reward = 1.0
Episode 186: steps = 8 , reward = 1.0
Episode 187: steps = 7 , reward = 1.0
Episode 188: steps = 6 , reward = 1.0
Episode 189: steps = 7 , reward = 1.0
Episode 190: steps = 6 , reward = 1.0
Episode 191: steps = 8 , reward = 1.0
Episode 192: steps = 6 , reward = 1.0
Episode 193: steps = 6 , reward = 1.0
Episode 194: steps = 6 , reward = 1.0
Episode 195: steps = 6 , reward = 1.0
Episode 196: steps = 6 , reward = 1.0
Episode 197: steps = 6 , reward = 1.0
Episode 198: steps = 6 , reward = 1.0
Episode 199: steps = 6 , reward = 1.0
Episode 200: steps = 6 , reward = 1.0
Episode 201: steps = 6 , reward = 1.0
Episode 202: steps = 8 , reward = 1.0
Episode 203: steps = 8 , reward = 1.0
Episode 204: steps = 6 , reward = 1.0
Episode 205: steps = 7 , reward = 1.0
Episode 206: steps = 6 , reward = 1.0
Episode 207: steps = 4 , reward = 0.0
Episode 208: steps = 6 , reward = 1.0
Episode 209: steps = 2 , reward = 0.0
Episode 210: steps = 6 , reward = 1.0
Episode 211: steps = 6 , reward = 1.0
Episode 212: steps = 8 , reward = 1.0
Episode 213: steps = 6 , reward = 1.0
Episode 214: steps = 6 , reward = 1.0
Episode 215: steps = 7 , reward = 1.0
Episode 216: steps = 7 , reward = 1.0
Episode 217: steps = 6 , reward = 1.0
Episode 218: steps = 2 , reward = 0.0
Episode 219: steps = 3 , reward = 0.0
Episode 220: steps = 6 , reward = 1.0
Episode 221: steps = 6 , reward = 1.0
Episode 222: steps = 6 , reward = 1.0
Episode 223: steps = 3 , reward = 0.0
Episode 224: steps = 7 , reward = 0.0
Episode 225: steps = 4 , reward = 0.0
Episode 226: steps = 5 , reward = 0.0
Episode 227: steps = 6 , reward = 1.0
Episode 228: steps = 6 , reward = 1.0
Episode 229: steps = 6 , reward = 1.0
Episode 230: steps = 7 , reward = 1.0
Episode 231: steps = 8 , reward = 1.0
Episode 232: steps = 9 , reward = 1.0
Episode 233: steps = 10 , reward = 1.0
Episode 234: steps = 9 , reward = 1.0
Episode 235: steps = 7 , reward = 1.0
Episode 236: steps = 8 , reward = 1.0
Episode 237: steps = 8 , reward = 1.0
Episode 238: steps = 8 , reward = 1.0
Episode 239: steps = 6 , reward = 1.0
Episode 240: steps = 6 , reward = 1.0
Episode 241: steps = 9 , reward = 1.0
Episode 242: steps = 6 , reward = 1.0
Episode 243: steps = 6 , reward = 1.0
Episode 244: steps = 6 , reward = 1.0
Episode 245: steps = 7 , reward = 1.0
Episode 246: steps = 8 , reward = 1.0
Episode 247: steps = 7 , reward = 1.0
Episode 248: steps = 12 , reward = 1.0
Episode 249: steps = 6 , reward = 1.0
Episode 250: steps = 6 , reward = 1.0
Episode 251: steps = 6 , reward = 1.0
Episode 252: steps = 6 , reward = 1.0
Episode 253: steps = 6 , reward = 1.0
Episode 254: steps = 6 , reward = 1.0
Episode 255: steps = 7 , reward = 1.0
Episode 256: steps = 8 , reward = 1.0
Episode 257: steps = 12 , reward = 1.0
Episode 258: steps = 6 , reward = 1.0
Episode 259: steps = 8 , reward = 1.0
Episode 260: steps = 6 , reward = 1.0
Episode 261: steps = 6 , reward = 1.0
Episode 262: steps = 6 , reward = 1.0
Episode 263: steps = 6 , reward = 1.0
Episode 264: steps = 4 , reward = 0.0
Episode 265: steps = 4 , reward = 0.0
Episode 266: steps = 4 , reward = 0.0
Episode 267: steps = 7 , reward = 1.0
Episode 268: steps = 6 , reward = 1.0
Episode 269: steps = 6 , reward = 1.0
Episode 270: steps = 6 , reward = 1.0
Episode 271: steps = 7 , reward = 1.0
Episode 272: steps = 6 , reward = 1.0
Episode 273: steps = 6 , reward = 1.0
Episode 274: steps = 7 , reward = 1.0
Episode 275: steps = 6 , reward = 1.0
Episode 276: steps = 6 , reward = 1.0
Episode 277: steps = 6 , reward = 1.0
Episode 278: steps = 2 , reward = 0.0
Episode 279: steps = 6 , reward = 1.0
Episode 280: steps = 6 , reward = 1.0
Episode 281: steps = 6 , reward = 1.0
Episode 282: steps = 6 , reward = 1.0
Episode 283: steps = 4 , reward = 0.0
Episode 284: steps = 8 , reward = 1.0
Episode 285: steps = 6 , reward = 1.0
Episode 286: steps = 6 , reward = 1.0
Episode 287: steps = 7 , reward = 1.0
Episode 288: steps = 6 , reward = 1.0
Episode 289: steps = 6 , reward = 1.0
Episode 290: steps = 6 , reward = 1.0
Episode 291: steps = 8 , reward = 1.0
Episode 292: steps = 6 , reward = 1.0
Episode 293: steps = 4 , reward = 0.0
Episode 294: steps = 6 , reward = 1.0
Episode 295: steps = 6 , reward = 1.0
Episode 296: steps = 6 , reward = 1.0
Episode 297: steps = 6 , reward = 1.0
Episode 298: steps = 8 , reward = 1.0
Episode 299: steps = 6 , reward = 1.0
Episode 300: steps = 6 , reward = 1.0
Episode 301: steps = 6 , reward = 1.0
Episode 302: steps = 6 , reward = 1.0
Episode 303: steps = 6 , reward = 1.0
Episode 304: steps = 6 , reward = 1.0
Episode 305: steps = 6 , reward = 1.0
Episode 306: steps = 6 , reward = 1.0
Episode 307: steps = 6 , reward = 1.0
Episode 308: steps = 9 , reward = 1.0
Episode 309: steps = 6 , reward = 1.0
Episode 310: steps = 6 , reward = 0.0
Episode 311: steps = 6 , reward = 1.0
Episode 312: steps = 5 , reward = 0.0
Episode 313: steps = 6 , reward = 1.0
Episode 314: steps = 6 , reward = 1.0
Episode 315: steps = 6 , reward = 1.0
Episode 316: steps = 7 , reward = 1.0
Episode 317: steps = 6 , reward = 1.0
Episode 318: steps = 6 , reward = 1.0
Episode 319: steps = 6 , reward = 1.0
Episode 320: steps = 6 , reward = 1.0
Episode 321: steps = 6 , reward = 1.0
Episode 322: steps = 10 , reward = 1.0
Episode 323: steps = 6 , reward = 1.0
Episode 324: steps = 8 , reward = 1.0
Episode 325: steps = 3 , reward = 0.0
Episode 326: steps = 6 , reward = 1.0
Episode 327: steps = 6 , reward = 1.0
Episode 328: steps = 6 , reward = 1.0
Episode 329: steps = 6 , reward = 1.0
Episode 330: steps = 6 , reward = 1.0
Episode 331: steps = 6 , reward = 1.0
Episode 332: steps = 8 , reward = 1.0
Episode 333: steps = 7 , reward = 1.0
Episode 334: steps = 7 , reward = 1.0
Episode 335: steps = 6 , reward = 1.0
Episode 336: steps = 6 , reward = 1.0
Episode 337: steps = 6 , reward = 1.0
Episode 338: steps = 2 , reward = 0.0
Episode 339: steps = 6 , reward = 1.0
Episode 340: steps = 6 , reward = 1.0
Episode 341: steps = 6 , reward = 1.0
Episode 342: steps = 6 , reward = 1.0
Episode 343: steps = 8 , reward = 1.0
Episode 344: steps = 6 , reward = 1.0
Episode 345: steps = 6 , reward = 1.0
Episode 346: steps = 6 , reward = 1.0
Episode 347: steps = 6 , reward = 1.0
Episode 348: steps = 6 , reward = 1.0
Episode 349: steps = 7 , reward = 1.0
Episode 350: steps = 6 , reward = 1.0
Episode 351: steps = 6 , reward = 1.0
Episode 352: steps = 6 , reward = 1.0
Episode 353: steps = 6 , reward = 1.0
Episode 354: steps = 6 , reward = 1.0
Episode 355: steps = 8 , reward = 1.0
Episode 356: steps = 7 , reward = 1.0
Episode 357: steps = 6 , reward = 1.0
Episode 358: steps = 6 , reward = 0.0
Episode 359: steps = 6 , reward = 1.0
Episode 360: steps = 6 , reward = 1.0
Episode 361: steps = 6 , reward = 1.0
Episode 362: steps = 6 , reward = 1.0
Episode 363: steps = 6 , reward = 1.0
Episode 364: steps = 6 , reward = 1.0
Episode 365: steps = 6 , reward = 1.0
Episode 366: steps = 7 , reward = 1.0
Episode 367: steps = 6 , reward = 1.0
Episode 368: steps = 6 , reward = 1.0
Episode 369: steps = 6 , reward = 1.0
Episode 370: steps = 7 , reward = 1.0
Episode 371: steps = 8 , reward = 1.0
Episode 372: steps = 6 , reward = 1.0
Episode 373: steps = 7 , reward = 1.0
Episode 374: steps = 6 , reward = 1.0
Episode 375: steps = 8 , reward = 1.0
Episode 376: steps = 6 , reward = 1.0
Episode 377: steps = 6 , reward = 1.0
Episode 378: steps = 6 , reward = 1.0
Episode 379: steps = 6 , reward = 1.0
Episode 380: steps = 6 , reward = 1.0
Episode 381: steps = 6 , reward = 1.0
Episode 382: steps = 6 , reward = 1.0
Episode 383: steps = 6 , reward = 1.0
Episode 384: steps = 6 , reward = 1.0
Episode 385: steps = 6 , reward = 1.0
Episode 386: steps = 6 , reward = 1.0
Episode 387: steps = 8 , reward = 1.0
Episode 388: steps = 6 , reward = 1.0
Episode 389: steps = 6 , reward = 1.0
Episode 390: steps = 6 , reward = 1.0
Episode 391: steps = 9 , reward = 1.0
Episode 392: steps = 8 , reward = 1.0
Episode 393: steps = 6 , reward = 1.0
Episode 394: steps = 4 , reward = 0.0
Episode 395: steps = 6 , reward = 1.0
Episode 396: steps = 7 , reward = 1.0
Episode 397: steps = 6 , reward = 1.0
Episode 398: steps = 6 , reward = 1.0
Episode 399: steps = 9 , reward = 1.0
Episode 400: steps = 6 , reward = 1.0
Episode 401: steps = 6 , reward = 1.0
Episode 402: steps = 3 , reward = 0.0
Episode 403: steps = 6 , reward = 1.0
Episode 404: steps = 9 , reward = 1.0
Episode 405: steps = 7 , reward = 1.0
Episode 406: steps = 6 , reward = 1.0
Episode 407: steps = 6 , reward = 1.0
Episode 408: steps = 6 , reward = 1.0
Episode 409: steps = 9 , reward = 1.0
Episode 410: steps = 6 , reward = 1.0
Episode 411: steps = 6 , reward = 1.0
Episode 412: steps = 6 , reward = 1.0
Episode 413: steps = 6 , reward = 1.0
Episode 414: steps = 6 , reward = 1.0
Episode 415: steps = 6 , reward = 1.0
Episode 416: steps = 6 , reward = 1.0
Episode 417: steps = 8 , reward = 1.0
Episode 418: steps = 4 , reward = 0.0
Episode 419: steps = 8 , reward = 1.0
Episode 420: steps = 6 , reward = 1.0
Episode 421: steps = 6 , reward = 1.0
Episode 422: steps = 8 , reward = 1.0
Episode 423: steps = 6 , reward = 1.0
Episode 424: steps = 6 , reward = 1.0
Episode 425: steps = 8 , reward = 1.0
Episode 426: steps = 4 , reward = 0.0
Episode 427: steps = 6 , reward = 1.0
Episode 428: steps = 6 , reward = 1.0
Episode 429: steps = 6 , reward = 1.0
Episode 430: steps = 10 , reward = 1.0
Episode 431: steps = 6 , reward = 1.0
Episode 432: steps = 6 , reward = 1.0
Episode 433: steps = 7 , reward = 1.0
Episode 434: steps = 6 , reward = 1.0
Episode 435: steps = 5 , reward = 0.0
Episode 436: steps = 6 , reward = 1.0
Episode 437: steps = 6 , reward = 1.0
Episode 438: steps = 6 , reward = 1.0
Episode 439: steps = 6 , reward = 1.0
Episode 440: steps = 6 , reward = 1.0
Episode 441: steps = 6 , reward = 1.0
Episode 442: steps = 8 , reward = 1.0
Episode 443: steps = 6 , reward = 1.0
Episode 444: steps = 8 , reward = 1.0
Episode 445: steps = 6 , reward = 1.0
Episode 446: steps = 6 , reward = 1.0
Episode 447: steps = 6 , reward = 1.0
Episode 448: steps = 6 , reward = 1.0
Episode 449: steps = 6 , reward = 1.0
Episode 450: steps = 2 , reward = 0.0
Episode 451: steps = 6 , reward = 1.0
Episode 452: steps = 8 , reward = 1.0
Episode 453: steps = 6 , reward = 1.0
Episode 454: steps = 6 , reward = 1.0
Episode 455: steps = 3 , reward = 0.0
Episode 456: steps = 4 , reward = 0.0
Episode 457: steps = 6 , reward = 1.0
Episode 458: steps = 6 , reward = 1.0
Episode 459: steps = 6 , reward = 1.0
Episode 460: steps = 5 , reward = 0.0
Episode 461: steps = 4 , reward = 0.0
Episode 462: steps = 8 , reward = 1.0
Episode 463: steps = 8 , reward = 1.0
Episode 464: steps = 3 , reward = 0.0
Episode 465: steps = 6 , reward = 1.0
Episode 466: steps = 7 , reward = 1.0
Episode 467: steps = 6 , reward = 1.0
Episode 468: steps = 6 , reward = 1.0
Episode 469: steps = 6 , reward = 1.0
Episode 470: steps = 6 , reward = 1.0
Episode 471: steps = 6 , reward = 1.0
Episode 472: steps = 6 , reward = 0.0
Episode 473: steps = 6 , reward = 1.0
Episode 474: steps = 6 , reward = 1.0
Episode 475: steps = 6 , reward = 1.0
Episode 476: steps = 6 , reward = 1.0
Episode 477: steps = 6 , reward = 1.0
Episode 478: steps = 6 , reward = 1.0
Episode 479: steps = 6 , reward = 1.0
Episode 480: steps = 6 , reward = 1.0
Episode 481: steps = 10 , reward = 1.0
Episode 482: steps = 6 , reward = 1.0
Episode 483: steps = 6 , reward = 1.0
Episode 484: steps = 8 , reward = 1.0
Episode 485: steps = 5 , reward = 0.0
Episode 486: steps = 7 , reward = 1.0
Episode 487: steps = 6 , reward = 1.0
Episode 488: steps = 6 , reward = 1.0
Episode 489: steps = 9 , reward = 1.0
Episode 490: steps = 6 , reward = 1.0
Episode 491: steps = 6 , reward = 1.0
Episode 492: steps = 7 , reward = 1.0
Episode 493: steps = 8 , reward = 1.0
Episode 494: steps = 6 , reward = 1.0
Episode 495: steps = 6 , reward = 1.0
Episode 496: steps = 2 , reward = 0.0
Episode 497: steps = 6 , reward = 1.0
Episode 498: steps = 8 , reward = 1.0
Episode 499: steps = 6 , reward = 1.0
test reward = 1.0

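5. Summary

Sarsa follows a conservative strategy: when it updates a Q value it has already committed to its next action, so it is sensitive to mistakes and to death. Q-learning, by contrast, always updates toward the maximum Q value and only chooses its actual action once it reaches the next state. It is a reckless, bold, greedy algorithm that does not care about death or mistakes.

In short, Sarsa is more conservative and Q-learning is more aggressive.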
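
One way to see where this difference comes from is to compare what the two targets "see" in a risky state. The sketch below is an illustration under assumptions: the helper `expected_next_value` and the Q values are hypothetical, and the ε-greedy probabilities follow the `sample()` method above, where the random branch can also return the greedy action.

```python
def expected_next_value(q_row, epsilon):
    """Average Q value of the next action under an epsilon-greedy policy."""
    n = len(q_row)
    greedy = q_row.index(max(q_row))
    probs = [epsilon / n] * n          # random branch: uniform over all actions
    probs[greedy] += 1.0 - epsilon     # greedy branch
    return sum(p * v for p, v in zip(probs, q_row))

# Hypothetical Q values in a cell beside a cliff: one action falls off (-100),
# the other three are safe (-10).
q_row = [-10.0, -10.0, -100.0, -10.0]
print(expected_next_value(q_row, epsilon=0.1))   # what Sarsa's target sees on average
print(max(q_row))                                # what Q-learning's target sees
```

Averaged over many updates, Sarsa's target includes the occasional exploratory fall, so states near the cliff look worse to it than they do to Q-learning, which only ever looks at the best next action.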

!pip install gym
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Requirement already satisfied: gym in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (0.12.1)
Requirement already satisfied: scipy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gym) (1.3.0)
Requirement already satisfied: pyglet>=1.2.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gym) (1.4.5)
Requirement already satisfied: six in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gym) (1.15.0)
Requirement already satisfied: requests>=2.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gym) (2.22.0)
Requirement already satisfied: numpy>=1.10.4 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gym) (1.16.4)
Requirement already satisfied: future in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pyglet>=1.2.0->gym) (0.18.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests>=2.0->gym) (2019.9.11)
Requirement already satisfied: idna<2.9,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests>=2.0->gym) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests>=2.0->gym) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests>=2.0->gym) (1.25.6)
import gym
import numpy as np
import time

class SarsaAgent(object):
    def __init__(self, obs_n, act_n, learning_rate=0.01, gamma=0.9, e_greed=0.1):
        self.act_n = act_n      # 动作维度,有几个动作可选
        self.lr = learning_rate # 学习率
        self.gamma = gamma      # reward的衰减率
        self.epsilon = e_greed  # 按一定概率随机选动作
        self.Q = np.zeros((obs_n, act_n))

    # 根据输入观察值,采样输出的动作值,带探索
    def sample(self, obs):
       
        if np.random.uniform(0, 1) < (1.0 - self.epsilon): #根据table的Q值选动作
            action = self.predict(obs)
        else:
            action = np.random.choice(self.act_n) #有一定概率随机探索选取一个动作
    
        return action

    # 根据输入观察值,预测输出的动作值
    def predict(self, obs):
       
        Q_list = self.Q[obs, :]
        maxQ = np.max(Q_list)
        action_list = np.where(Q_list == maxQ)[0]  # maxQ可能对应多个action
        action = np.random.choice(action_list)

        return action

    # 学习方法,也就是更新Q-table的方法
    def learn(self, obs, action, reward, next_obs, next_action, done):
        """ on-policy
            obs: 交互前的obs, s_t
            action: 本次交互选择的action, a_t
            reward: 本次动作获得的奖励r
            next_obs: 本次交互后的obs, s_t+1
            next_action: 根据当前Q表格, 针对next_obs会选择的动作, a_t+1
            done: episode是否结束
        """
        predict_Q = self.Q[obs, action]
        if done:
            target_Q = reward # 没有下一个状态了
        else:
            target_Q = reward + self.gamma * self.Q[next_obs, next_action] # Sarsa
        self.Q[obs, action] += self.lr * (target_Q - predict_Q) # 修正q

    # 保存Q表格数据到文件
    def save(self):
        npy_file = './q_table.npy'
        np.save(npy_file, self.Q)
        print(npy_file + ' saved.')
    
    # 从文件中读取数据到Q表格中
    def restore(self, npy_file='./q_table.npy'):
        self.Q = np.load(npy_file)
        print(npy_file + ' loaded.')


def run_episode(env, agent, render=False):
    total_steps = 0 # 记录每个episode走了多少step
    total_reward = 0

    obs = env.reset() # 重置环境, 重新开一局(即开始新的一个episode)
    action = agent.sample(obs) # 根据算法选择一个动作

    while True:
        next_obs, reward, done, _ = env.step(action) # 与环境进行一个交互
        next_action = agent.sample(next_obs) # 根据算法选择一个动作
        # 训练 Sarsa 算法
        agent.learn(obs, action, reward, next_obs, next_action, done)

        action = next_action
        obs = next_obs  # 存储上一个观察值
        total_reward += reward
        total_steps += 1 # 计算step数
        if render:
            env.render() #渲染新的一帧图形
        if done:
            break
    return total_reward, total_steps


def test_episode(env, agent):
    total_reward = 0
    obs = env.reset()
    while True:
        action = agent.predict(obs) # greedy
        next_obs, reward, done, _ = env.step(action)
        total_reward += reward
        obs = next_obs
        time.sleep(0.5)
        env.render()
        if done:
            break
    return total_reward


# 使用gym创建迷宫环境,设置is_slippery为False降低环境难度
env = gym.make("FrozenLake-v0", is_slippery=False)  # 0 left, 1 down, 2 right, 3 up

# 创建一个agent实例,输入超参数
agent = SarsaAgent(
        obs_n=env.observation_space.n,
        act_n=env.action_space.n,
        learning_rate=0.1,
        gamma=0.9,
        e_greed=0.1)


# Train for 500 episodes, printing each episode's score
for episode in range(500):
    ep_reward, ep_steps = run_episode(env, agent, False)
    print('Episode %s: steps = %s , reward = %.1f' % (episode, ep_steps, ep_reward))

# Training finished; check how the algorithm performs
test_reward = test_episode(env, agent)
print('test reward = %.1f' % (test_reward))

Output:

Episode 0: steps = 11 , reward = 0.0
Episode 1: steps = 10 , reward = 0.0
Episode 2: steps = 5 , reward = 0.0
Episode 3: steps = 13 , reward = 0.0
Episode 4: steps = 6 , reward = 0.0
Episode 5: steps = 7 , reward = 0.0
Episode 6: steps = 5 , reward = 0.0
Episode 7: steps = 4 , reward = 0.0
Episode 8: steps = 2 , reward = 0.0
Episode 9: steps = 20 , reward = 0.0
Episode 10: steps = 8 , reward = 0.0
Episode 11: steps = 15 , reward = 0.0
Episode 12: steps = 2 , reward = 0.0
Episode 13: steps = 8 , reward = 0.0
Episode 14: steps = 10 , reward = 0.0
Episode 15: steps = 10 , reward = 0.0
Episode 16: steps = 2 , reward = 0.0
Episode 17: steps = 4 , reward = 0.0
Episode 18: steps = 2 , reward = 0.0
Episode 19: steps = 5 , reward = 0.0
Episode 20: steps = 2 , reward = 0.0
Episode 21: steps = 5 , reward = 0.0
Episode 22: steps = 11 , reward = 0.0
Episode 23: steps = 9 , reward = 0.0
Episode 24: steps = 7 , reward = 0.0
Episode 25: steps = 8 , reward = 0.0
Episode 26: steps = 13 , reward = 0.0
Episode 27: steps = 5 , reward = 0.0
Episode 28: steps = 5 , reward = 0.0
Episode 29: steps = 2 , reward = 0.0
Episode 30: steps = 6 , reward = 0.0
Episode 31: steps = 4 , reward = 0.0
Episode 32: steps = 11 , reward = 0.0
Episode 33: steps = 6 , reward = 0.0
Episode 34: steps = 3 , reward = 0.0
Episode 35: steps = 3 , reward = 0.0
Episode 36: steps = 11 , reward = 0.0
Episode 37: steps = 31 , reward = 0.0
Episode 38: steps = 9 , reward = 0.0
Episode 39: steps = 3 , reward = 0.0
Episode 40: steps = 17 , reward = 0.0
Episode 41: steps = 6 , reward = 0.0
Episode 42: steps = 5 , reward = 0.0
Episode 43: steps = 4 , reward = 0.0
Episode 44: steps = 3 , reward = 0.0
Episode 45: steps = 7 , reward = 0.0
Episode 46: steps = 4 , reward = 0.0
Episode 47: steps = 5 , reward = 0.0
Episode 48: steps = 3 , reward = 0.0
Episode 49: steps = 7 , reward = 0.0
Episode 50: steps = 7 , reward = 0.0
Episode 51: steps = 4 , reward = 0.0
Episode 52: steps = 4 , reward = 0.0
Episode 53: steps = 6 , reward = 0.0
Episode 54: steps = 3 , reward = 0.0
Episode 55: steps = 2 , reward = 0.0
Episode 56: steps = 9 , reward = 0.0
Episode 57: steps = 3 , reward = 0.0
Episode 58: steps = 6 , reward = 0.0
Episode 59: steps = 24 , reward = 0.0
Episode 60: steps = 12 , reward = 0.0
Episode 61: steps = 8 , reward = 0.0
Episode 62: steps = 10 , reward = 0.0
Episode 63: steps = 15 , reward = 0.0
Episode 64: steps = 10 , reward = 0.0
Episode 65: steps = 5 , reward = 0.0
Episode 66: steps = 12 , reward = 0.0
Episode 67: steps = 8 , reward = 0.0
Episode 68: steps = 5 , reward = 0.0
Episode 69: steps = 7 , reward = 0.0
Episode 70: steps = 2 , reward = 0.0
Episode 71: steps = 11 , reward = 0.0
Episode 72: steps = 8 , reward = 0.0
Episode 73: steps = 3 , reward = 0.0
Episode 74: steps = 6 , reward = 0.0
Episode 75: steps = 16 , reward = 0.0
Episode 76: steps = 4 , reward = 0.0
Episode 77: steps = 2 , reward = 0.0
Episode 78: steps = 9 , reward = 0.0
Episode 79: steps = 7 , reward = 0.0
Episode 80: steps = 4 , reward = 0.0
Episode 81: steps = 6 , reward = 0.0
Episode 82: steps = 21 , reward = 0.0
Episode 83: steps = 4 , reward = 0.0
Episode 84: steps = 2 , reward = 0.0
Episode 85: steps = 15 , reward = 0.0
Episode 86: steps = 13 , reward = 0.0
Episode 87: steps = 3 , reward = 0.0
Episode 88: steps = 39 , reward = 0.0
Episode 89: steps = 14 , reward = 0.0
Episode 90: steps = 4 , reward = 0.0
Episode 91: steps = 6 , reward = 0.0
Episode 92: steps = 2 , reward = 0.0
Episode 93: steps = 2 , reward = 0.0
Episode 94: steps = 2 , reward = 0.0
Episode 95: steps = 15 , reward = 0.0
Episode 96: steps = 2 , reward = 0.0
Episode 97: steps = 12 , reward = 1.0
Episode 98: steps = 2 , reward = 0.0
Episode 99: steps = 4 , reward = 0.0
Episode 100: steps = 9 , reward = 0.0
Episode 101: steps = 12 , reward = 0.0
Episode 102: steps = 26 , reward = 0.0
Episode 103: steps = 4 , reward = 0.0
Episode 104: steps = 23 , reward = 0.0
Episode 105: steps = 5 , reward = 0.0
Episode 106: steps = 2 , reward = 0.0
Episode 107: steps = 5 , reward = 0.0
Episode 108: steps = 2 , reward = 0.0
Episode 109: steps = 3 , reward = 0.0
Episode 110: steps = 7 , reward = 0.0
Episode 111: steps = 15 , reward = 0.0
Episode 112: steps = 13 , reward = 0.0
Episode 113: steps = 14 , reward = 0.0
Episode 114: steps = 16 , reward = 0.0
Episode 115: steps = 3 , reward = 0.0
Episode 116: steps = 4 , reward = 0.0
Episode 117: steps = 5 , reward = 0.0
Episode 118: steps = 4 , reward = 0.0
Episode 119: steps = 11 , reward = 0.0
Episode 120: steps = 8 , reward = 0.0
Episode 121: steps = 8 , reward = 0.0
Episode 122: steps = 2 , reward = 0.0
Episode 123: steps = 3 , reward = 0.0
Episode 124: steps = 4 , reward = 0.0
Episode 125: steps = 2 , reward = 0.0
Episode 126: steps = 9 , reward = 0.0
Episode 127: steps = 10 , reward = 0.0
Episode 128: steps = 8 , reward = 0.0
Episode 129: steps = 3 , reward = 0.0
Episode 130: steps = 19 , reward = 0.0
Episode 131: steps = 7 , reward = 0.0
Episode 132: steps = 4 , reward = 0.0
Episode 133: steps = 19 , reward = 0.0
Episode 134: steps = 10 , reward = 0.0
Episode 135: steps = 11 , reward = 0.0
Episode 136: steps = 8 , reward = 0.0
Episode 137: steps = 6 , reward = 0.0
Episode 138: steps = 5 , reward = 0.0
Episode 139: steps = 11 , reward = 0.0
Episode 140: steps = 13 , reward = 0.0
Episode 141: steps = 10 , reward = 0.0
Episode 142: steps = 3 , reward = 0.0
Episode 143: steps = 5 , reward = 0.0
Episode 144: steps = 3 , reward = 0.0
Episode 145: steps = 4 , reward = 0.0
Episode 146: steps = 7 , reward = 0.0
Episode 147: steps = 21 , reward = 0.0
Episode 148: steps = 19 , reward = 0.0
Episode 149: steps = 11 , reward = 0.0
Episode 150: steps = 9 , reward = 0.0
Episode 151: steps = 7 , reward = 0.0
Episode 152: steps = 5 , reward = 0.0
Episode 153: steps = 7 , reward = 0.0
Episode 154: steps = 2 , reward = 0.0
Episode 155: steps = 2 , reward = 0.0
Episode 156: steps = 7 , reward = 0.0
Episode 157: steps = 10 , reward = 0.0
Episode 158: steps = 3 , reward = 0.0
Episode 159: steps = 3 , reward = 0.0
Episode 160: steps = 5 , reward = 0.0
Episode 161: steps = 11 , reward = 1.0
Episode 162: steps = 8 , reward = 1.0
Episode 163: steps = 10 , reward = 0.0
Episode 164: steps = 17 , reward = 1.0
Episode 165: steps = 4 , reward = 0.0
Episode 166: steps = 4 , reward = 0.0
Episode 167: steps = 7 , reward = 1.0
Episode 168: steps = 10 , reward = 1.0
Episode 169: steps = 8 , reward = 1.0
Episode 170: steps = 6 , reward = 1.0
Episode 171: steps = 7 , reward = 1.0
Episode 172: steps = 6 , reward = 1.0
Episode 173: steps = 4 , reward = 0.0
Episode 174: steps = 6 , reward = 1.0
Episode 175: steps = 6 , reward = 1.0
Episode 176: steps = 6 , reward = 1.0
Episode 177: steps = 8 , reward = 0.0
Episode 178: steps = 6 , reward = 1.0
Episode 179: steps = 6 , reward = 1.0
Episode 180: steps = 6 , reward = 1.0
Episode 181: steps = 8 , reward = 0.0
Episode 182: steps = 8 , reward = 1.0
Episode 183: steps = 6 , reward = 1.0
Episode 184: steps = 6 , reward = 1.0
Episode 185: steps = 6 , reward = 1.0
Episode 186: steps = 6 , reward = 1.0
Episode 187: steps = 6 , reward = 1.0
Episode 188: steps = 9 , reward = 0.0
Episode 189: steps = 8 , reward = 1.0
Episode 190: steps = 6 , reward = 1.0
Episode 191: steps = 6 , reward = 1.0
Episode 192: steps = 6 , reward = 1.0
Episode 193: steps = 8 , reward = 1.0
Episode 194: steps = 6 , reward = 1.0
Episode 195: steps = 6 , reward = 1.0
Episode 196: steps = 6 , reward = 1.0
Episode 197: steps = 8 , reward = 1.0
Episode 198: steps = 6 , reward = 0.0
Episode 199: steps = 6 , reward = 1.0
Episode 200: steps = 5 , reward = 0.0
Episode 201: steps = 5 , reward = 0.0
Episode 202: steps = 6 , reward = 1.0
Episode 203: steps = 8 , reward = 1.0
Episode 204: steps = 8 , reward = 1.0
Episode 205: steps = 8 , reward = 1.0
Episode 206: steps = 2 , reward = 0.0
Episode 207: steps = 6 , reward = 1.0
Episode 208: steps = 6 , reward = 1.0
Episode 209: steps = 5 , reward = 0.0
Episode 210: steps = 9 , reward = 1.0
Episode 211: steps = 7 , reward = 0.0
Episode 212: steps = 6 , reward = 1.0
Episode 213: steps = 6 , reward = 1.0
Episode 214: steps = 9 , reward = 1.0
Episode 215: steps = 6 , reward = 1.0
Episode 216: steps = 8 , reward = 1.0
Episode 217: steps = 6 , reward = 1.0
Episode 218: steps = 8 , reward = 1.0
Episode 219: steps = 6 , reward = 1.0
Episode 220: steps = 4 , reward = 0.0
Episode 221: steps = 6 , reward = 1.0
Episode 222: steps = 6 , reward = 1.0
Episode 223: steps = 2 , reward = 0.0
Episode 224: steps = 6 , reward = 1.0
Episode 225: steps = 7 , reward = 1.0
Episode 226: steps = 6 , reward = 1.0
Episode 227: steps = 6 , reward = 1.0
Episode 228: steps = 6 , reward = 1.0
Episode 229: steps = 6 , reward = 1.0
Episode 230: steps = 6 , reward = 1.0
Episode 231: steps = 6 , reward = 1.0
Episode 232: steps = 6 , reward = 1.0
Episode 233: steps = 6 , reward = 1.0
Episode 234: steps = 6 , reward = 1.0
Episode 235: steps = 6 , reward = 1.0
Episode 236: steps = 7 , reward = 1.0
Episode 237: steps = 7 , reward = 1.0
Episode 238: steps = 6 , reward = 1.0
Episode 239: steps = 4 , reward = 0.0
Episode 240: steps = 4 , reward = 0.0
Episode 241: steps = 4 , reward = 0.0
Episode 242: steps = 2 , reward = 0.0
Episode 243: steps = 7 , reward = 1.0
Episode 244: steps = 6 , reward = 1.0
Episode 245: steps = 6 , reward = 1.0
Episode 246: steps = 4 , reward = 0.0
Episode 247: steps = 6 , reward = 1.0
Episode 248: steps = 6 , reward = 1.0
Episode 249: steps = 6 , reward = 1.0
Episode 250: steps = 4 , reward = 0.0
Episode 251: steps = 6 , reward = 1.0
Episode 252: steps = 6 , reward = 1.0
Episode 253: steps = 6 , reward = 1.0
Episode 254: steps = 8 , reward = 1.0
Episode 255: steps = 6 , reward = 1.0
Episode 256: steps = 9 , reward = 1.0
Episode 257: steps = 6 , reward = 1.0
Episode 258: steps = 8 , reward = 1.0
Episode 259: steps = 7 , reward = 1.0
Episode 260: steps = 6 , reward = 1.0
Episode 261: steps = 6 , reward = 1.0
Episode 262: steps = 6 , reward = 1.0
Episode 263: steps = 8 , reward = 1.0
Episode 264: steps = 6 , reward = 1.0
Episode 265: steps = 8 , reward = 1.0
Episode 266: steps = 6 , reward = 1.0
Episode 267: steps = 6 , reward = 1.0
Episode 268: steps = 4 , reward = 0.0
Episode 269: steps = 8 , reward = 0.0
Episode 270: steps = 6 , reward = 1.0
Episode 271: steps = 6 , reward = 1.0
Episode 272: steps = 8 , reward = 1.0
Episode 273: steps = 6 , reward = 1.0
Episode 274: steps = 7 , reward = 1.0
Episode 275: steps = 6 , reward = 1.0
Episode 276: steps = 4 , reward = 0.0
Episode 277: steps = 6 , reward = 1.0
Episode 278: steps = 6 , reward = 1.0
Episode 279: steps = 6 , reward = 1.0
Episode 280: steps = 4 , reward = 0.0
Episode 281: steps = 6 , reward = 1.0
Episode 282: steps = 6 , reward = 1.0
Episode 283: steps = 4 , reward = 0.0
Episode 284: steps = 6 , reward = 1.0
Episode 285: steps = 7 , reward = 1.0
Episode 286: steps = 9 , reward = 0.0
Episode 287: steps = 6 , reward = 1.0
Episode 288: steps = 6 , reward = 1.0
Episode 289: steps = 8 , reward = 1.0
Episode 290: steps = 7 , reward = 1.0
Episode 291: steps = 7 , reward = 1.0
Episode 292: steps = 6 , reward = 1.0
Episode 293: steps = 6 , reward = 1.0
Episode 294: steps = 8 , reward = 1.0
Episode 295: steps = 6 , reward = 1.0
Episode 296: steps = 6 , reward = 1.0
Episode 297: steps = 6 , reward = 1.0
Episode 298: steps = 6 , reward = 1.0
Episode 299: steps = 10 , reward = 0.0
Episode 300: steps = 8 , reward = 1.0
Episode 301: steps = 4 , reward = 0.0
Episode 302: steps = 8 , reward = 1.0
Episode 303: steps = 7 , reward = 1.0
Episode 304: steps = 6 , reward = 1.0
Episode 305: steps = 6 , reward = 1.0
Episode 306: steps = 6 , reward = 1.0
Episode 307: steps = 4 , reward = 0.0
Episode 308: steps = 6 , reward = 1.0
Episode 309: steps = 6 , reward = 1.0
Episode 310: steps = 5 , reward = 0.0
Episode 311: steps = 6 , reward = 1.0
Episode 312: steps = 8 , reward = 0.0
Episode 313: steps = 6 , reward = 1.0
Episode 314: steps = 6 , reward = 1.0
Episode 315: steps = 6 , reward = 1.0
Episode 316: steps = 6 , reward = 1.0
Episode 317: steps = 6 , reward = 1.0
Episode 318: steps = 7 , reward = 1.0
Episode 319: steps = 6 , reward = 1.0
Episode 320: steps = 6 , reward = 1.0
Episode 321: steps = 6 , reward = 1.0
Episode 322: steps = 6 , reward = 1.0
Episode 323: steps = 6 , reward = 1.0
Episode 324: steps = 10 , reward = 1.0
Episode 325: steps = 6 , reward = 1.0
Episode 326: steps = 6 , reward = 1.0
Episode 327: steps = 6 , reward = 1.0
Episode 328: steps = 6 , reward = 1.0
Episode 329: steps = 6 , reward = 1.0
Episode 330: steps = 6 , reward = 1.0
Episode 331: steps = 8 , reward = 1.0
Episode 332: steps = 6 , reward = 1.0
Episode 333: steps = 5 , reward = 0.0
Episode 334: steps = 5 , reward = 0.0
Episode 335: steps = 8 , reward = 1.0
Episode 336: steps = 6 , reward = 1.0
Episode 337: steps = 6 , reward = 1.0
Episode 338: steps = 6 , reward = 1.0
Episode 339: steps = 6 , reward = 1.0
Episode 340: steps = 6 , reward = 1.0
Episode 341: steps = 6 , reward = 1.0
Episode 342: steps = 6 , reward = 1.0
Episode 343: steps = 6 , reward = 1.0
Episode 344: steps = 6 , reward = 1.0
Episode 345: steps = 6 , reward = 1.0
Episode 346: steps = 6 , reward = 1.0
Episode 347: steps = 6 , reward = 1.0
Episode 348: steps = 7 , reward = 0.0
Episode 349: steps = 6 , reward = 1.0
Episode 350: steps = 6 , reward = 1.0
Episode 351: steps = 8 , reward = 1.0
Episode 352: steps = 7 , reward = 1.0
Episode 353: steps = 8 , reward = 1.0
Episode 354: steps = 6 , reward = 1.0
Episode 355: steps = 6 , reward = 1.0
Episode 356: steps = 8 , reward = 1.0
Episode 357: steps = 8 , reward = 1.0
Episode 358: steps = 6 , reward = 1.0
Episode 359: steps = 6 , reward = 1.0
Episode 360: steps = 6 , reward = 1.0
Episode 361: steps = 9 , reward = 1.0
Episode 362: steps = 6 , reward = 1.0
Episode 363: steps = 6 , reward = 1.0
Episode 364: steps = 7 , reward = 1.0
Episode 365: steps = 6 , reward = 1.0
Episode 366: steps = 8 , reward = 1.0
Episode 367: steps = 6 , reward = 1.0
Episode 368: steps = 2 , reward = 0.0
Episode 369: steps = 6 , reward = 1.0
Episode 370: steps = 6 , reward = 1.0
Episode 371: steps = 6 , reward = 0.0
Episode 372: steps = 6 , reward = 1.0
Episode 373: steps = 8 , reward = 1.0
Episode 374: steps = 6 , reward = 1.0
Episode 375: steps = 6 , reward = 1.0
Episode 376: steps = 6 , reward = 1.0
Episode 377: steps = 6 , reward = 1.0
Episode 378: steps = 6 , reward = 1.0
Episode 379: steps = 6 , reward = 1.0
Episode 380: steps = 6 , reward = 1.0
Episode 381: steps = 6 , reward = 1.0
Episode 382: steps = 6 , reward = 1.0
Episode 383: steps = 6 , reward = 1.0
Episode 384: steps = 6 , reward = 1.0
Episode 385: steps = 6 , reward = 1.0
Episode 386: steps = 6 , reward = 1.0
Episode 387: steps = 6 , reward = 1.0
Episode 388: steps = 6 , reward = 1.0
Episode 389: steps = 6 , reward = 1.0
Episode 390: steps = 8 , reward = 1.0
Episode 391: steps = 6 , reward = 1.0
Episode 392: steps = 2 , reward = 0.0
Episode 393: steps = 6 , reward = 1.0
Episode 394: steps = 6 , reward = 1.0
Episode 395: steps = 6 , reward = 1.0
Episode 396: steps = 7 , reward = 1.0
Episode 397: steps = 6 , reward = 1.0
Episode 398: steps = 8 , reward = 1.0
Episode 399: steps = 8 , reward = 1.0
Episode 400: steps = 6 , reward = 1.0
Episode 401: steps = 6 , reward = 1.0
Episode 402: steps = 8 , reward = 1.0
Episode 403: steps = 9 , reward = 1.0
Episode 404: steps = 8 , reward = 1.0
Episode 405: steps = 6 , reward = 1.0
Episode 406: steps = 6 , reward = 1.0
Episode 407: steps = 8 , reward = 1.0
Episode 408: steps = 6 , reward = 1.0
Episode 409: steps = 6 , reward = 1.0
Episode 410: steps = 4 , reward = 0.0
Episode 411: steps = 6 , reward = 1.0
Episode 412: steps = 6 , reward = 1.0
Episode 413: steps = 2 , reward = 0.0
Episode 414: steps = 6 , reward = 1.0
Episode 415: steps = 6 , reward = 1.0
Episode 416: steps = 6 , reward = 1.0
Episode 417: steps = 10 , reward = 1.0
Episode 418: steps = 6 , reward = 1.0
Episode 419: steps = 6 , reward = 1.0
Episode 420: steps = 6 , reward = 1.0
Episode 421: steps = 5 , reward = 0.0
Episode 422: steps = 6 , reward = 1.0
Episode 423: steps = 8 , reward = 1.0
Episode 424: steps = 6 , reward = 1.0
Episode 425: steps = 6 , reward = 1.0
Episode 426: steps = 6 , reward = 1.0
Episode 427: steps = 6 , reward = 1.0
Episode 428: steps = 6 , reward = 1.0
Episode 429: steps = 10 , reward = 1.0
Episode 430: steps = 4 , reward = 0.0
Episode 431: steps = 2 , reward = 0.0
Episode 432: steps = 6 , reward = 1.0
Episode 433: steps = 8 , reward = 1.0
Episode 434: steps = 6 , reward = 1.0
Episode 435: steps = 4 , reward = 0.0
Episode 436: steps = 6 , reward = 1.0
Episode 437: steps = 6 , reward = 1.0
Episode 438: steps = 4 , reward = 0.0
Episode 439: steps = 6 , reward = 1.0
Episode 440: steps = 6 , reward = 1.0
Episode 441: steps = 6 , reward = 1.0
Episode 442: steps = 6 , reward = 1.0
Episode 443: steps = 6 , reward = 1.0
Episode 444: steps = 6 , reward = 1.0
Episode 445: steps = 6 , reward = 1.0
Episode 446: steps = 8 , reward = 1.0
Episode 447: steps = 5 , reward = 0.0
Episode 448: steps = 6 , reward = 1.0
Episode 449: steps = 6 , reward = 1.0
Episode 450: steps = 8 , reward = 1.0
Episode 451: steps = 5 , reward = 0.0
Episode 452: steps = 6 , reward = 1.0
Episode 453: steps = 6 , reward = 1.0
Episode 454: steps = 10 , reward = 1.0
Episode 455: steps = 6 , reward = 1.0
Episode 456: steps = 6 , reward = 0.0
Episode 457: steps = 6 , reward = 1.0
Episode 458: steps = 6 , reward = 1.0
Episode 459: steps = 6 , reward = 1.0
Episode 460: steps = 8 , reward = 1.0
Episode 461: steps = 8 , reward = 1.0
Episode 462: steps = 6 , reward = 1.0
Episode 463: steps = 8 , reward = 1.0
Episode 464: steps = 6 , reward = 1.0
Episode 465: steps = 2 , reward = 0.0
Episode 466: steps = 6 , reward = 1.0
Episode 467: steps = 6 , reward = 1.0
Episode 468: steps = 6 , reward = 1.0
Episode 469: steps = 8 , reward = 1.0
Episode 470: steps = 4 , reward = 0.0
Episode 471: steps = 6 , reward = 1.0
Episode 472: steps = 6 , reward = 1.0
Episode 473: steps = 8 , reward = 1.0
Episode 474: steps = 6 , reward = 1.0
Episode 475: steps = 4 , reward = 0.0
Episode 476: steps = 6 , reward = 1.0
Episode 477: steps = 6 , reward = 1.0
Episode 478: steps = 4 , reward = 0.0
Episode 479: steps = 4 , reward = 0.0
Episode 480: steps = 6 , reward = 1.0
Episode 481: steps = 6 , reward = 1.0
Episode 482: steps = 8 , reward = 1.0
Episode 483: steps = 8 , reward = 1.0
Episode 484: steps = 7 , reward = 1.0
Episode 485: steps = 8 , reward = 1.0
Episode 486: steps = 6 , reward = 1.0
Episode 487: steps = 6 , reward = 1.0
Episode 488: steps = 6 , reward = 1.0
Episode 489: steps = 6 , reward = 1.0
Episode 490: steps = 6 , reward = 1.0
Episode 491: steps = 6 , reward = 1.0
Episode 492: steps = 10 , reward = 1.0
Episode 493: steps = 6 , reward = 1.0
Episode 494: steps = 7 , reward = 1.0
Episode 495: steps = 6 , reward = 1.0
Episode 496: steps = 6 , reward = 1.0
Episode 497: steps = 6 , reward = 1.0
Episode 498: steps = 5 , reward = 0.0
Episode 499: steps = 4 , reward = 0.0
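
A log this long is easier to read as a success rate per block of episodes. The sketch below is a hypothetical helper, not part of the original code. It assumes the log lines follow the `Episode N: steps = S , reward = R` format printed above; `success_rate_by_window`, `LOG_LINE`, and `window` are names introduced only for illustration.

```python
import re

# Matches lines such as "Episode 97: steps = 12 , reward = 1.0"
LOG_LINE = re.compile(r"Episode (\d+): steps = (\d+) , reward = ([\d.]+)")


def success_rate_by_window(log_text, window=100):
    """Return {window_start: fraction of episodes with reward > 0}."""
    counts = {}
    for match in LOG_LINE.finditer(log_text):
        episode = int(match.group(1))
        reward = float(match.group(3))
        start = (episode // window) * window  # first episode index of this block
        wins, total = counts.get(start, (0, 0))
        counts[start] = (wins + (1 if reward > 0 else 0), total + 1)
    return {start: wins / total for start, (wins, total) in sorted(counts.items())}


# A few lines copied from the log above, used as sample input.
sample = """Episode 97: steps = 12 , reward = 1.0
Episode 98: steps = 2 , reward = 0.0
Episode 99: steps = 4 , reward = 0.0
Episode 100: steps = 9 , reward = 0.0
"""
rates = success_rate_by_window(sample)
print(rates)
```

Passing the full training log as `log_text` would give one success rate per block of 100 episodes, which makes the point where the agent starts reaching the goal easier to spot.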
Rendered frames of the test episode (the agent's current cell is shown in brackets):

  (Right)
S[F]FF
FHFH
FFFH
HFFG
  (Right)
SF[F]F
FHFH
FFFH
HFFG
  (Down)
SFFF
FH[F]H
FFFH
HFFG
  (Down)
SFFF
FHFH
FF[F]H
HFFG
  (Down)
SFFF
FHFH
FFFH
HF[F]G
  (Right)
SFFF
FHFH
FFFH
HFF[G]
test reward = 1.0
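
The Sarsa update above bootstraps from the action that was actually sampled for the next step, while Q-learning, whose implementation follows, bootstraps from the largest Q-value in the next state. The sketch below compares the two targets on a toy Q-table. All numbers are hypothetical and chosen only for illustration; they are not taken from the training runs in this article.

```python
import numpy as np

# Hypothetical toy values, for illustration only.
gamma = 0.9                      # discount factor
lr = 0.1                         # learning rate
reward = 0.0                     # reward for the current step

Q = np.zeros((2, 4))             # 2 states, 4 actions
Q[1] = [0.2, 0.5, 0.1, 0.0]      # Q-values of the next state

obs, action = 0, 2               # current state and action
next_obs, next_action = 1, 0     # next_action: what epsilon-greedy happened to sample

# Sarsa (on-policy): bootstrap from the action actually chosen next.
sarsa_target = reward + gamma * Q[next_obs, next_action]

# Q-learning (off-policy): bootstrap from the best action in the next state.
q_learning_target = reward + gamma * np.max(Q[next_obs, :])

# Both algorithms then apply the same incremental update rule.
sarsa_delta = lr * (sarsa_target - Q[obs, action])
q_learning_delta = lr * (q_learning_target - Q[obs, action])

print("Sarsa target:", sarsa_target, "update:", sarsa_delta)
print("Q-learning target:", q_learning_target, "update:", q_learning_delta)
```

The two targets coincide whenever the sampled `next_action` is also the greedy one. They differ only on exploratory steps, which is why the Q-learning `learn` method needs no `next_action` argument.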
Q-learning implementation

import gym
import numpy as np
import time


class QLearningAgent(object):
    def __init__(self, obs_n, act_n, learning_rate=0.01, gamma=0.9, e_greed=0.1):
        self.act_n = act_n      # number of available actions
        self.lr = learning_rate # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = e_greed  # probability of picking a random action
        self.Q = np.zeros((obs_n, act_n))

    # Sample an action for the given observation, with exploration (epsilon-greedy)
    def sample(self, obs):

        if np.random.uniform(0, 1) < (1.0 - self.epsilon): # exploit: pick the action from the Q-table
            action = self.predict(obs)
        else:
            action = np.random.choice(self.act_n) # explore: pick a random action
        return action

    # Predict the greedy action for the given observation
    def predict(self, obs):

        Q_list = self.Q[obs, :]
        maxQ = np.max(Q_list)
        action_list = np.where(Q_list == maxQ)[0]  # several actions may share maxQ
        action = np.random.choice(action_list)
        return action

    # Learning step, i.e. how the Q-table is updated
    def learn(self, obs, action, reward, next_obs, done):
        """ off-policy
            obs: observation before the interaction, s_t
            action: action chosen for this interaction, a_t
            reward: reward r received for this action
            next_obs: observation after this interaction, s_t+1
            done: whether the episode has ended
        """

        predict_Q = self.Q[obs, action]
        if done:
            target_Q = reward # no next state, so nothing to bootstrap from
        else:
            target_Q = reward + self.gamma * np.max(self.Q[next_obs, :]) # Q-learning
        self.Q[obs, action] += self.lr * (target_Q - predict_Q) # move Q toward the target

    # Save the Q-table to a file
    def save(self):
        npy_file = './q_table.npy'
        np.save(npy_file, self.Q)
        print(npy_file + ' saved.')
    
    # Load the Q-table from a file
    def restore(self, npy_file='./q_table.npy'):
        self.Q = np.load(npy_file)
        print(npy_file + ' loaded.')


def run_episode(env, agent, render=False):
    total_steps = 0 # number of steps taken in this episode
    total_reward = 0

    obs = env.reset() # reset the environment and start a new episode

    while True:
        action = agent.sample(obs) # choose an action (epsilon-greedy)
        next_obs, reward, done, _ = env.step(action) # take one step in the environment
        # Q-learning update
        agent.learn(obs, action, reward, next_obs, done)

        obs = next_obs  # the next observation becomes the current one
        total_reward += reward
        total_steps += 1 # count the step
        if render:
            env.render() # render a new frame
        if done:
            break
    return total_reward, total_steps


def test_episode(env, agent):
    total_reward = 0
    obs = env.reset()
    while True:
        action = agent.predict(obs) # greedy
        next_obs, reward, done, _ = env.step(action)
        total_reward += reward
        obs = next_obs
        # time.sleep(0.5)
        # env.render()
        if done:
            break
    return total_reward



# Create the maze environment with gym; is_slippery=False lowers the difficulty
env = gym.make("FrozenLake-v0", is_slippery=False)  # 0 left, 1 down, 2 right, 3 up

# Create an agent instance with the hyperparameters below
agent = QLearningAgent(
        obs_n=env.observation_space.n,
        act_n=env.action_space.n,
        learning_rate=0.1,
        gamma=0.9,
        e_greed=0.1)


# Train for 500 episodes, printing each episode's score
for episode in range(500):
    ep_reward, ep_steps = run_episode(env, agent, False)
    print('Episode %s: steps = %s , reward = %.1f' % (episode, ep_steps, ep_reward))

# Training finished; check how the algorithm performs
test_reward = test_episode(env, agent)
print('test reward = %.1f' % (test_reward))

Output:

Episode 0: steps = 2 , reward = 0.0
Episode 1: steps = 20 , reward = 0.0
Episode 2: steps = 8 , reward = 0.0
Episode 3: steps = 7 , reward = 0.0
Episode 4: steps = 6 , reward = 0.0
Episode 5: steps = 2 , reward = 0.0
Episode 6: steps = 15 , reward = 0.0
Episode 7: steps = 2 , reward = 0.0
Episode 8: steps = 2 , reward = 0.0
Episode 9: steps = 12 , reward = 0.0
Episode 10: steps = 14 , reward = 0.0
Episode 11: steps = 3 , reward = 0.0
Episode 12: steps = 5 , reward = 0.0
Episode 13: steps = 9 , reward = 0.0
Episode 14: steps = 6 , reward = 0.0
Episode 15: steps = 6 , reward = 0.0
Episode 16: steps = 15 , reward = 0.0
Episode 17: steps = 11 , reward = 0.0
Episode 18: steps = 10 , reward = 0.0
Episode 19: steps = 12 , reward = 0.0
Episode 20: steps = 7 , reward = 0.0
Episode 21: steps = 6 , reward = 0.0
Episode 22: steps = 5 , reward = 0.0
Episode 23: steps = 17 , reward = 0.0
Episode 24: steps = 2 , reward = 0.0
Episode 25: steps = 4 , reward = 0.0
Episode 26: steps = 23 , reward = 0.0
Episode 27: steps = 11 , reward = 0.0
Episode 28: steps = 7 , reward = 0.0
Episode 29: steps = 5 , reward = 0.0
Episode 30: steps = 4 , reward = 0.0
Episode 31: steps = 9 , reward = 0.0
Episode 32: steps = 5 , reward = 0.0
Episode 33: steps = 2 , reward = 0.0
Episode 34: steps = 15 , reward = 0.0
Episode 35: steps = 9 , reward = 0.0
Episode 36: steps = 14 , reward = 0.0
Episode 37: steps = 5 , reward = 0.0
Episode 38: steps = 8 , reward = 0.0
Episode 39: steps = 22 , reward = 0.0
Episode 40: steps = 4 , reward = 0.0
Episode 41: steps = 11 , reward = 0.0
Episode 42: steps = 16 , reward = 0.0
Episode 43: steps = 6 , reward = 0.0
Episode 44: steps = 5 , reward = 0.0
Episode 45: steps = 10 , reward = 0.0
Episode 46: steps = 11 , reward = 0.0
Episode 47: steps = 4 , reward = 0.0
Episode 48: steps = 6 , reward = 0.0
Episode 49: steps = 4 , reward = 0.0
Episode 50: steps = 3 , reward = 0.0
Episode 51: steps = 3 , reward = 0.0
Episode 52: steps = 17 , reward = 0.0
Episode 53: steps = 2 , reward = 0.0
Episode 54: steps = 3 , reward = 0.0
Episode 55: steps = 15 , reward = 0.0
Episode 56: steps = 2 , reward = 0.0
Episode 57: steps = 6 , reward = 0.0
Episode 58: steps = 4 , reward = 0.0
Episode 59: steps = 10 , reward = 0.0
Episode 60: steps = 3 , reward = 0.0
Episode 61: steps = 6 , reward = 0.0
Episode 62: steps = 9 , reward = 0.0
Episode 63: steps = 6 , reward = 0.0
Episode 64: steps = 15 , reward = 0.0
Episode 65: steps = 7 , reward = 1.0
Episode 66: steps = 8 , reward = 0.0
Episode 67: steps = 9 , reward = 0.0
Episode 68: steps = 2 , reward = 0.0
Episode 69: steps = 19 , reward = 1.0
Episode 70: steps = 18 , reward = 0.0
Episode 71: steps = 2 , reward = 0.0
Episode 72: steps = 10 , reward = 0.0
Episode 73: steps = 10 , reward = 0.0
Episode 74: steps = 10 , reward = 0.0
Episode 75: steps = 5 , reward = 0.0
Episode 76: steps = 10 , reward = 0.0
Episode 77: steps = 17 , reward = 0.0
Episode 78: steps = 4 , reward = 0.0
Episode 79: steps = 5 , reward = 0.0
Episode 80: steps = 3 , reward = 0.0
Episode 81: steps = 9 , reward = 0.0
Episode 82: steps = 12 , reward = 0.0
Episode 83: steps = 2 , reward = 0.0
Episode 84: steps = 10 , reward = 0.0
Episode 85: steps = 5 , reward = 0.0
Episode 86: steps = 5 , reward = 0.0
Episode 87: steps = 5 , reward = 0.0
Episode 88: steps = 6 , reward = 0.0
Episode 89: steps = 7 , reward = 0.0
Episode 90: steps = 7 , reward = 0.0
Episode 91: steps = 3 , reward = 0.0
Episode 92: steps = 8 , reward = 0.0
Episode 93: steps = 5 , reward = 0.0
Episode 94: steps = 10 , reward = 0.0
Episode 95: steps = 8 , reward = 0.0
Episode 96: steps = 2 , reward = 0.0
Episode 97: steps = 4 , reward = 0.0
Episode 98: steps = 9 , reward = 0.0
Episode 99: steps = 5 , reward = 0.0
Episode 100: steps = 18 , reward = 0.0
Episode 101: steps = 11 , reward = 0.0
Episode 102: steps = 3 , reward = 0.0
Episode 103: steps = 8 , reward = 0.0
Episode 104: steps = 6 , reward = 0.0
Episode 105: steps = 21 , reward = 1.0
Episode 106: steps = 8 , reward = 1.0
Episode 107: steps = 2 , reward = 0.0
Episode 108: steps = 3 , reward = 0.0
Episode 109: steps = 3 , reward = 0.0
Episode 110: steps = 4 , reward = 0.0
Episode 111: steps = 8 , reward = 0.0
Episode 112: steps = 2 , reward = 0.0
Episode 113: steps = 8 , reward = 0.0
Episode 114: steps = 9 , reward = 0.0
Episode 115: steps = 6 , reward = 0.0
Episode 116: steps = 7 , reward = 1.0
Episode 117: steps = 6 , reward = 0.0
Episode 118: steps = 6 , reward = 0.0
Episode 119: steps = 12 , reward = 1.0
Episode 120: steps = 8 , reward = 1.0
Episode 121: steps = 9 , reward = 1.0
Episode 122: steps = 12 , reward = 1.0
Episode 123: steps = 6 , reward = 1.0
Episode 124: steps = 3 , reward = 0.0
Episode 125: steps = 6 , reward = 1.0
Episode 126: steps = 6 , reward = 1.0
Episode 127: steps = 6 , reward = 1.0
Episode 128: steps = 8 , reward = 1.0
Episode 129: steps = 6 , reward = 1.0
Episode 130: steps = 6 , reward = 1.0
Episode 131: steps = 7 , reward = 1.0
Episode 132: steps = 8 , reward = 1.0
Episode 133: steps = 6 , reward = 1.0
Episode 134: steps = 7 , reward = 1.0
Episode 135: steps = 6 , reward = 1.0
Episode 136: steps = 6 , reward = 1.0
Episode 137: steps = 7 , reward = 1.0
Episode 138: steps = 6 , reward = 1.0
Episode 139: steps = 8 , reward = 1.0
Episode 140: steps = 6 , reward = 1.0
Episode 141: steps = 6 , reward = 1.0
Episode 142: steps = 6 , reward = 1.0
Episode 143: steps = 6 , reward = 1.0
Episode 144: steps = 2 , reward = 0.0
Episode 145: steps = 6 , reward = 1.0
Episode 146: steps = 6 , reward = 1.0
Episode 147: steps = 6 , reward = 1.0
Episode 148: steps = 8 , reward = 1.0
Episode 149: steps = 10 , reward = 1.0
Episode 150: steps = 6 , reward = 1.0
Episode 151: steps = 6 , reward = 1.0
Episode 152: steps = 3 , reward = 0.0
Episode 153: steps = 2 , reward = 0.0
Episode 154: steps = 8 , reward = 1.0
Episode 155: steps = 6 , reward = 1.0
Episode 156: steps = 6 , reward = 1.0
Episode 157: steps = 6 , reward = 1.0
Episode 158: steps = 6 , reward = 1.0
Episode 159: steps = 6 , reward = 1.0
Episode 160: steps = 6 , reward = 1.0
Episode 161: steps = 7 , reward = 1.0
Episode 162: steps = 8 , reward = 1.0
Episode 163: steps = 5 , reward = 0.0
Episode 164: steps = 6 , reward = 1.0
Episode 165: steps = 6 , reward = 1.0
Episode 166: steps = 6 , reward = 1.0
Episode 167: steps = 6 , reward = 1.0
Episode 168: steps = 6 , reward = 1.0
Episode 169: steps = 6 , reward = 1.0
Episode 170: steps = 3 , reward = 0.0
Episode 171: steps = 6 , reward = 1.0
Episode 172: steps = 6 , reward = 1.0
Episode 173: steps = 6 , reward = 1.0
Episode 174: steps = 6 , reward = 1.0
Episode 175: steps = 8 , reward = 1.0
Episode 176: steps = 9 , reward = 1.0
Episode 177: steps = 6 , reward = 1.0
Episode 178: steps = 4 , reward = 0.0
Episode 179: steps = 6 , reward = 1.0
Episode 180: steps = 6 , reward = 1.0
Episode 181: steps = 8 , reward = 1.0
Episode 182: steps = 6 , reward = 1.0
Episode 183: steps = 6 , reward = 1.0
Episode 184: steps = 6 , reward = 1.0
Episode 185: steps = 6 , reward = 1.0
Episode 186: steps = 6 , reward = 1.0
Episode 187: steps = 8 , reward = 1.0
Episode 188: steps = 7 , reward = 1.0
Episode 189: steps = 6 , reward = 1.0
Episode 190: steps = 8 , reward = 1.0
Episode 191: steps = 6 , reward = 1.0
Episode 192: steps = 4 , reward = 0.0
Episode 193: steps = 6 , reward = 1.0
Episode 194: steps = 6 , reward = 1.0
Episode 195: steps = 9 , reward = 1.0
Episode 196: steps = 6 , reward = 1.0
Episode 197: steps = 6 , reward = 1.0
Episode 198: steps = 7 , reward = 1.0
Episode 199: steps = 6 , reward = 1.0
Episode 200: steps = 7 , reward = 1.0
Episode 201: steps = 6 , reward = 1.0
Episode 202: steps = 6 , reward = 1.0
Episode 203: steps = 7 , reward = 1.0
Episode 204: steps = 6 , reward = 1.0
Episode 205: steps = 8 , reward = 1.0
Episode 206: steps = 3 , reward = 0.0
Episode 207: steps = 8 , reward = 1.0
Episode 208: steps = 7 , reward = 1.0
Episode 209: steps = 6 , reward = 1.0
Episode 210: steps = 6 , reward = 1.0
Episode 211: steps = 6 , reward = 1.0
Episode 212: steps = 6 , reward = 1.0
Episode 213: steps = 6 , reward = 1.0
Episode 214: steps = 6 , reward = 1.0
Episode 215: steps = 7 , reward = 1.0
Episode 216: steps = 4 , reward = 0.0
Episode 217: steps = 6 , reward = 1.0
Episode 218: steps = 6 , reward = 1.0
Episode 219: steps = 6 , reward = 1.0
Episode 220: steps = 6 , reward = 1.0
Episode 221: steps = 6 , reward = 1.0
Episode 222: steps = 12 , reward = 1.0
Episode 223: steps = 8 , reward = 1.0
Episode 224: steps = 6 , reward = 1.0
Episode 225: steps = 8 , reward = 1.0
Episode 226: steps = 6 , reward = 1.0
Episode 227: steps = 6 , reward = 1.0
Episode 228: steps = 6 , reward = 1.0
Episode 229: steps = 6 , reward = 1.0
Episode 230: steps = 2 , reward = 0.0
Episode 231: steps = 6 , reward = 1.0
Episode 232: steps = 8 , reward = 1.0
Episode 233: steps = 6 , reward = 1.0
Episode 234: steps = 6 , reward = 1.0
Episode 235: steps = 6 , reward = 1.0
Episode 236: steps = 6 , reward = 1.0
Episode 237: steps = 6 , reward = 1.0
Episode 238: steps = 6 , reward = 1.0
Episode 239: steps = 6 , reward = 1.0
Episode 240: steps = 7 , reward = 1.0
Episode 241: steps = 6 , reward = 1.0
Episode 242: steps = 2 , reward = 0.0
Episode 243: steps = 6 , reward = 1.0
Episode 244: steps = 6 , reward = 1.0
Episode 245: steps = 7 , reward = 1.0
Episode 246: steps = 7 , reward = 1.0
Episode 247: steps = 8 , reward = 0.0
Episode 248: steps = 6 , reward = 1.0
Episode 249: steps = 5 , reward = 0.0
Episode 250: steps = 7 , reward = 1.0
Episode 251: steps = 6 , reward = 1.0
Episode 252: steps = 8 , reward = 1.0
Episode 253: steps = 6 , reward = 1.0
Episode 254: steps = 4 , reward = 0.0
Episode 255: steps = 4 , reward = 0.0
Episode 256: steps = 7 , reward = 1.0
Episode 257: steps = 6 , reward = 1.0
Episode 258: steps = 8 , reward = 1.0
Episode 259: steps = 6 , reward = 1.0
Episode 260: steps = 6 , reward = 1.0
Episode 261: steps = 6 , reward = 1.0
Episode 262: steps = 8 , reward = 1.0
Episode 263: steps = 7 , reward = 1.0
Episode 264: steps = 6 , reward = 1.0
Episode 265: steps = 6 , reward = 1.0
Episode 266: steps = 6 , reward = 1.0
Episode 267: steps = 6 , reward = 1.0
Episode 268: steps = 8 , reward = 1.0
Episode 269: steps = 6 , reward = 0.0
Episode 270: steps = 6 , reward = 1.0
Episode 271: steps = 7 , reward = 1.0
Episode 272: steps = 4 , reward = 0.0
Episode 273: steps = 6 , reward = 1.0
Episode 274: steps = 2 , reward = 0.0
Episode 275: steps = 8 , reward = 1.0
Episode 276: steps = 6 , reward = 1.0
Episode 277: steps = 6 , reward = 1.0
Episode 278: steps = 5 , reward = 0.0
Episode 279: steps = 6 , reward = 1.0
Episode 280: steps = 6 , reward = 1.0
Episode 281: steps = 6 , reward = 1.0
Episode 282: steps = 7 , reward = 1.0
Episode 283: steps = 6 , reward = 1.0
Episode 284: steps = 6 , reward = 1.0
Episode 285: steps = 6 , reward = 1.0
Episode 286: steps = 6 , reward = 1.0
Episode 287: steps = 7 , reward = 1.0
Episode 288: steps = 6 , reward = 1.0
Episode 289: steps = 6 , reward = 1.0
Episode 290: steps = 6 , reward = 1.0
Episode 291: steps = 6 , reward = 1.0
Episode 292: steps = 6 , reward = 1.0
Episode 293: steps = 6 , reward = 1.0
Episode 294: steps = 6 , reward = 1.0
Episode 295: steps = 7 , reward = 1.0
Episode 296: steps = 6 , reward = 1.0
Episode 297: steps = 8 , reward = 1.0
Episode 298: steps = 6 , reward = 1.0
Episode 299: steps = 6 , reward = 1.0
Episode 300: steps = 2 , reward = 0.0
Episode 301: steps = 6 , reward = 1.0
Episode 302: steps = 6 , reward = 1.0
Episode 303: steps = 6 , reward = 1.0
Episode 304: steps = 3 , reward = 0.0
Episode 305: steps = 7 , reward = 1.0
Episode 306: steps = 6 , reward = 1.0
Episode 307: steps = 6 , reward = 1.0
Episode 308: steps = 10 , reward = 1.0
Episode 309: steps = 7 , reward = 1.0
Episode 310: steps = 6 , reward = 1.0
Episode 311: steps = 10 , reward = 1.0
Episode 312: steps = 6 , reward = 1.0
Episode 313: steps = 6 , reward = 1.0
Episode 314: steps = 6 , reward = 1.0
Episode 315: steps = 8 , reward = 1.0
Episode 316: steps = 6 , reward = 1.0
Episode 317: steps = 6 , reward = 1.0
Episode 318: steps = 6 , reward = 1.0
Episode 319: steps = 7 , reward = 1.0
Episode 320: steps = 6 , reward = 1.0
Episode 321: steps = 6 , reward = 1.0
Episode 322: steps = 6 , reward = 1.0
Episode 323: steps = 8 , reward = 1.0
Episode 324: steps = 7 , reward = 1.0
Episode 325: steps = 7 , reward = 1.0
Episode 326: steps = 6 , reward = 1.0
Episode 327: steps = 6 , reward = 1.0
Episode 328: steps = 6 , reward = 1.0
Episode 329: steps = 6 , reward = 1.0
Episode 330: steps = 8 , reward = 1.0
Episode 331: steps = 6 , reward = 1.0
Episode 332: steps = 8 , reward = 1.0
Episode 333: steps = 6 , reward = 1.0
Episode 334: steps = 8 , reward = 1.0
Episode 335: steps = 6 , reward = 1.0
Episode 336: steps = 6 , reward = 1.0
Episode 337: steps = 6 , reward = 1.0
Episode 338: steps = 11 , reward = 1.0
Episode 339: steps = 2 , reward = 0.0
Episode 340: steps = 6 , reward = 1.0
Episode 341: steps = 10 , reward = 0.0
Episode 342: steps = 6 , reward = 1.0
Episode 343: steps = 6 , reward = 1.0
Episode 344: steps = 6 , reward = 1.0
Episode 345: steps = 8 , reward = 1.0
Episode 346: steps = 9 , reward = 1.0
Episode 347: steps = 6 , reward = 1.0
Episode 348: steps = 6 , reward = 1.0
Episode 349: steps = 6 , reward = 1.0
Episode 350: steps = 6 , reward = 1.0
Episode 351: steps = 6 , reward = 1.0
Episode 352: steps = 6 , reward = 1.0
Episode 353: steps = 8 , reward = 1.0
Episode 354: steps = 6 , reward = 1.0
Episode 355: steps = 8 , reward = 1.0
Episode 356: steps = 6 , reward = 1.0
Episode 357: steps = 6 , reward = 1.0
Episode 358: steps = 6 , reward = 1.0
Episode 359: steps = 6 , reward = 1.0
Episode 360: steps = 6 , reward = 1.0
Episode 361: steps = 6 , reward = 1.0
Episode 362: steps = 6 , reward = 1.0
Episode 363: steps = 6 , reward = 1.0
Episode 364: steps = 6 , reward = 1.0
Episode 365: steps = 6 , reward = 1.0
Episode 366: steps = 6 , reward = 1.0
Episode 367: steps = 6 , reward = 1.0
Episode 368: steps = 6 , reward = 1.0
Episode 369: steps = 6 , reward = 1.0
Episode 370: steps = 6 , reward = 1.0
Episode 371: steps = 6 , reward = 1.0
Episode 372: steps = 6 , reward = 1.0
Episode 373: steps = 8 , reward = 1.0
Episode 374: steps = 6 , reward = 1.0
Episode 375: steps = 7 , reward = 1.0
Episode 376: steps = 10 , reward = 1.0
Episode 377: steps = 6 , reward = 1.0
Episode 378: steps = 6 , reward = 1.0
Episode 379: steps = 6 , reward = 1.0
Episode 380: steps = 6 , reward = 1.0
Episode 381: steps = 8 , reward = 1.0
Episode 382: steps = 8 , reward = 1.0
Episode 383: steps = 8 , reward = 1.0
Episode 384: steps = 6 , reward = 1.0
Episode 385: steps = 7 , reward = 0.0
Episode 386: steps = 4 , reward = 0.0
Episode 387: steps = 5 , reward = 0.0
Episode 388: steps = 5 , reward = 0.0
Episode 389: steps = 10 , reward = 1.0
Episode 390: steps = 6 , reward = 1.0
Episode 391: steps = 6 , reward = 1.0
Episode 392: steps = 6 , reward = 1.0
Episode 393: steps = 6 , reward = 1.0
Episode 394: steps = 7 , reward = 1.0
Episode 395: steps = 6 , reward = 1.0
Episode 396: steps = 6 , reward = 1.0
Episode 397: steps = 6 , reward = 1.0
Episode 398: steps = 6 , reward = 1.0
Episode 399: steps = 6 , reward = 1.0
Episode 400: steps = 6 , reward = 1.0
Episode 401: steps = 8 , reward = 1.0
Episode 402: steps = 6 , reward = 1.0
Episode 403: steps = 8 , reward = 1.0
Episode 404: steps = 2 , reward = 0.0
Episode 405: steps = 6 , reward = 1.0
Episode 406: steps = 6 , reward = 1.0
Episode 407: steps = 6 , reward = 1.0
Episode 408: steps = 6 , reward = 1.0
Episode 409: steps = 6 , reward = 1.0
Episode 410: steps = 6 , reward = 1.0
Episode 411: steps = 5 , reward = 0.0
Episode 412: steps = 3 , reward = 0.0
Episode 413: steps = 8 , reward = 1.0
Episode 414: steps = 6 , reward = 0.0
Episode 415: steps = 6 , reward = 1.0
Episode 416: steps = 6 , reward = 1.0
Episode 417: steps = 7 , reward = 1.0
Episode 418: steps = 6 , reward = 1.0
Episode 419: steps = 6 , reward = 1.0
Episode 420: steps = 6 , reward = 1.0
Episode 421: steps = 6 , reward = 1.0
Episode 422: steps = 8 , reward = 1.0
Episode 423: steps = 6 , reward = 1.0
Episode 424: steps = 7 , reward = 1.0
Episode 425: steps = 6 , reward = 1.0
Episode 426: steps = 6 , reward = 1.0
Episode 427: steps = 6 , reward = 1.0
Episode 428: steps = 10 , reward = 1.0
Episode 429: steps = 6 , reward = 1.0
Episode 430: steps = 8 , reward = 1.0
Episode 431: steps = 6 , reward = 1.0
Episode 432: steps = 6 , reward = 1.0
Episode 433: steps = 6 , reward = 1.0
Episode 434: steps = 6 , reward = 1.0
Episode 435: steps = 4 , reward = 0.0
Episode 436: steps = 6 , reward = 1.0
Episode 437: steps = 6 , reward = 1.0
Episode 438: steps = 6 , reward = 1.0
Episode 439: steps = 7 , reward = 1.0
Episode 440: steps = 5 , reward = 0.0
Episode 441: steps = 5 , reward = 0.0
Episode 442: steps = 6 , reward = 1.0
Episode 443: steps = 6 , reward = 1.0
Episode 444: steps = 6 , reward = 1.0
Episode 445: steps = 8 , reward = 1.0
Episode 446: steps = 8 , reward = 1.0
Episode 447: steps = 6 , reward = 1.0
Episode 448: steps = 6 , reward = 1.0
Episode 449: steps = 3 , reward = 0.0
Episode 450: steps = 6 , reward = 1.0
Episode 451: steps = 8 , reward = 1.0
Episode 452: steps = 10 , reward = 1.0
Episode 453: steps = 8 , reward = 1.0
Episode 454: steps = 6 , reward = 1.0
Episode 455: steps = 6 , reward = 1.0
Episode 456: steps = 6 , reward = 1.0
Episode 457: steps = 6 , reward = 1.0
Episode 458: steps = 6 , reward = 1.0
Episode 459: steps = 8 , reward = 1.0
Episode 460: steps = 6 , reward = 1.0
Episode 461: steps = 6 , reward = 1.0
Episode 462: steps = 6 , reward = 1.0
Episode 463: steps = 6 , reward = 1.0
Episode 464: steps = 6 , reward = 1.0
Episode 465: steps = 9 , reward = 1.0
Episode 466: steps = 9 , reward = 1.0
Episode 467: steps = 6 , reward = 1.0
Episode 468: steps = 6 , reward = 1.0
Episode 469: steps = 6 , reward = 1.0
Episode 470: steps = 6 , reward = 1.0
Episode 471: steps = 6 , reward = 1.0
Episode 472: steps = 9 , reward = 1.0
Episode 473: steps = 7 , reward = 1.0
Episode 474: steps = 6 , reward = 1.0
Episode 475: steps = 7 , reward = 1.0
Episode 476: steps = 7 , reward = 1.0
Episode 477: steps = 6 , reward = 1.0
Episode 478: steps = 6 , reward = 1.0
Episode 479: steps = 6 , reward = 1.0
Episode 480: steps = 6 , reward = 1.0
Episode 481: steps = 6 , reward = 1.0
Episode 482: steps = 5 , reward = 0.0
Episode 483: steps = 6 , reward = 1.0
Episode 484: steps = 11 , reward = 1.0
Episode 485: steps = 6 , reward = 1.0
Episode 486: steps = 2 , reward = 0.0
Episode 487: steps = 6 , reward = 1.0
Episode 488: steps = 6 , reward = 1.0
Episode 489: steps = 6 , reward = 1.0
Episode 490: steps = 6 , reward = 1.0
Episode 491: steps = 6 , reward = 1.0
Episode 492: steps = 6 , reward = 1.0
Episode 493: steps = 7 , reward = 1.0
Episode 494: steps = 6 , reward = 1.0
Episode 495: steps = 6 , reward = 1.0
Episode 496: steps = 6 , reward = 1.0
Episode 497: steps = 8 , reward = 1.0
Episode 498: steps = 6 , reward = 1.0
Episode 499: steps = 6 , reward = 1.0
test reward = 1.0
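A log this long is easier to judge in aggregate than line by line. The following is a minimal sketch, assuming every training line follows the `Episode N: steps = S , reward = R` format shown above. The helper name `summarize_log` is hypothetical and not part of PARL or gym, and treating any reward greater than 0 as a successful episode is also an assumption made for illustration.

```python
import re

# Matches lines such as "Episode 483: steps = 6 , reward = 1.0".
LINE_RE = re.compile(
    r"Episode\s+(\d+):\s*steps\s*=\s*(\d+)\s*,\s*reward\s*=\s*([0-9.]+)"
)


def summarize_log(text):
    """Return (episode_count, success_rate, mean_steps) for a log string.

    Hypothetical helper: an episode counts as a success when reward > 0.
    """
    steps, rewards = [], []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip non-episode lines such as "test reward = 1.0"
        steps.append(int(match.group(2)))
        rewards.append(float(match.group(3)))

    count = len(steps)
    if count == 0:
        return 0, 0.0, 0.0

    success_rate = sum(1 for r in rewards if r > 0) / count
    mean_steps = sum(steps) / count
    return count, success_rate, mean_steps


# Five lines copied from the log above (episodes 482 to 486).
sample = """Episode 482: steps = 5 , reward = 0.0
Episode 483: steps = 6 , reward = 1.0
Episode 484: steps = 11 , reward = 1.0
Episode 485: steps = 6 , reward = 1.0
Episode 486: steps = 2 , reward = 0.0
test reward = 1.0"""

count, success_rate, mean_steps = summarize_log(sample)
print(count, success_rate, mean_steps)
```

Pasting the full log into `sample` gives the success rate over the whole run. The training episodes that end with reward 0.0 come from runs that still use ε-greedy exploration through `sample`, whereas the final test line is a separate evaluation.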

To run the code, visit: https://aistudio.baidu.com/aistudio/projectdetail/625951?shared=1

If you found this useful, likes, favorites, and shares are welcome!

Source: https://blog.csdn.net/chenqianhe2/article/details/115010152
