
Rock Paper Scissors

2021-05-02 02:03:08  Views: 1547  Source: Internet

Tags: play batch len Paper Rock Scissors model history opponent



https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/rock-paper-scissors

A simple competitive game: use an algorithm to learn the opponent's pattern of play and beat them.

For this challenge, you will create a program to play Rock, Paper, Scissors. A program that picks at random will usually win 50% of the time. To pass this challenge your program must play matches against four different bots, winning at least 60% of the games in each match.

You can access the full project description and starter code on repl.it.

 

Markov chain solution

https://forum.freecodecamp.org/t/cant-beat-abbey-rock-paper-scissors-project/447449/2

https://github.com/marius-mm/freeCodeCampProjects/blob/main/Machine%20Learning%20with%20Python/RockPaperScissors/RPS.py

 

The author claims this beats three of the four opponents, but in my tests it only beat two, taking a 60% win rate as the threshold.

import random
import mchmm as mc
import numpy as np

winDict = {"R": "P", "S": "R", "P": "S"}

strategy = 1


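def player(prev_play, opponent_history=[]):
    # opponent_history is a mutable default argument on purpose: it persists
    # between calls and accumulates the opponent's moves.
    global strategy
    # First call: seed the history so that the chain contains all three states.
    if len(opponent_history) <= 0:
        opponent_history.append("R")
        opponent_history.append("S")
    # The first call of a match passes an empty prev_play; treat it as "P".
    if len(prev_play) <= 0:
        prev_play = "P"

    opponent_history.append(prev_play)

    if strategy == 1:
        memory = 800  # maximum number of opponent moves kept for fitting
        guess = predict(prev_play, opponent_history, memory)

    return guess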


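def predict(prev_play, opponent_history, memoryLength):
    # Drop the oldest move once the history exceeds the memory window.
    if len(opponent_history) > memoryLength:
        opponent_history.pop(0)

    # Fit a first-order Markov chain to the opponent's moves, then counter
    # the most probable move following prev_play.
    chain = mc.MarkovChain().from_data(opponent_history)
    predictionNextItem = giveMostProbableNextItem(chain, prev_play)
    winningMove = winDict[predictionNextItem]
    return winningMove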


def contains_duplicates(X):
    X = np.round(X,4)
    return len(np.unique(X)) != len(X)


def giveIndexOfState(chain, item):
    return np.where(chain.states == item)[0][0]


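def giveMostProbableNextItem(chain, lastItem):
    # Take the row of the transition probability matrix that belongs to
    # lastItem and return the state with the highest probability.
    retval = chain.states[
        np.argmax(chain.observed_p_matrix[giveIndexOfState(chain, lastItem)])
    ]

    return retval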

 

According to the forum thread, the Markov chain needs to be made longer.

do you mean with chain length like different states? RS RR RP SR … instead of S R P ?

That’s exactly what I mean. That’s also what Abbey is doing, so you will have to use a longer chain than she does or use her chain against her to win.
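
The "longer chain" idea can be sketched without mchmm. The following is a minimal illustration, not the forum author's code: the state is the last `order` opponent moves rather than a single move, and the names `predict_next` and `BEATS` are hypothetical.

```python
from collections import Counter, defaultdict

BEATS = {"R": "P", "P": "S", "S": "R"}  # move -> the move that beats it


def predict_next(history, order=2):
    """Predict the opponent's next move from an order-`order` Markov chain.

    Returns None when the current state has never been seen before.
    """
    transitions = defaultdict(Counter)
    # Count which move followed every window of `order` consecutive moves.
    for i in range(len(history) - order):
        state = tuple(history[i:i + order])
        transitions[state][history[i + order]] += 1

    current = tuple(history[-order:])
    if len(history) < order or current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


history = list("RPSRPSRPS")
predicted = predict_next(history, order=2)
# Fall back to an arbitrary move while the chain has no prediction yet.
counter_move = BEATS[predicted] if predicted else "R"
print(predicted, counter_move)
```

With `order=1` this reduces to the single-move states (S, R, P) used by the mchmm code above; raising `order` gives the RS, RR, RP, SR style states discussed in the thread.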

 

https://forum.freecodecamp.org/t/rock-paper-scissors-help-with-abbey/452902

abbey uses a Markov chain of length 2, so to compete against her you need a chain of length at least 2.

Abbey is a Markov chain player, using a length of 2, so you’ve got a good example there. A longer Markov chain can defeat her or an appropriate length 2 chain will work as well. As I have mentioned here before, it is possible to know who you are playing, through various means, and employ the correct algorithm against them. It’s possible to beat all the players more than 80% of the time. As you can see from reading Abbey’s code, a Markov chain algorithm isn’t very complex.

A Markov chain isn’t the only algorithm that will work here, but it is one of the simpler ones. This project is not current fashionable machine learning (think neural nets) but old school machine learning. There is quite a bit of information on RPS strategy on the web once you get past all the RPS bot tutorials that use neural nets to recognize human hands playing RPS.
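
One possible way to "know who you are playing" is to test whether the opponent's moves repeat with a fixed period. This is only a hedged sketch of that idea: `detect_period`, `predict_periodic` and the `max_period` parameter are hypothetical and are not taken from the forum thread or from any bot's code.

```python
def detect_period(history, max_period=10):
    """Return the smallest period p such that history[i] == history[i - p]
    for every i >= p, or None if no period up to max_period fits.

    At least two full cycles must be observed before a period is accepted.
    """
    for p in range(1, max_period + 1):
        if len(history) < 2 * p:
            break
        if all(history[i] == history[i - p] for i in range(p, len(history))):
            return p
    return None


def predict_periodic(history, period):
    """Predict the next move of an opponent that repeats every `period` moves."""
    return history[-period]


moves = list("RRPPSRRPPSRRP")
period = detect_period(moves)
if period is not None:
    print(period, predict_periodic(moves, period))
```

If no period is found, the player can fall back to a Markov chain or another general predictor.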

 

The mchmm library

https://github.com/maximtrp/mchmm

Discrete Markov chains

Initializing a Markov chain using some data.

>>> import mchmm as mc
>>> a = mc.MarkovChain().from_data('AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC')

Now, we can look at the observed transition frequency matrix:

>>> a.observed_matrix
array([[ 7., 18.,  7.],
       [19.,  5., 29.],
       [ 5., 30.,  3.]])

And the observed transition probability matrix:

>>> a.observed_p_matrix
array([[0.21875   , 0.5625    , 0.21875   ],
       [0.35849057, 0.09433962, 0.54716981],
       [0.13157895, 0.78947368, 0.07894737]])
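
Each row of the probability matrix is the matching row of the frequency matrix divided by its sum, for example 7 / (7 + 18 + 7) = 0.21875. The relation can be reproduced by hand in plain Python; this is an illustrative sketch, not mchmm's implementation, and `transition_matrices` is a hypothetical name.

```python
def transition_matrices(seq):
    """Return (states, counts, probabilities) for first-order transitions."""
    states = sorted(set(seq))
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    # Count every adjacent pair (current state -> next state).
    for current, nxt in zip(seq, seq[1:]):
        counts[index[current]][index[nxt]] += 1
    # Normalise each row so that it sums to 1 (rows without data stay 0).
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return states, counts, probs


states, counts, probs = transition_matrices("AABCAB")
print(states)
print(counts)
print(probs)
```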

You can visualize your Markov chain. First, build a directed graph with graph_make() method of MarkovChain object. Then render() it.

>>> graph = a.graph_make(
      format="png",
      graph_attr=[("rankdir", "LR")],
      node_attr=[("fontname", "Roboto bold"), ("fontsize", "20")],
      edge_attr=[("fontname", "Iosevka"), ("fontsize", "12")]
    )
>>> graph.render()

Here is the result:

[Figure: images/mc.png, the rendered graph of the Markov chain]

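Most-likely subsequence prediction

https://forum.freecodecamp.org/t/machine-learning-with-python-projects-rock-paper-scissors/412794/4

During the match, count how often each move followed the opponent's most recent N moves, then predict the opponent's next move as the one with the highest count.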

 

Unfortunately I don’t seem to have saved a copy.
But basically you use the last n moves to predict what Abbey will do next. The important step here is to let your programme build up, on the fly, a list of combinations containing the last n steps plus entry n+1 (Abbey’s reaction) and their counts. So you start with an empty list, and every time new data rolls in, you check whether you already have this entry in the list: if yes, increase its count by 1, otherwise set it to 1.
Say we have the following example (n=2 for simplicity; to beat Abbey you’ll need to increase n): Incoming data: [P,P,R,S,P,P,R…]
Initially the list is empty.
When we have [P,P,R] (length = n+1), we enter ‘PPR’ = 1 (elements 0 to n of incoming data) into our list
Then ‘PRS’ = 1 (elements 1 to n+1 of incoming data)
Then ‘RSP’ = 1 (elements 2 to n+2 of incoming data)
Then ‘SPP’ = 1
Then we see we already have ‘PPR’ in our list, so we increase it to 2…
Hope this helps, otherwise please feel free to ask! Sorry, I don’t seem to have the code anymore.
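
The walk-through above can be reproduced in a few lines. This is a minimal sketch of the counting step only, using the same n=2 example; `count_ngrams` is a hypothetical helper name, not code from the forum post.

```python
from collections import Counter


def count_ngrams(moves, n):
    """Count every run of n + 1 consecutive moves (n moves of context + the reply)."""
    counts = Counter()
    for i in range(len(moves) - n):
        counts["".join(moves[i:i + n + 1])] += 1
    return counts


counts = count_ngrams(list("PPRSPPR"), n=2)
print(counts)
```

Here 'PPR' ends up with a count of 2 and 'PRS', 'RSP' and 'SPP' with a count of 1 each, matching the steps described in the post.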

 

wtf = {}  # counts of every (n + 1)-move sequence seen so far

def player(prev_play, opponent_history=[]):
  global wtf

  n = 5  # number of previous opponent moves used as context

  if prev_play in ["R","P","S"]:
    opponent_history.append(prev_play)

  guess = "R" # default, until statistic kicks in

  if len(opponent_history)>n:
    # Record the latest n moves plus the move that followed them.
    last_seq = "".join(opponent_history[-(n+1):])
    wtf[last_seq] = wtf.get(last_seq, 0) + 1

    # Candidate continuations of the current n-move context.
    inp = "".join(opponent_history[-n:])
    possible =[inp+"R", inp+"P", inp+"S"]

    # Predict the continuation seen most often (ties go to the first candidate).
    predict = max(possible, key=lambda key: wtf.get(key, 0))

    # Answer with the move that beats the predicted one.
    if predict[-1] == "P":
      guess = "S"
    if predict[-1] == "R":
      guess = "P"
    if predict[-1] == "S":
      guess = "R"

  return guess

 

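RNN: learning while playing

https://github.com/fanqingsong/boilerplate-rock-paper-scissors/blob/master/RPS.py

This approach uses an RNN with online learning: after each move by the opponent the model is trained further, and the updated model then predicts the opponent's next move.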
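
The training data for such a model is a set of (window of recent moves, next move) pairs. The following sketch shows one way to build those pairs with a one-hot target; it is an illustration in plain Python rather than the repository's code, and `one_hot`, `make_samples`, `MOVES` and `MOVE_INDEX` are hypothetical names.

```python
MOVES = ["R", "P", "S"]
MOVE_INDEX = {move: i for i, move in enumerate(MOVES)}


def one_hot(move):
    """Encode a move as a 3-element one-hot list, e.g. "P" -> [0, 1, 0]."""
    return [1 if i == MOVE_INDEX[move] else 0 for i in range(len(MOVES))]


def make_samples(history, look_back=4):
    """Pair each window of look_back moves with the move that followed it."""
    xs, ys = [], []
    for i in range(len(history) - look_back):
        window = history[i:i + look_back]
        xs.append([MOVE_INDEX[m] for m in window])
        ys.append(one_hot(history[i + look_back]))
    return xs, ys


xs, ys = make_samples(list("RPSRPS"), look_back=4)
print(xs)
print(ys)
```

In online learning these pairs are not built in one pass; one new pair becomes available each time the opponent reveals a move.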

 

Results: as the output shows, abbey remains hard to beat with the RNN (58.9% win rate, just under the 60% threshold).

-------- you vs quincy -------------

Final results: {'p1': 988, 'p2': 6, 'tie': 6}
Player 1 win rate: 99.3963782696177%
-------- you vs abbey -------------
Final results: {'p1': 431, 'p2': 301, 'tie': 268}
Player 1 win rate: 58.879781420765035%
-------- you vs kris -------------
Final results: {'p1': 768, 'p2': 227, 'tie': 5}
Player 1 win rate: 77.1859296482412%
-------- you vs mrugesh -------------
Final results: {'p1': 828, 'p2': 169, 'tie': 3}
Player 1 win rate: 83.04914744232697%
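
The win rates in this output are consistent with p1 / (p1 + p2), that is, ties are left out of the denominator: 431 / (431 + 301) is about 58.88%. A small sketch under that assumption (the function name `win_rate` is hypothetical):

```python
def win_rate(results):
    """Win percentage over decided games; ties are left out of the denominator."""
    decided = results["p1"] + results["p2"]
    return 100.0 * results["p1"] / decided if decided else 0.0


abbey = {"p1": 431, "p2": 301, "tie": 268}
quincy = {"p1": 988, "p2": 6, "tie": 6}
print(round(win_rate(abbey), 2), round(win_rate(quincy), 2))
```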

 

Code

# Online-learning RNN player: after every opponent move the model is trained on all samples seen so far, then asked to predict the opponent's next move.

import random

import numpy as np
import keras as K
from keras.models import Sequential
from keras.layers import Dense, LSTM

look_back = 4
win_dict = {"R": "P", "S": "R", "P": "S"}


def create_nn_model():
    simple_adam = K.optimizers.Adam()

    # One LSTM layer over a single time step with look_back features,
    # followed by a softmax over the three possible moves.
    model = Sequential()
    model.add(LSTM(10, input_shape=(1, look_back)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=simple_adam, metrics=['accuracy'])

    return model


def player(prev_play, opponent_history, model, batch_x, batch_y, review_epochs=10):
    plays = ["R", "P", "S"]
    play_dict = {"R": 0, "P": 1, "S": 2}
    plays_categorical = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

    # Warm-up: play randomly until look_back opponent moves are recorded.
    if len(opponent_history) < look_back:
        if prev_play:
            opponent_history.append(prev_play)
        return random.choice(plays)

    # New training sample: the look_back moves before prev_play are the
    # input, prev_play (one-hot encoded) is the target.
    batch_x.append([play_dict[move] for move in opponent_history[-look_back:]])
    batch_y.append(plays_categorical[play_dict[prev_play]])

    # Replay every sample collected so far for review_epochs gradient updates.
    # The LSTM expects input of shape (samples, timesteps=1, features=look_back).
    train_x = np.array(batch_x).reshape(len(batch_x), 1, look_back)
    train_y = np.array(batch_y)
    for _ in range(review_epochs):
        model.train_on_batch(train_x, train_y)

    opponent_history.append(prev_play)

    # Predict the opponent's next move from the latest look_back moves.
    current_x = np.array([[play_dict[move] for move in opponent_history[-look_back:]]])
    current_x = current_x.reshape(1, 1, look_back)
    predict_y = model.predict_on_batch(current_x)
    opponent_play = plays[int(np.argmax(predict_y[0]))]

    # Play the move that beats the predicted one.
    return win_dict[opponent_play]

 

References

Iris species classification with Keras

https://www.jianshu.com/p/1d88a6ed707e

https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

 

Example: predicting stock closing prices with an RNN

https://github.com/omerbsezer/LSTM_RNN_Tutorials_with_Demo/blob/master/StockPricesPredictionProject/pricePredictionLSTM.py

# create and fit the LSTM network, optimizer=adam, 25 neurons, dropout 0.1
model = Sequential()
model.add(LSTM(25, input_shape=(1, look_back)))
model.add(Dropout(0.1))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(trainX, trainY, epochs=1000, batch_size=240, verbose=1)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

 

TensorFlow 2.0 LSTM training model

https://www.programmersought.com/article/57304583087/

# Import library
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
from tensorflow import keras
import numpy as np
from scipy import sparse

import os

# Restrict TensorFlow to the GPU with index 1
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Set random number seed
tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')

batchsz = 256 # batch size

# the most frequent words
total_words = 4096 # Number of words in the dictionary to be encoded
max_review_len = 1995 # How many words does the sequence contain
embedding_len = 100 # Length of each word encoding

units = 64 # The dimension of the parameter output in the lstm layer
epochs = 100  # Train for 100 epochs

# Read in the data, which is stored as a sparse matrix
matrixfile = "textword_numc_sparse.npz" # Your own text samples, already converted to a numeric representation
targetfile = "target_5k6mer_tfidf.txt" # Labels (binary classification)

allmatrix = sparse.load_npz(matrixfile).toarray() 
target = np.loadtxt(targetfile)
print("allmatrix shape: {};target shape: {}".format(allmatrix.shape, target.shape))

x = tf.convert_to_tensor(allmatrix, dtype=tf.int32)
x = keras.preprocessing.sequence.pad_sequences(x, maxlen=max_review_len)
y = tf.convert_to_tensor(target, dtype=tf.int32)

idx = tf.range(allmatrix.shape[0])
idx = tf.random.shuffle(idx)

# Split into training, validation and test sets in a 7:1:2 ratio
x_train, y_train = tf.gather(x, idx[:int(0.7 * len(idx))]), tf.gather(y, idx[:int(0.7 * len(idx))])
x_val, y_val = tf.gather(x, idx[int(0.7 * len(idx)):int(0.8 * len(idx))]), tf.gather(y, idx[int(0.7 * len(idx)):int(0.8 * len(idx))])
x_test, y_test = tf.gather(x, idx[int(0.8 * len(idx)):]), tf.gather(y, idx[int(0.8 * len(idx)):])
print(x_train.shape,x_val.shape,x_test.shape)

db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(6000).batch(batchsz, drop_remainder=True).repeat()
db_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
db_val = db_val.batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)

# Build a model
network = Sequential([layers.Embedding(total_words, embedding_len,input_length=max_review_len),
                      layers.LSTM(units, dropout=0.5, return_sequences=True, unroll=True),
                      layers.LSTM(units, dropout=0.5, unroll=True),
                      # If using gru, just replace the upper two layers with
                      # layers.GRU(units, dropout=0.5, return_sequences=True, unroll=True),
                      # layers.GRU(units, dropout=0.5, unroll=True),
                      layers.Flatten(),
                      #layers.Dense(128, activation=tf.nn.relu),
                      #layers.Dropout(0.6),
                      layers.Dense(1, activation='sigmoid')])

# View model summary
network.build(input_shape=(None, max_review_len))
network.summary()

# Compile
network.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])

# Training. Note that steps_per_epoch is set here, so repeat() is required on db_train; otherwise there is a warning. See my article for details: https://blog.csdn.net/weixin_44022515/article/details/103884654

network.fit(db_train, epochs=epochs, validation_data=db_val,steps_per_epoch=x_train.shape[0]//batchsz)

network.evaluate(db_test)

 

Keras online-learning API: train_on_batch

https://www.programmersought.com/article/2809219970/

for batch_no in range(100):
    X_train, Y_train = np.random.rand(32, 3), np.random.rand(32, 1)
    logs = model.train_on_batch(X_train, Y_train)

https://keras.io/api/models/model_training_apis/#trainonbatch-method

Runs a single gradient update on a single batch of data.
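
What "a single gradient update on a single batch" means can be illustrated by hand with a tiny linear model. This is a sketch in plain Python, not Keras's implementation: `sgd_step`, the model y = w * x + b, the learning rate and the synthetic data are all assumptions made for illustration.

```python
import random


def sgd_step(params, xs, ys, lr=0.1):
    """Apply one gradient-descent step for y = w * x + b on one batch.

    Returns the mean squared error measured before the update.
    """
    w, b = params["w"], params["b"]
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    params["w"] = w - lr * grad_w
    params["b"] = b - lr * grad_b
    return loss


random.seed(0)
params = {"w": 0.0, "b": 0.0}
losses = []
for _ in range(200):
    # Each new batch stands in for data arriving while the model is in use.
    xs = [random.random() for _ in range(32)]
    ys = [3.0 * x + 1.0 for x in xs]
    losses.append(sgd_step(params, xs, ys))
print(losses[0], losses[-1])
```

Calling the update once per incoming batch, as in the loop above, is the pattern the RNN player follows after every opponent move.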

 

Source: https://www.cnblogs.com/lightsong/p/14725190.html
