Implementing Tic-Tac-Toe with a Neural Network in TensorFlow (with Code)
To show how to apply a neural network model, we will use a neural network to learn optimal play in Tic-Tac-Toe. Note that Tic-Tac-Toe is a deterministic game and that the optimal moves are already known.
Getting ready
為了訓(xùn)練神經(jīng)網(wǎng)絡(luò)模型,我們有一系列優(yōu)化的不同的走棋棋譜,棋譜基于棋盤位置列表和對(duì)應(yīng)的最佳落子點(diǎn)。考慮到棋盤的對(duì)稱性,通過(guò)只關(guān)心不對(duì)稱的棋盤位置來(lái)簡(jiǎn)化棋盤。井字棋的非單位變換(考慮幾何變換)可以通過(guò)90度、180度、270度、Y軸對(duì)稱和X軸對(duì)稱旋轉(zhuǎn)獲得。如果這個(gè)假設(shè)成立,我們使用一系列的棋盤位置列表和對(duì)應(yīng)的最佳落子點(diǎn),應(yīng)用兩個(gè)隨機(jī)變換,然后賦值給神經(jīng)網(wǎng)絡(luò)算法模型學(xué)習(xí)。
Tic-Tac-Toe is a deterministic game: with optimal play, the first player either wins or the game ends in a draw. We want to train a model that, given a board, responds with the optimal move.
In this example, the moving player's mark "X" is represented by a 1, the opponent's "O" by a -1, and an empty space by a 0. The figure below shows how a board and an optimal move are represented:
Figure: a board and an optimal move, with X = 1, O = -1, and empty spaces as 0. Board position indices start at 0.
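As a concrete illustration of this encoding (a small sketch, not from the original recipe, reusing the board that later serves as the held-out test case in step 8, whose optimal response in the recipe's data is index 6):

# Index layout            Example board (X = 1, O = -1, empty = 0)
#  0 | 1 | 2                O |   |
#  ---------                ---------
#  3 | 4 | 5                X | O | O
#  ---------                ---------
#  6 | 7 | 8                  |   | X
example_board = [-1, 0, 0,
                  1, -1, -1,
                  0, 0, 1]
optimal_response = 6   # per the recipe's data, the best move is index 6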
除了計(jì)算模型損失之外,我們將用兩種方法來(lái)檢測(cè)算法模型的性能:第一種檢測(cè)方法是,從訓(xùn)練集中移除一個(gè)位置,然后優(yōu)化走棋。這能看出神經(jīng)網(wǎng)絡(luò)算法模型能否生成以前未有過(guò)的走棋(即該走棋不在訓(xùn)練集中);第二種評(píng)估的方法是,直接實(shí)戰(zhàn)井字棋游戲看是否能贏。?
The list of board positions and their optimal responses is available on GitHub [1].
How to do it
1. Import the necessary libraries:
import tensorflow as tf
import matplotlib.pyplot as plt
import csv
import random
import numpy as np

2. Declare the batch size used to train the model:
batch_size = 50
3. To make visualizing the boards a bit easier, we create a function that prints out a Tic-Tac-Toe board:
def print_board(board):
    symbols = ['O', ' ', 'X']
    board_plus1 = [int(x) + 1 for x in board]
    print(' ' + symbols[board_plus1[0]] + ' | ' + symbols[board_plus1[1]] + ' | ' + symbols[board_plus1[2]])
    print('___________')
    print(' ' + symbols[board_plus1[3]] + ' | ' + symbols[board_plus1[4]] + ' | ' + symbols[board_plus1[5]])
    print('___________')
    print(' ' + symbols[board_plus1[6]] + ' | ' + symbols[board_plus1[7]] + ' | ' + symbols[board_plus1[8]])

4. Create a get_symmetry() function that returns a new board and the new optimal response after applying one of the symmetry transformations:
def get_symmetry(board, response, transformation):
    '''
    :param board: list of integers 9 long:
        opposing mark = -1
        friendly mark = 1
        empty space = 0
    :param transformation: one of five transformations on a board:
        rotate180, rotate90, rotate270, flip_v, flip_h
    :return: tuple: (new_board, new_response)
    '''
    if transformation == 'rotate180':
        new_response = 8 - response
        return(board[::-1], new_response)
    elif transformation == 'rotate90':
        new_response = [6, 3, 0, 7, 4, 1, 8, 5, 2].index(response)
        tuple_board = list(zip(*[board[6:9], board[3:6], board[0:3]]))
        return([value for item in tuple_board for value in item], new_response)
    elif transformation == 'rotate270':
        new_response = [2, 5, 8, 1, 4, 7, 0, 3, 6].index(response)
        tuple_board = list(zip(*[board[0:3], board[3:6], board[6:9]]))[::-1]
        return([value for item in tuple_board for value in item], new_response)
    elif transformation == 'flip_v':
        new_response = [6, 7, 8, 3, 4, 5, 0, 1, 2].index(response)
        return(board[6:9] + board[3:6] + board[0:3], new_response)
    elif transformation == 'flip_h':
        # flip_h = rotate180, then flip_v
        new_response = [2, 1, 0, 5, 4, 3, 8, 7, 6].index(response)
        new_board = board[::-1]
        return(new_board[6:9] + new_board[3:6] + new_board[0:3], new_response)
    else:
        raise ValueError('Method not implemented.')

5. The board positions and their optimal responses are stored in a .csv file. We create a get_moves_from_csv() function that loads the boards and responses from this file and stores them as tuples:
def get_moves_from_csv(csv_file):
    '''
    :param csv_file: csv file location containing the boards w/ responses
    :return: moves: list of moves with index of best response
    '''
    moves = []
    with open(csv_file, 'rt') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            moves.append(([int(x) for x in row[0:9]], int(row[9])))
    return(moves)

6. Create a get_rand_move() function that returns a randomly transformed board and its response:
def get_rand_move(moves, rand_transforms=2):
    # This function performs random transformations on a board.
    (board, response) = random.choice(moves)
    possible_transforms = ['rotate90', 'rotate180', 'rotate270', 'flip_v', 'flip_h']
    for i in range(rand_transforms):
        random_transform = random.choice(possible_transforms)
        (board, response) = get_symmetry(board, response, random_transform)
    return(board, response)

7. Initialize the graph session, load the data file, and create the training set:
sess = tf.Session()
moves = get_moves_from_csv('base_tic_tac_toe_moves.csv')
# Create a train set:
train_length = 500
train_set = []
for t in range(train_length):
    train_set.append(get_rand_move(moves))

8. As mentioned earlier, we remove one board and its optimal response from the training set to see whether the trained model can still generate the best move for it. The best move for the board below is to play at index 6:
test_board = [-1, 0, 0, 1, -1, -1, 0, 0, 1]
train_set = [x for x in train_set if x[0] != test_board]

9. Create the init_weights() and model() functions, which initialize the model variables and define the model operations, respectively. Note that the model does not apply a softmax() activation to its output, because the softmax is applied inside the loss function:
def init_weights(shape):
    return(tf.Variable(tf.random_normal(shape)))

def model(X, A1, A2, bias1, bias2):
    layer1 = tf.nn.sigmoid(tf.add(tf.matmul(X, A1), bias1))
    layer2 = tf.add(tf.matmul(layer1, A2), bias2)
    return(layer2)

10. Declare the placeholders, the variables, and the model:
X = tf.placeholder(dtype=tf.float32, shape=[None, 9])
Y = tf.placeholder(dtype=tf.int32, shape=[None])
A1 = init_weights([9, 81])
bias1 = init_weights([81])
A2 = init_weights([81, 9])
bias2 = init_weights([9])
model_output = model(X, A1, A2, bias1, bias2)

11. Declare the model's loss function, which is the average sparse softmax cross-entropy between the output logits and the target move indices. Then declare the training step and the optimizer. To be able to play against the trained model later, we also create a prediction operation:
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=model_output, labels=Y))
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)
prediction = tf.argmax(model_output, 1)
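For reference, here is a quick NumPy check (not part of the recipe) of what sparse_softmax_cross_entropy_with_logits computes for a single board: the negative log of the softmax probability that the model assigns to the target move index. The logits and target below are made-up values for illustration only.

example_logits = np.array([0.5, -1.0, 2.0, 0.0, 0.1, -0.3, 1.5, 0.2, -2.0])
target_index = 6   # hypothetical optimal move for this board
softmax_probs = np.exp(example_logits) / np.sum(np.exp(example_logits))
print(-np.log(softmax_probs[target_index]))   # the per-example loss term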
12. Initialize the variables and loop through the training iterations:

# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)

loss_vec = []
for i in range(10000):
    # Select random indices for the batch
    rand_indices = np.random.choice(range(len(train_set)), batch_size, replace=False)
    # Get batch
    batch_data = [train_set[i] for i in rand_indices]
    x_input = [x[0] for x in batch_data]
    y_target = np.array([y[1] for y in batch_data])
    # Run training step
    sess.run(train_step, feed_dict={X: x_input, Y: y_target})
    # Get training loss
    temp_loss = sess.run(loss, feed_dict={X: x_input, Y: y_target})
    loss_vec.append(temp_loss)
    if i % 500 == 0:
        print('iteration ' + str(i) + ' Loss: ' + str(temp_loss))

13. Plot the loss over the course of training (the resulting figure is shown below):
plt.plot(loss_vec, 'k-', label='Loss')
plt.title('Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

Figure: loss over the 10,000 training iterations of the Tic-Tac-Toe model.
14. To test the model, we check how it performs on the test board that we removed from the training set. We are hoping the model predicts index 6 for this board, and most of the time it will:
test_boards = [test_board]
feed_dict = {X: test_boards}
logits = sess.run(model_output, feed_dict=feed_dict)
predictions = sess.run(prediction, feed_dict=feed_dict)
print(predictions)

15. This produces the following output:
[6]
16. To evaluate the trained model, we will play a game against it. To do this, we create a function that checks whether either side has won, so the program knows when to stop:
def check(board):
    wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
            [0, 3, 6], [1, 4, 7], [2, 5, 8],
            [0, 4, 8], [2, 4, 6]]
    for i in range(len(wins)):
        if board[wins[i][0]] == board[wins[i][1]] == board[wins[i][2]] == 1.:
            return(1)
        elif board[wins[i][0]] == board[wins[i][1]] == board[wins[i][2]] == -1.:
            return(1)
    return(0)
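A small sanity check, not in the original recipe: check() returns 1 as soon as either side has three in a row, and 0 otherwise, which is why the game loop below also caps the number of moves.

print(check([1., 1., 1., -1., -1., 0., 0., 0., 0.]))   # 1: X owns the top row
print(check(test_board))                               # 0: no completed line yet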
17. Now loop through a game against the trained model. We start from an empty board (all zeros), ask the player for the index (0-8) of their move, and feed the updated board to the model to get its logits. We only allow the model to move on open spaces and take the best allowed move. A sample game is shown below; for this game, the trained model does not play particularly well:

game_tracker = [0., 0., 0., 0., 0., 0., 0., 0., 0.]
win_logical = False
num_moves = 0
while not win_logical:
    player_index = input('Input index of your move (0-8): ')
    num_moves += 1
    # Add player move to game
    game_tracker[int(player_index)] = 1.

    # Get model's move by first getting all the logits for each index
    [potential_moves] = sess.run(model_output, feed_dict={X: [game_tracker]})
    # Now find allowed moves (where game tracker values = 0.0)
    allowed_moves = [ix for ix, x in enumerate(game_tracker) if x == 0.0]
    # Find best move by taking argmax of logits if they are in allowed moves
    model_move = np.argmax([x if ix in allowed_moves else -999.0
                            for ix, x in enumerate(potential_moves)])

    # Add model move to game
    game_tracker[int(model_move)] = -1.
    print('Model has moved')
    print_board(game_tracker)
    # Now check for a win or too many moves
    if check(game_tracker) == 1 or num_moves >= 5:
        print('Game Over!')
        win_logical = True

18. The interactive output of a sample game looks like this:
Input index of your move (0-8): 4
Model has moved
O |   |
___________
  | X |
___________
  |   |
Input index of your move (0-8): 6
Model has moved
O |   |
___________
  | X |
___________
X |   | O
Input index of your move (0-8): 2
Model has moved
O |   | X
___________
O | X |
___________
X |   | O
Game Over!
How it works
We trained a neural network to play Tic-Tac-Toe by feeding it board positions, each represented as a nine-dimensional vector, and having it predict the optimal move. We only had to supply a handful of possible boards and applied random transformations to each of them to increase the size of the training set.
To test the algorithm, we removed one board and its optimal move and checked whether the trained model could still predict the right move for it. Finally, we played a game against the trained model; the result was not impressive, so it is still worth trying different architectures and training approaches to improve it.
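As one example of the kind of architectural change worth experimenting with, here is a sketch (an assumption, not the book's solution) of a model() variant with a second hidden layer. The layer sizes are arbitrary, and the weight and bias shapes would need to be created accordingly with init_weights():

def deeper_model(X, A1, A2, A3, bias1, bias2, bias3):
    # Two sigmoid hidden layers instead of one; the output stays as raw
    # logits so the same sparse softmax cross-entropy loss can be used.
    layer1 = tf.nn.sigmoid(tf.add(tf.matmul(X, A1), bias1))
    layer2 = tf.nn.sigmoid(tf.add(tf.matmul(layer1, A2), bias2))
    layer3 = tf.add(tf.matmul(layer2, A3), bias3)
    return(layer3)

# Hypothetical shapes: 9 -> 81 -> 27 -> 9
# A1 = init_weights([9, 81]);  bias1 = init_weights([81])
# A2 = init_weights([81, 27]); bias2 = init_weights([27])
# A3 = init_weights([27, 9]);  bias3 = init_weights([9])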
See also
[1] https://github.com/nfmcclure/tensorflow_cookbook/tree/master/06_Neural_Networks/08_Learning_Tic_Tac_Toe