UCAS Advanced Artificial Intelligence 10: Reinforcement Learning (Multi-Armed Bandits, Bellman Equations)

Published: 2024/7/5

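Contents

  • Multi-armed bandit (stateless)
  • Markov decision process (MDP)
  • 1. Dynamic programming
  • Monte Carlo methods (when a complete model of the environment is not available)
    • 2.1 On-policy Monte Carlo
    • 2.2 Off-policy Monte Carlo
  • Temporal-difference methods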

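  • Reinforcement learning
    • Goal: learn a mapping from environment states to actions. The agent chooses the actions that earn the greatest reward from the environment, so that the environment's evaluation of the learning system is, in some sense, optimal.
    • How it differs:
      • Supervised learning: learns from labels
      • Reinforcement learning: learns a policy through interaction
    • Characteristics (used to judge whether a problem can be solved with reinforcement learning)
      • Trial-and-error search
      • Delayed reward
    • Challenge: balancing
      • Exploitation (keep using the approach already known to work)
      • Exploration (try other approaches to see whether they work better)
    • The overall objective matters; intermediate stages do not
    • Participants: the agent and the environment
      • State, action, and reward
    • Elements
      • Policy
        • A mapping from states to actions
          • Deterministic policy: S -> A
          • Stochastic policy: S -> a probability distribution over actions (A1, A2, A3, ...)
      • Reward
        • A function of state and action; may be stochastic
      • Value
        • Cumulative reward
        • The long-term objective
      • Environment model
        • Describes the feedback the environment gives
  • Feedback
    • Evaluative feedback (reinforcement learning)
      • Evaluates the action taken
    • Instructive feedback (supervised learning)
      • Independent of the action taken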

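Multi-armed bandit (stateless)

| Method | Selection rule | Deterministic? | Characteristics |
| --- | --- | --- | --- |
| Greedy | $A_t=\arg\max_a Q_t(a)$, with $Q_t(a)$ the sample mean | Deterministic | |
| $\epsilon$-greedy | With probability $1-\epsilon$: greedy choice; with probability $\epsilon$: random choice | Deterministic greedy step plus an $\epsilon$-random step | - |
| Optimistic initial values | Every action starts with a high initial value $Q_1$; $\epsilon=0$ | Deterministic | Explores only at first, greedy in the end |
| UCB | $A_t=\arg\max_a\left(Q_t(a)+c\sqrt{\frac{\ln t}{N_t(a)}}\right)$, where $N_t(a)$ is the number of times $a$ has been selected | Deterministic | Poor at first, later better than greedy; converges to greedy |
| Gradient bandit | $P(A_t=a)=\frac{e^{H_t(a)}}{\sum_{b=1}^{k}e^{H_t(b)}}=\pi_t(a)$; objective $E(R_t)=\sum_b\pi_t(b)q(b)$ | Stochastic | Updates $H_t$ |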
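  • Formalization

    • Action: which arm to pull
      • $A_t$: the action in round t
    • Reward: the reward received for each pull
      • $R_t$: the reward in round t
    • Expected reward of taking action a in round t:
      • $q(a)=E(R_t\mid A_t=a)$
      • A greedy policy would always pick the a with the largest expectation, but the expectation is unknown
      • It can only be estimated from experience: $Q_t(a)$ estimates $q(a)$, and the greedy policy acts on $Q_t(a)$
  • Objective: the expected reward of the current action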

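  • Strategy

    • Exploitation
      • Follow the greedy policy, i.e. choose the action a with the largest $Q_t(a)$
      • Advantage: maximizes the immediate reward
      • Drawback: $Q_t(a)$ is only an estimate of $q_*(a)$; because the estimate is uncertain, the greedy choice is not necessarily the action that maximizes $q_*(a)$
    • Exploration
      • Choose an action other than the greedy one (non-greedy actions)
      • Drawback: lower reward in the short term
      • Advantage: higher reward in the long term, because exploring can uncover actions with larger rewards for later use
    • Each round picks one of the two; how should they be balanced?
  • Greedy policy

    • $A_t=\arg\max_a Q_t(a)$
    • If several actions tie for the maximum, pick one of them at random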
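  • $\epsilon$-greedy policy

    • With probability $1-\epsilon$: greedy choice (exploitation)
    • With probability $\epsilon$: random choice (exploration)
    • The choice of $\epsilon$ depends on the variance of $q(a)$: the larger the variance, the larger $\epsilon$ should be
    • Example
      • Suppose $q(a)\sim N(0,1)$
      • The observed rewards are then normally distributed around $q(A_t)$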
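The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `epsilon_greedy`, the estimates `[0.2, 1.5, 0.7]` and the seed are all made up for the example, and ties are broken at random as the greedy policy above suggests.

```python
import random

def epsilon_greedy(q_estimates, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        # Explore: uniform over all actions (the greedy one included).
        return rng.randrange(len(q_estimates))
    # Exploit: choose among the actions that share the maximum estimate.
    best = max(q_estimates)
    ties = [a for a, q in enumerate(q_estimates) if q == best]
    return rng.choice(ties)

# Hypothetical estimates Q_t(a) for three arms; arm 1 is currently the greedy one.
rng = random.Random(0)
picks = [epsilon_greedy([0.2, 1.5, 0.7], epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = picks.count(1) / len(picks)
```

With $\epsilon=0.1$ the greedy arm should be chosen in most rounds, while the other arms still receive occasional pulls.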
  • Action-value estimation $Q_t(a)$

    • $Q_t(a)=\frac{\text{sum of rewards received when taking } a}{\text{number of times } a \text{ was taken}}=\frac{\sum_{i=1}^{t-1}R_i\,1_{A_i=a}}{\sum_{i=1}^{t-1}1_{A_i=a}}$, the mean reward of action a
    • Convention: if the denominator is 0, $Q_t(a)=0$
    • As the denominator grows to infinity, $Q_t(a)\to q(a)$
    • Incremental implementation
      • $Q_n(a)=\frac{R_1+R_2+\dots+R_{n-1}}{n-1}$
      • $Q_{n+1}(a)=\frac{R_1+R_2+\dots+R_{n-1}+R_n}{n}=\frac{1}{n}\left(R_n+\sum_{i=1}^{n-1}R_i\right)=\frac{1}{n}\left(R_n+(n-1)Q_n(a)\right)=Q_n(a)+\frac{1}{n}\left(R_n-Q_n(a)\right)$
    • Update rule: newEstimate <- oldEstimate + stepsize (target - oldEstimate)
      • Step size for the sample average used by the greedy policy: 1/n
        • Converges
      • More generally: $\alpha$ or $\alpha_t(a)$, much like the learning rate in SGD
    • Update rule for nonstationary problems
      • $Q_{n+1}(a)=Q_n(a)+\alpha(R_n-Q_n(a))=\alpha R_n+(1-\alpha)Q_n(a)=\alpha R_n+(1-\alpha)\alpha R_{n-1}+(1-\alpha)^2Q_{n-1}(a)=\dots=(1-\alpha)^nQ_1(a)+\sum_{i=1}^{n}\alpha(1-\alpha)^{n-i}R_i$
      • This estimate is already nonstationary: the more recent a reward, the larger its weight (a weighted average)
      • With a constant $\alpha$ it does not converge
    • Convergence conditions
      • $\sum_{n=1}^{\infty}\alpha_n(a)=\infty$: the steps are large enough to overcome the initial value and random fluctuations
      • $\sum_{n=1}^{\infty}\alpha_n^2(a)<\infty$: the steps eventually become small enough to guarantee convergence
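The two step-size choices can be checked numerically. The sketch below is illustrative only: the reward list, `alpha = 0.5` and `q1 = 0.0` are arbitrary values chosen for the example. It shows that a step size of 1/n reproduces the arithmetic mean, and that a constant step size matches the closed-form weighted sum from the expansion above.

```python
def update(q, reward, step):
    """newEstimate = oldEstimate + step * (target - oldEstimate)."""
    return q + step * (reward - q)

rewards = [1.0, 0.0, 4.0, 3.0]  # hypothetical rewards R_1..R_4 for one action

# Sample average: step size 1/n after the n-th reward reproduces the mean.
q_avg = 0.0
for n, r in enumerate(rewards, start=1):
    q_avg = update(q_avg, r, 1.0 / n)

# Constant step size: an exponentially weighted average, recent rewards count more.
alpha, q1 = 0.5, 0.0
q_const = q1
for r in rewards:
    q_const = update(q_const, r, alpha)

# Closed form: (1 - alpha)^n * Q_1 + sum_i alpha * (1 - alpha)^(n - i) * R_i
n = len(rewards)
closed_form = (1 - alpha) ** n * q1 + sum(
    alpha * (1 - alpha) ** (n - i) * r for i, r in enumerate(rewards, start=1)
)
```

Here `q_avg` equals the plain mean of the four rewards, whereas `q_const` leans toward the last rewards because their weights $\alpha(1-\alpha)^{n-i}$ are largest.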
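  • Stationary problems

    • $q(a)$ is stable and does not change over time
    • As more samples are observed, the sample-average estimate converges to $q(a)$
  • Nonstationary problems

    • $q(a)$ is a function of time (for example, the arm may wear out)
    • Weight recent observations; older ones are no longer reliable

  • $N(a)$: the number of times action a has been selected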

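  • Action-selection strategies

    • How should one be designed?
      • Greedy: choose the action with the best current estimate
      • $\epsilon$-greedy: with some probability choose a non-greedy action at random, without distinguishing among the non-greedy actions
    • Balance exploitation and exploration to cope with uncertainty in the action-value estimates
    • Key point: determine the probability with which each action is selected
    • Initial action-value estimates
      • In the greedy policy above, every action starts with an estimate of 0
      • Initial estimates are a way to bring in prior knowledge
      • Initial estimates can also help balance exploitation and exploration
      • Optimistic initial values
        • Give every action a high initial value
        • Advantage: early on, every action has a good chance of being tried, so exploration is fast
        • Early behaviour is exploration only, with no exploitation and no regard for history
        • Performs poorly at first but catches up quickly later
        • Drawback: exploration may never finish
        • Example: $\epsilon$-greedy with $Q_1=5$ and $\epsilon=0$
    • UCB (upper confidence bound)
      • $A_t=\arg\max_a\left(Q_t(a)+c\sqrt{\frac{\ln t}{N_t(a)}}\right)$, where $N_t(a)$ is the number of times a has been selected
      • Chooses the action with the most potential, according to the upper confidence bound of its estimate
        • First term: the current estimate is high (close to greedy)
        • Second term: the uncertainty is high (the action has been selected few times, so it has more potential)
        • c: controls the degree of exploration
      • Comparison:
        • Worse in the first few rounds, then better than $\epsilon$-greedy
        • Stable
        • The parameter is hard to tune
        • Eventually converges to the greedy policy
      • More complex, and rarely used outside the multi-armed bandit setting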
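As a rough illustration of the UCB rule, the sketch below scores each arm by its estimate plus the exploration bonus. Two things are assumptions rather than part of the notes: the function name `ucb_action`, and the convention that an arm with $N_t(a)=0$ is tried first, which avoids dividing by zero.

```python
import math

def ucb_action(q_estimates, counts, t, c=2.0):
    """Return argmax_a of Q_t(a) + c * sqrt(ln t / N_t(a))."""
    # Assumed convention: an arm that has never been pulled is chosen first.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_estimates, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Arm 0 has the higher estimate, but arm 1 has been pulled only once.
explore_choice = ucb_action([1.0, 0.9], counts=[100, 1], t=101, c=2.0)
greedy_choice = ucb_action([1.0, 0.9], counts=[100, 1], t=101, c=0.0)
```

With `c=2.0` the rarely pulled arm wins because its bonus is large; with `c=0.0` the rule collapses to the greedy choice, which matches the role of c described above.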
  • Gradient bandit algorithm

    • A stochastic algorithm (stochastic policy)
    • $H_t(a)$: the preference for action a in round t
      • After an action has been selected, $H_t(a)$ is updated based on the outcome
    • Probability of selecting a: $P(A_t=a)=\frac{e^{H_t(a)}}{\sum_{b=1}^{k}e^{H_t(b)}}=\pi_t(a)$
    • Update rule (equivalent to a stochastic gradient step)
      • $H_{t+1}(A_t)=H_t(A_t)+\alpha(R_t-\bar{R}_t)(1-\pi_t(A_t))$, where $\bar{R}_t$ is the mean reward
      • For all $a\neq A_t$: $H_{t+1}(a)=H_t(a)-\alpha(R_t-\bar{R}_t)\pi_t(a)$
    • Objective: the expected reward in round t
      • $E(R_t)=\sum_b\pi_t(b)q(b)$
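One preference update can be traced by hand with the sketch below. It is only an illustration under stated assumptions: the names `softmax` and `gradient_update` are invented here, the baseline is passed in explicitly instead of being tracked as a running mean, and the starting preferences are all zero.

```python
import math

def softmax(prefs):
    """pi_t(a) = exp(H_t(a)) / sum_b exp(H_t(b))."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_update(prefs, action, reward, baseline, alpha):
    """Apply one update and return the new preference list H_{t+1}."""
    pi = softmax(prefs)
    new_prefs = []
    for a, h in enumerate(prefs):
        if a == action:
            new_prefs.append(h + alpha * (reward - baseline) * (1 - pi[a]))
        else:
            new_prefs.append(h - alpha * (reward - baseline) * pi[a])
    return new_prefs

H = [0.0, 0.0, 0.0]  # equal preferences, so pi is uniform
H_next = gradient_update(H, action=1, reward=1.0, baseline=0.0, alpha=0.1)
pi_next = softmax(H_next)
```

Because the reward is above the baseline, the preference of the chosen arm rises and the others fall by amounts that sum to zero, so the chosen arm becomes more likely in the next round.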
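  • The multi-armed bandit is a simplification of reinforcement learning

    • Actions and states are unrelated
  • Extension

    • Contextual multi-armed bandit
      • Actions do not change the state
  • The more general case

    • Markov decision process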

Markov decision process (MDP)

  • Commonly used to model sequential decision making

  • Actions

    • Earn rewards
    • Change the state, which affects long-term reward
  • Learn a mapping from states to actions (the policy)

    • Multi-armed bandit: learn $q(a)$
    • MDP: learn $q(s,a)$ or $v(s)$
  • The agent and the environment interact at discrete time steps

  • Notation

    • $S_t\in S$: state
    • $A_t\in A$: action (some moves are allowed and some are not, so the available actions form a restricted set)
    • After taking $A_t$, the process moves to state $S_{t+1}$ and receives $R_{t+1}$
    • The sequence produced by a Markov decision process is written
      • $S_0,A_0,R_1,S_1,A_1,R_2,S_2,\dots$
  • Modelling a finite Markov decision process

    • $p(s',r\mid s,a)=P(S_t=s',R_t=r\mid S_{t-1}=s,A_{t-1}=a)\in[0,1]$
      • Sums to 1 over all $s'$ and $r$
      • The table is very large to enumerate (if it could be enumerated, A* search would be enough)
    • State-transition probability:
      • $p(s'\mid s,a)=\sum_r p(s',r\mid s,a)$
    • Expected reward of a state-action pair
      • $r(s,a)=E(R_t\mid S_{t-1}=s,A_{t-1}=a)=\sum_r r\sum_{s'}p(s',r\mid s,a)$
    • Expected reward of a state, action, next-state triple
      • $r(s,a,s')=E(R_t\mid S_{t-1}=s,A_{t-1}=a,S_t=s')=\sum_r r\,\frac{p(s',r\mid s,a)}{p(s'\mid s,a)}$
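The three derived quantities are simple sums over the joint table $p(s',r\mid s,a)$. The sketch below works them out for one made-up state-action pair; the state names, rewards and probabilities in `p_sa` are hypothetical and serve only to make the arithmetic concrete.

```python
from collections import defaultdict

# Hypothetical dynamics for a single (s, a) pair: p_sa[(s_next, r)] = probability.
p_sa = {("s1", 0.0): 0.5, ("s1", 1.0): 0.2, ("s2", 5.0): 0.3}

def transition_probs(p):
    """p(s'|s,a) = sum over r of p(s',r|s,a)."""
    out = defaultdict(float)
    for (s_next, _r), prob in p.items():
        out[s_next] += prob
    return dict(out)

def expected_reward(p):
    """r(s,a) = sum over r and s' of r * p(s',r|s,a)."""
    return sum(r * prob for (_s_next, r), prob in p.items())

def expected_reward_given_next(p, s_next):
    """r(s,a,s') = sum over r of r * p(s',r|s,a) / p(s'|s,a)."""
    p_next = transition_probs(p)[s_next]
    return sum(r * prob for (s2, r), prob in p.items() if s2 == s_next) / p_next

trans = transition_probs(p_sa)
r_sa = expected_reward(p_sa)
r_sa_s1 = expected_reward_given_next(p_sa, "s1")
```

In this toy table the probabilities sum to 1, the process reaches `s1` with probability 0.7, and the expected reward conditioned on landing in `s1` is 0.2 / 0.7.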
  • Reward hypothesis

    • Goal: long-term or final
    • Reward: immediate
    • Hypothesis (the foundation of reinforcement learning)
      • Goals and purposes can be expressed as maximizing the expected cumulative reward
  • Cumulative reward (return)

    • Episodic tasks: $G_t=R_{t+1}+R_{t+2}+R_{t+3}+\dots+R_T$ ($t<T$; $T$ is the final step, where the terminal state is reached)
      • A Markov decision process with a terminal state is an episodic task
    • Continuing tasks: $G_t=R_{t+1}+\gamma R_{t+2}+\gamma^2R_{t+3}+\dots=\sum_{k=0}^{\infty}\gamma^kR_{t+k+1}$, $0\le\gamma\le1$ ($\gamma$ is the discount rate)
      • No terminal state
      • Recursion: $G_t=\sum_{k=0}^{\infty}\gamma^kR_{t+k+1}=R_{t+1}+\gamma G_{t+1}$
      • Unified form: $G_t=\sum_{k=t+1}^{T}\gamma^{k-t-1}R_k$; $T=\infty$ and $\gamma=1$ cannot both hold (the sum would not converge)
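The recursion $G_t=R_{t+1}+\gamma G_{t+1}$ gives a convenient way to compute every return of a finished episode in one backward pass. The sketch below is a small illustration; the function name `discounted_returns` and the three-step reward sequence are invented for the example.

```python
def discounted_returns(rewards, gamma):
    """rewards[k] holds R_{k+1}; the result's entry t holds G_t."""
    returns = [0.0] * len(rewards)
    g = 0.0  # the return after the final step is 0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # G_t = R_{t+1} + gamma * G_{t+1}
        returns[t] = g
    return returns

# Hypothetical episode with rewards R_1 = 1, R_2 = 2, R_3 = 3.
G = discounted_returns([1.0, 2.0, 3.0], gamma=0.5)

# The same G_0 from the explicit sum over k of gamma^k * R_{k+1}.
G0_direct = sum(0.5 ** k * r for k, r in enumerate([1.0, 2.0, 3.0]))
```

Working backwards, $G_2=3$, $G_1=2+0.5\cdot3=3.5$ and $G_0=1+0.5\cdot3.5=2.75$, which agrees with the direct sum.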
  • Policy

    • A mapping from states to actions
    • Stochastic policy: $\pi(a\mid s)$, a probability
    • Deterministic policy: $a=\pi(s)$
    • State-value function
      • $v_\pi(s)=E_\pi(G_t\mid S_t=s)=E_\pi\left(\sum_{k=0}^{\infty}\gamma^kR_{t+k+1}\mid S_t=s\right)$, for all $s\in S$
    • Action-value function
      • $q_\pi(s,a)=E_\pi(G_t\mid S_t=s,A_t=a)=E_\pi\left(\sum_{k=0}^{\infty}\gamma^kR_{t+k+1}\mid S_t=s,A_t=a\right)$
  • Bellman equation (a system of equations that can be solved simultaneously)

    • n states give a linear system of n equations in n unknowns
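For reference, the Bellman equation that these bullets describe can be written out as follows. The notes do not spell it out, so this is the standard textbook form, stated here as an assumption; it uses the same $p(s',r\mid s,a)$, $\pi(a\mid s)$ and $\gamma$ as the definitions above and follows from the recursion $G_t=R_{t+1}+\gamma G_{t+1}$.

```latex
\begin{aligned}
v_\pi(s) &= E_\pi\left(R_{t+1}+\gamma G_{t+1}\mid S_t=s\right) \\
         &= \sum_a \pi(a\mid s)\sum_{s',r} p(s',r\mid s,a)\left(r+\gamma v_\pi(s')\right),
            \quad \text{for all } s\in S \\
q_\pi(s,a) &= \sum_{s',r} p(s',r\mid s,a)\left(r+\gamma\sum_{a'}\pi(a'\mid s')\,q_\pi(s',a')\right)
\end{aligned}
```

Because $v_\pi$ enters linearly on both sides, writing one equation per state yields the n-by-n linear system mentioned above.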
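  • Optimal policy

    • For two policies $\pi$ and $\pi'$: $\pi\ge\pi'$ if and only if $v_\pi(s)\ge v_{\pi'}(s)$ for all s
    • $v_*(s)=\max_\pi v_\pi(s)$; there may be several optimal policies, but they all share the same v
    • Action-value function: $q_*(s,a)=\max_\pi q_\pi(s,a)$
  • Bellman optimality equation (an assignment, in contrast to the linear system of the Bellman equation)

  • Bellman optimality equation based on the state-value function

    • Step 1: solve the Bellman optimality equation for the state-value function to obtain the state-value function of the optimal policy
    • Step 2: using that equation, perform a one-step search to find the best action in each state
    • Note: more than one optimal policy may exist
    • Advantage of the Bellman optimality equation: a greedy local search is enough to reach the global optimum
  • Bellman optimality equation based on the action-value function

    • Yields the optimal policy directly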
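The two optimality equations referred to above are not written out in the notes. In their standard textbook form (an assumption here, using the same symbols as the rest of this section) they read:

```latex
\begin{aligned}
v_*(s) &= \max_a \sum_{s',r} p(s',r\mid s,a)\left(r+\gamma v_*(s')\right) \\
q_*(s,a) &= \sum_{s',r} p(s',r\mid s,a)\left(r+\gamma\max_{a'} q_*(s',a')\right) \\
\pi_*(s) &= \arg\max_a q_*(s,a)
          = \arg\max_a \sum_{s',r} p(s',r\mid s,a)\left(r+\gamma v_*(s')\right)
\end{aligned}
```

The last line shows why the two routes differ: with $v_*$ a one-step lookahead through the model is still needed to pick the action, whereas with $q_*$ the best action is read off directly.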
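  • Limitations

    • Requires a model of the environment
    • High computational cost and memory use (to store the value function)
    • Relies on the Markov property
  • Practical methods

    • Dynamic programming (examinable)
    • Monte Carlo methods
    • Temporal-difference learning (widely used)
    • Parameterized methods (widely used)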
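1. Dynamic programming

  • Policy evaluation
  • Writing out and solving the equations directly (computationally expensive)
  • Iterative policy evaluation: searching for a fixed point
    • Update rule (expected update): $v_{k+1}(s)=\sum_a\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\left(r+\gamma v_k(s')\right)$
    • When the values stop changing, they are the solution of the equations
    • Two ways to implement it
    • Synchronous update: two arrays, one holding the new values and one the old
    • In-place (asynchronous) update: a single array holding new and old values together (converges faster, and convergence is still guaranteed)
    • Goal: find the optimal policy (policy improvement)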
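The policy-improvement step named in the last bullet can be sketched as a one-step greedy lookahead over a tabular model. Everything concrete here is hypothetical: the function name `greedy_policy`, the two-state `toy_model`, and the layout `model[s][a] = [(prob, next_state, reward), ...]` are assumptions made for the illustration.

```python
def greedy_policy(v, model, gamma):
    """For each state, pick the action with the largest one-step lookahead value."""
    policy = {}
    for s, actions in model.items():
        q = {
            a: sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
            for a, outcomes in actions.items()
        }
        policy[s] = max(q, key=q.get)
    return policy

# Hypothetical deterministic two-state model.
toy_model = {
    "left": {"stay": [(1.0, "left", 0.0)], "go": [(1.0, "right", 1.0)]},
    "right": {"stay": [(1.0, "right", 2.0)], "go": [(1.0, "left", 0.0)]},
}
v = {"left": 0.0, "right": 0.0}  # value estimates from policy evaluation
improved = greedy_policy(v, toy_model, gamma=0.9)
```

With all values at zero the lookahead reduces to the immediate reward, so the improved policy moves from `left` to `right` and then stays there.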
```python
import numpy as np

# In-place iterative policy evaluation on a 5x5 grid
# under the uniform random policy (each move has probability 1/4).
GAMMA = 0.9
v = np.zeros((5, 5))
print(v)
actions = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])  # up, down, left, right

for k in range(100):
    for i in range(5):
        for j in range(5):
            s = np.array([i, j])
            v_a = 0.0
            for a in actions:
                s_1 = s + a
                if s_1[0] < 0 or s_1[0] > 4 or s_1[1] < 0 or s_1[1] > 4:
                    # Off the grid: stay in place, reward -1.
                    # This test runs first, so the upward move from A and B
                    # is also handled here and does not take the special jump.
                    s_1 = s
                    v_a += 1 / 4 * (-1.0 + GAMMA * v[s_1[0], s_1[1]])
                elif np.equal([0, 1], s).all():
                    # State A: jump to (4, 1) with reward +10.
                    s_1 = np.array([4, 1])
                    v_a += 1 / 4 * (10.0 + GAMMA * v[s_1[0], s_1[1]])
                elif np.equal([0, 3], s).all():
                    # State B: jump to (2, 3) with reward +5.
                    s_1 = np.array([2, 3])
                    v_a += 1 / 4 * (5.0 + GAMMA * v[s_1[0], s_1[1]])
                else:
                    # Ordinary move: reward 0.
                    v_a += 1 / 4 * GAMMA * v[s_1[0], s_1[1]]
            v[i, j] = v_a  # written back immediately (in-place update)
    print(v)
```

The script prints v once before the loop (all zeros) and again after every sweep. Output after the first sweep:

```
[[-0.5         7.25        1.38125     3.5         0.2875    ]
 [-0.3625      1.5496875   0.65946094  0.93587871  0.02526021]
 [-0.3315625   0.27407813  0.21004629  0.25783312 -0.186304  ]
 [-0.32460156 -0.01136777  0.04470267  0.06807055 -0.27660253]
 [-0.57303535 -0.3814907  -0.32577731 -0.30798402 -0.63153197]]
```

The values change less with each sweep. Output after the final sweep:

```
[[ 2.94963914  8.13614859  3.98461643  4.70737282  1.152813  ]
 [ 1.29630259  2.70454729  1.99237959  1.6317372   0.33283671]
 [-0.07803311  0.59537939  0.53411945  0.21957619 -0.52700155]
 [-1.04935162 -0.514503   -0.43347098 -0.66296093 -1.25652936]
 [-1.91078617 -1.39923682 -1.28319322 -1.47606888 -2.026972  ]]
```
[[ 2.94963913 8.13614858 3.98461642 4.70737282 1.15281299][ 1.29630258 2.70454728 1.99237958 1.63173719 0.3328367 ][-0.07803312 0.59537939 0.53411944 0.21957619 -0.52700155][-1.04935163 -0.51450301 -0.43347099 -0.66296093 -1.25652936][-1.91078618 -1.39923683 -1.28319323 -1.47606888 -2.02697201]] [[ 2.94963912 8.13614857 3.98461641 4.70737281 1.15281299][ 1.29630257 2.70454727 1.99237957 1.63173719 0.3328367 ][-0.07803313 0.59537938 0.53411943 0.21957618 -0.52700156][-1.04935164 -0.51450302 -0.43347099 -0.66296094 -1.25652937][-1.91078618 -1.39923683 -1.28319324 -1.47606889 -2.02697201]] [[ 2.94963911 8.13614857 3.98461641 4.7073728 1.15281298][ 1.29630256 2.70454727 1.99237957 1.63173718 0.33283669][-0.07803314 0.59537937 0.53411943 0.21957618 -0.52700156][-1.04935164 -0.51450302 -0.433471 -0.66296094 -1.25652937][-1.91078619 -1.39923684 -1.28319324 -1.47606889 -2.02697202]] [[ 2.9496391 8.13614856 3.9846164 4.7073728 1.15281298][ 1.29630255 2.70454726 1.99237956 1.63173718 0.33283669][-0.07803314 0.59537937 0.53411942 0.21957617 -0.52700157][-1.04935165 -0.51450303 -0.433471 -0.66296095 -1.25652938][-1.91078619 -1.39923684 -1.28319325 -1.4760689 -2.02697202]] [[ 2.9496391 8.13614856 3.9846164 4.7073728 1.15281297][ 1.29630255 2.70454726 1.99237956 1.63173717 0.33283669][-0.07803315 0.59537936 0.53411942 0.21957617 -0.52700157][-1.04935165 -0.51450303 -0.43347101 -0.66296095 -1.25652938][-1.9107862 -1.39923685 -1.28319325 -1.4760689 -2.02697202]] [[ 2.94963909 8.13614855 3.98461639 4.70737279 1.15281297][ 1.29630254 2.70454725 1.99237955 1.63173717 0.33283668][-0.07803315 0.59537936 0.53411942 0.21957616 -0.52700157][-1.04935166 -0.51450303 -0.43347101 -0.66296095 -1.25652938][-1.9107862 -1.39923685 -1.28319325 -1.4760689 -2.02697203]] [[ 2.94963909 8.13614855 3.98461639 4.70737279 1.15281297][ 1.29630254 2.70454725 1.99237955 1.63173717 0.33283668][-0.07803315 0.59537936 0.53411941 0.21957616 -0.52700158][-1.04935166 -0.51450304 -0.43347101 -0.66296096 
-1.25652939][-1.91078621 -1.39923685 -1.28319325 -1.47606891 -2.02697203]] [[ 2.94963909 8.13614855 3.98461639 4.70737279 1.15281297][ 1.29630254 2.70454725 1.99237955 1.63173717 0.33283668][-0.07803316 0.59537936 0.53411941 0.21957616 -0.52700158][-1.04935166 -0.51450304 -0.43347101 -0.66296096 -1.25652939][-1.91078621 -1.39923685 -1.28319326 -1.47606891 -2.02697203]]

Greedy policy: move to the next state $s'$ with the largest return $G_t$, i.e. the state with the largest value $v$.

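  • Policy improvement

    • Given the current value function, find a better policy, and step by step reach the optimal policy
      • From the value function $v_\pi$ of $\pi$, obtain an improved policy $\pi'$
    • How to improve
      • Check whether $q_\pi(s,a)$ is greater than $v_\pi(s)$ (a special case of the theorem below)
    • Theorem
      • If $q_\pi(s,\pi'(s)) \geq v_\pi(s)$ for every state $s$, then $\pi'$ is at least as good as $\pi$: $v_{\pi'}(s) \geq v_\pi(s)$
  • Repeating evaluation and improvement in a loop gives policy iteration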
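As a worked form of the theorem above, the greedy improvement step and the inequality it relies on can be written out as follows. The dynamics $p(s',r \mid s,a)$ and the discount factor $\gamma$ are not defined in this passage; they are assumed here in the standard MDP notation.

```latex
\begin{aligned}
q_\pi(s,a) &= \sum_{s',r} p(s',r \mid s,a)\,\bigl[r + \gamma\, v_\pi(s')\bigr] \\
\pi'(s) &= \arg\max_a q_\pi(s,a) \\
q_\pi(s,\pi'(s)) &= \max_a q_\pi(s,a) \;\geq\; \sum_a \pi(a \mid s)\, q_\pi(s,a) = v_\pi(s) \\
&\Longrightarrow\; v_{\pi'}(s) \geq v_\pi(s) \quad \text{for all } s
\end{aligned}
```

The third line holds because a maximum is never smaller than a weighted average, so the greedy policy always satisfies the condition of the theorem.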

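  • Policy evaluation

  • Policy iteration = policy evaluation + policy improvement

    • Bellman equation
  • Value iteration = inexact evaluation (a single evaluation sweep) + policy improvement

    • Bellman optimality equation
  • Can the policy be improved from an inexact evaluation? Exact evaluation takes a long time.

    • Yes: this is value iteration
  • Policy iteration

  • Value iteration

  • Comparison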

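  • Dynamic programming

    • A bootstrapping method ("something from nothing": estimates are updated from other estimates)
    • Turns the Bellman equation into an update rule
    • Advantage: computationally efficient
    • Disadvantage: requires a complete model of the environment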
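To make "turning the Bellman equation into an update rule" concrete, here is a minimal value-iteration sketch. The two-state MDP, the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected reward), and the names `value_iteration`, `gamma`, and `tol` are all illustrative assumptions, not taken from the notes.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a small tabular MDP (hypothetical data layout).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the expected immediate reward.
    """
    n_states = len(P)
    v = [0.0] * n_states
    while True:
        # Bellman optimality equation used as an update rule:
        # q(s, a) = R(s, a) + gamma * sum_s' p(s' | s, a) * v(s')
        q = [[R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        v_new = [max(q_s) for q_s in q]
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < tol:
            break
    # Greedy policy with respect to the last action values.
    policy = [q_s.index(max(q_s)) for q_s in q]
    return v, policy


# Hypothetical MDP: action 0 stays in place, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay -> 0, switch -> 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: stay -> 1, switch -> 0
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
v, policy = value_iteration(P, R)
```

Note that the function needs `P` and `R` in full, which is exactly the disadvantage listed above: dynamic programming requires the complete environment model.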

Monte Carlo methods: when the complete environment model is unknown

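  • Compute the state (or action) value function from real or simulated experience

  • No complete model is required

  • Sampling

  • Samples that return to the original state are discarded

  • Policy iteration based on Monte Carlo

    • Without a model, a state value function alone is not enough to derive a policy
    • Monte Carlo estimates $q_\pi(s,a)$, and the policy is obtained greedily from it
  • Advantage: the value of each state is estimated independently of the others (no bootstrapping)

    • Suitable when the model is unknown or the environment model is complex
    • Convergence is guaranteed by the law of large numbers
  • Disadvantage: some state-action pairs never appear in the Monte Carlo simulation

    • Remedy: exploring starts. Every state-action pair has some probability of being the starting point of a simulation (like starting from a mid-game position)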
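The Monte Carlo estimate of $q_\pi(s,a)$ described above can be sketched as an average of sampled returns. This is a minimal first-visit version under stated assumptions: the episode format (lists of `(state, action, reward)` steps), the hand-written episodes, and the helper names `first_visit_mc_q` and `greedy_policy` are hypothetical. The episodes deliberately start from different state-action pairs, in the spirit of exploring starts.

```python
from collections import defaultdict


def first_visit_mc_q(episodes, gamma=1.0):
    """Average the first-visit return of every (state, action) pair.

    episodes: list of episodes, each a list of (state, action, reward) steps.
    """
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for episode in episodes:
        # Time index of the first occurrence of each (state, action) pair.
        first = {}
        for t, (s, a, _) in enumerate(episode):
            first.setdefault((s, a), t)
        g = 0.0
        # Walk backwards so that g accumulates the discounted return G_t.
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            g = r + gamma * g
            if first[(s, a)] == t:
                returns_sum[(s, a)] += g
                returns_cnt[(s, a)] += 1
    return {sa: returns_sum[sa] / returns_cnt[sa] for sa in returns_sum}


def greedy_policy(q):
    """Pick, for each state, the action with the largest estimated value."""
    best = {}
    for (s, a), value in q.items():
        if s not in best or value > q[(s, best[s])]:
            best[s] = a
    return best


# Hand-written sample episodes (illustrative data only).
episodes = [
    [("A", "right", 0.0), ("B", "right", 1.0)],
    [("A", "left", 0.0), ("A", "right", 0.0), ("B", "right", 1.0)],
    [("B", "left", 0.0), ("A", "left", -1.0)],
]
q = first_visit_mc_q(episodes)
policy = greedy_policy(q)
```

No transition probabilities appear anywhere in the code: the estimate uses only sampled returns, and a pair that never occurs in `episodes` simply has no entry in `q`, which is the disadvantage noted above.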

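  • Dropping exploring starts
  • Alternatives that balance exploitation and exploration
    • on-policy
      • Explore in every state, e.g. $\epsilon$-greedy
        • Choose the greedy action with probability $1-\epsilon+\frac{\epsilon}{|A(s)|}$; choose each non-greedy action with probability $\frac{\epsilon}{|A(s)|}$
      • Disadvantage: the resulting policy is only $\epsilon$-optimal, i.e. the best among $\epsilon$-greedy policies (a small gap from the true optimum remains)
    • off-policy
      • Uses two policies:
        • Target policy $\pi$
          • The policy being optimized
          • Greedy
        • Behavior policy $b$
          • Guarantees that every action can be explored in every state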
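The $\epsilon$-greedy probabilities from the on-policy bullet can be computed as in the sketch below. The function name `epsilon_greedy_probs` and the sample action values are assumptions for illustration. Setting `epsilon` to 0 gives a purely greedy policy, which is how the greedy target policy differs from an exploring behavior policy in the off-policy setting.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy in one state.

    Every action gets epsilon / |A(s)|; the greedy action additionally
    gets 1 - epsilon, for a total of 1 - epsilon + epsilon / |A(s)|.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


# Hypothetical action values for a single state with three actions.
q_values = [1.0, 3.0, 2.0]
behavior = epsilon_greedy_probs(q_values, 0.3)  # explores every action
target = epsilon_greedy_probs(q_values, 0.0)    # purely greedy
```

With `epsilon = 0.3` and three actions, the greedy action is chosen with probability about 0.8 and each other action with probability about 0.1, so no action is ever excluded.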

2.1 On-policy Monte Carlo

2.2 Off-policy Monte Carlo

Temporal-difference methods

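  • Does Monte Carlo have to simulate all the way to the end of an episode?
  • Nonstationary simulation

  • Temporal-difference (TD) learning is the most central policy-learning method in reinforcement learning
  • Connections and differences between TD and Monte Carlo
    • Connection: both learn from experience
    • Monte Carlo in the nonstationary setting is a special case of TD
    • Difference: Monte Carlo needs the complete episode, whereas TD needs only part of it
    • TD is probably faster than Monte Carlo
  • Connections and differences between TD and dynamic programming
    • Connection: both use bootstrapping
    • Difference: dynamic programming estimates from a complete environment model, whereas TD estimates from experience
  • Learning a guess from a guess
    • What keeps the learning on track: the target has taken one more real step
  • Converges
  • Learns the policy online from experience
  • Learns the policy by directly learning the action-value function
  • Suitable for problems with small state and action spaces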
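"Learning a guess from a guess" can be illustrated with a tabular TD(0) update for state values. This is a minimal sketch under stated assumptions: the three-state chain, the step size `alpha`, and the function name `td0_update` are hypothetical, and the notes themselves focus on action values, for which the update has the same shape.

```python
def td0_update(v, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step: move v[s] toward the target r + gamma * v[s_next].

    The target mixes one observed reward with the current estimate of the
    next state, so the update never waits for the end of the episode.
    """
    target = r if terminal else r + gamma * v[s_next]
    v[s] += alpha * (target - v[s])
    return v


# Hypothetical chain A -> B -> C -> end, with reward 1 only on the last step.
v = {"A": 0.0, "B": 0.0, "C": 0.0}
transitions = [
    ("A", 0.0, "B", False),
    ("B", 0.0, "C", False),
    ("C", 1.0, None, True),
]
for _ in range(500):
    for s, r, s_next, done in transitions:
        td0_update(v, s, r, s_next, alpha=0.1, terminal=done)
```

Each update uses a single transition, in contrast to Monte Carlo, which needs the full return of a completed episode. In this undiscounted chain the true value of every state is 1, and the estimates approach it as the sweeps repeat.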
