Option Pricing with Reinforcement Learning
This post demonstrates how to use reinforcement learning to price an American option. An option is a derivative contract that gives its owner the right, but not the obligation, to buy or sell an underlying asset. Unlike its European-style counterpart, an American-style option may be exercised at any time before expiry.
The American option is known to be an optimal control MDP (Markov Decision Process) problem in which the underlying process is a geometric Brownian motion ([1]). The Markovian state is a price-time tuple, and the control is a binary action that decides on each day whether or not to exercise the option.
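For context, the pricing problem can be written as the standard dynamic-programming recursion for optimal stopping (a textbook statement added here for convenience, not material from the original post):

V(S, t) = max( payoff(S), exp(-r*dt) * E[ V(S(t+dt), t+dt) | S(t) = S ] ), with payoff(S) = max(K - S, 0) for a put.

The optimal policy is simply the exercise-or-hold decision that attains this maximum at each (price, time) state, which is exactly what the Q-learning agent below approximates.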
The optimal stopping policy looks like the figure below, where the x-axis is time and the y-axis is the stock price. The curve in red is commonly called the optimal exercise boundary. On each day, if the stock price falls in the exercise region (above the boundary for a call, below the boundary for a put), it is optimal to exercise the option and receive the payoff, the difference between the stock price and the strike price.
One can imagine it as a discretized Q-table, as illustrated by the dotted grid. Every day the agent, or the trader, looks up the table and takes an action according to today's price. The Q-table is monotone in that all the grid cells above the boundary yield a go decision and all the cells below yield a no-go decision. Therefore Q-learning is well suited to finding the optimal strategy defined by this boundary.
The remainder of this post contains three sections. In the first section, a baseline price is computed using classical models. In the second section, an OpenAI gym environment is constructed, similar to building an Atari game. In the third section, an agent is trained with DQN (Deep Q-Network) to play American options, similar to training computers to play Atari games. The full Python notebook is located here on GitHub.
Section One — Baseline
There are many ways to price an American option, from, for example, binomial trees to the Longstaff-Schwartz Monte Carlo method. Here I use the QuantLib package to price a one-year American put option.
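The notebook's market data and instrument setup are not reproduced in this post. A minimal QuantLib setup would look roughly like the sketch below; the spot, strike, rate, dividend, and volatility values are assumptions for illustration only and will not exactly reproduce the prices quoted afterwards.

import QuantLib as ql

# Illustrative parameters (assumed, not the notebook's actual inputs)
today = ql.Date(1, 1, 2020)
ql.Settings.instance().evaluationDate = today
maturity = ql.Date(1, 1, 2021)                      # one-year option
spot, strike, rate, dividend, vol = 100.0, 100.0, 0.05, 0.0, 0.25

# Put payoff shared by the European and American contracts
payoff = ql.PlainVanillaPayoff(ql.Option.Put, strike)
european_option = ql.VanillaOption(payoff, ql.EuropeanExercise(maturity))
american_option = ql.VanillaOption(payoff, ql.AmericanExercise(today, maturity))

# Flat Black-Scholes-Merton market: constant rate, dividend yield, and volatility
calendar = ql.NullCalendar()
day_count = ql.Actual365Fixed()
spot_handle = ql.QuoteHandle(ql.SimpleQuote(spot))
rate_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, rate, day_count))
div_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, dividend, day_count))
vol_ts = ql.BlackVolTermStructureHandle(ql.BlackConstantVol(today, calendar, vol, day_count))
bsm_process = ql.BlackScholesMertonProcess(spot_handle, div_ts, rate_ts, vol_ts)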
pricing_dict = {}

# European benchmark: Black-Scholes analytic engine
bsm73 = ql.AnalyticEuropeanEngine(bsm_process)
european_option.setPricingEngine(bsm73)
pricing_dict['BlackScholesEuropean'] = european_option.NPV()

# American approximation: Barone-Adesi-Whaley
analytical_engine = ql.BaroneAdesiWhaleyEngine(bsm_process)
american_option.setPricingEngine(analytical_engine)
pricing_dict['BawApproximation'] = american_option.NPV()

# American: Cox-Ross-Rubinstein binomial tree with 100 steps
binomial_engine = ql.BinomialVanillaEngine(bsm_process, "crr", 100)
american_option.setPricingEngine(binomial_engine)
pricing_dict['BinomialTree'] = american_option.NPV()

print(pricing_dict)

{'BlackScholesEuropean': 6.92786901829998, 'BawApproximation': 7.091254636695334, 'BinomialTree': 7.090924645858217}

The last line is the output, which says this American option is worth $7.091, while its European counterpart is worth $6.928. This implies an early exercise premium of $0.163.
最后一行是輸出,表示此美式期權(quán)價(jià)值$ 7.091,而其歐洲期權(quán)價(jià)值$ 6.928。 這意味著提前行使期權(quán)費(fèi)為0.163美元。
Section Two — OpenAI Gym Environment
It is standard practice to derive from the OpenAI gym environment class. This makes our work extensible to further studies such as exotic options and stochastic volatilities. The underlying theory is the famous Black-Scholes framework, and the underlying asset follows a geometric Brownian motion in the risk-neutral world. This is realized in the step function below:
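For reference, the hold branch of the step function advances the price by the exact one-step log-normal update implied by geometric Brownian motion (a textbook discretization restated here for convenience):

S(t+dt) = S(t) * exp( (r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z ), with Z ~ N(0, 1) and dt = T/N.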
def step(self, action):
    if action == 1:  # exercise
        reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T * (self.day_step / self.N))
        done = True
    else:  # hold
        if self.day_step == self.N:  # at maturity
            reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T)
            done = True
        else:  # move to tomorrow
            reward = 0
            # lnS1 - lnS0 = (r - 0.5*sigma^2)*t + sigma * Wt
            self.S1 = self.S1 * np.exp((self.r - 0.5 * self.sigma**2) * (self.T / self.N)
                                       + self.sigma * np.sqrt(self.T / self.N) * np.random.normal())
            self.day_step += 1
            done = False
    tao = 1.0 - self.day_step / self.N  # time to maturity, in units of years
    return np.array([self.S1, tao]), reward, done, {}
The AmeriOptionEnv takes action 0 as hold (do not exercise) and action 1 as exercise. If we stick to the no-exercise policy until expiry, this essentially becomes a stock price simulation and serves as the input for pricing the European option as a control variate.
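The constructor and reset method of AmeriOptionEnv are not shown above. A minimal sketch consistent with the step function might look like the following; the attribute names are taken from the code above, while the default parameter values are assumptions for illustration only.

import gym
import numpy as np
from gym import spaces

class AmeriOptionEnv(gym.Env):
    # One-year American put as a gym environment (sketch only; default parameter
    # values are assumed and are not the notebook's actual settings).
    def __init__(self, S0=100.0, K=100.0, r=0.05, sigma=0.25, T=1.0, N=365):
        self.S0, self.K, self.r, self.sigma, self.T, self.N = S0, K, r, sigma, T, N
        self.action_space = spaces.Discrete(2)    # 0 = hold, 1 = exercise
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]), high=np.array([np.inf, 1.0]), dtype=np.float32)
        self.reset()

    def reset(self):
        self.S1 = self.S0      # today's simulated price
        self.day_step = 0      # current day index
        return np.array([self.S1, 1.0])   # state = (price, time to maturity)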
import matplotlib.pyplot as plt

env = AmeriOptionEnv()
s = env.reset()
sim_prices = []
sim_prices.append(s[0])

for i in range(365):
    action = 0  # hold until expiry
    s_next, reward, done, info = env.step(action)
    sim_prices.append(s_next[0])

plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.plot(sim_prices)

Figure: Stock Price Gym Simulation
Section Three — Pricing with DQN
Once the gym environment is constructed, we are ready to price the American option using reinforcement learning, specifically DQN (Deep Q-Network) in this post. Here I use the TensorFlow TF-Agents library. Alternative choices are other OpenAI Gym compatible libraries such as PyTorch and OpenAI Baselines.
The code follows the TF-Agents API document precisely. The only changes I made are importing the customized AmeriOption environment and adjusting the hyper-parameters to values more pertinent to a one-year option than to the Cartpole game.
As labelled in the Jupyter notebook, the RL model is constructed in the sequence of steps from the TF-Agents DQN tutorial: set up the training and evaluation environments, build the Q-network, define the agent, derive its policies, create a replay buffer, collect experience, train, and evaluate.
According to the API, a TF-agent is defined as
agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()
to be aware of the environment states, the action space, the deep neural network for policy evaluation, and an optimizer on the temporal-difference loss function for TD optimization.
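The q_net, optimizer, and train_step_counter referenced above are built earlier in the notebook, following the same TF-Agents DQN tutorial flow. A typical construction is sketched below; the fully connected layer size and learning rate are illustrative assumptions rather than the notebook's exact hyper-parameters, and wrapping the custom gym environment with GymWrapper/TFPyEnvironment is one standard way to obtain train_env and eval_env.

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import gym_wrapper, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

# Wrap the custom gym environment for TF-Agents
train_env = tf_py_environment.TFPyEnvironment(gym_wrapper.GymWrapper(AmeriOptionEnv()))
eval_env = tf_py_environment.TFPyEnvironment(gym_wrapper.GymWrapper(AmeriOptionEnv()))

# A small fully connected Q-network mapping the (price, time) state to Q-values for {hold, exercise}
q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100,))                                          # assumed layer size

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3)     # assumed learning rate
train_step_counter = tf.Variable(0)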
Figure: Policy Training Performance

The training performance is shown above. It is rather noisy because the evaluation step uses only 10 simulation paths and is therefore subject to Monte Carlo randomness. For example, we know the option price is around $7, yet the average price can go as high as $12. Therefore, after learning the optimal stopping policy, it is essential to run a full-blown Monte Carlo simulation to find the actual price, as below.
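The helper compute_avg_return used below is the standard evaluation function from the TF-Agents DQN tutorial, reproduced here for completeness. Because the environment already discounts every payoff back to time zero, the average episode return is directly a Monte Carlo estimate of the option price.

def compute_avg_return(environment, policy, num_episodes=10):
    # Run the policy for num_episodes episodes and average the total episode rewards.
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        episode_return = 0.0
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            episode_return += time_step.reward
        total_return += episode_return
    avg_return = total_return / num_episodes
    return avg_return.numpy()[0]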
import pandas as pd

npv = compute_avg_return(eval_env, agent.policy, num_episodes=2_000)
pricing_dict['ReinforcementAgent'] = npv

pricing_df = pd.DataFrame.from_dict(pricing_dict, orient='index')
pricing_df.columns = ['Price']
print(pricing_df)
The reinforcement learning agent values the option at $7.057, implying an early exercise premium of $0.129. This result is in line with the classical baseline models.
Conclusion
In this post, we prepare a gym environment and then train a DQN TF-Agent to price an American option. The result is encouraging, with a reasonably good price that is in line with classical baseline models. Some possible improvements include:
For practitioners:
Use a mirror AmeriOption gym environment to provide antithetic variates (see the sketch after this list).
In the function compute_avg_return, continue the simulation path to price the European option as a control variate.
For researchers:
Add another stochastic process to the MDP to capture stochastic volatility.
Instead of using the default network structure, design a specialized multi-layer network to enable transfer learning into other maturities as well as options on rates, futures, FX, and exotic products.
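As a concrete illustration of the antithetic-variate idea above, a mirror environment can replay the negated normal draws of a primary path, so that each pair of episodes forms an antithetic pair. The sketch below reuses the hypothetical AmeriOptionEnv attributes assumed earlier and is not code from the original notebook.

import numpy as np

class MirrorAmeriOptionEnv(AmeriOptionEnv):
    # Antithetic 'mirror' environment: consumes a pre-generated array of N(0,1)
    # draws with the sign flipped. Running one ordinary episode with draws z and
    # one mirror episode fed the same z gives an antithetic pair.
    def set_draws(self, draws):
        self._draws = np.asarray(draws)    # the primary path's standard normal draws

    def step(self, action):
        tao = 1.0 - self.day_step / self.N
        if action == 1:                    # exercise
            reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T * (self.day_step / self.N))
            return np.array([self.S1, tao]), reward, True, {}
        if self.day_step == self.N:        # hold at maturity
            reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T)
            return np.array([self.S1, tao]), reward, True, {}
        dt = self.T / self.N               # hold: advance one day using the negated draw
        z = -self._draws[self.day_step]
        self.S1 *= np.exp((self.r - 0.5 * self.sigma ** 2) * dt + self.sigma * np.sqrt(dt) * z)
        self.day_step += 1
        return np.array([self.S1, 1.0 - self.day_step / self.N]), 0.0, False, {}

Averaging the return of each (primary, mirror) episode pair before taking the overall Monte Carlo mean reduces the variance of the price estimate.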
Source: https://medium.com/swlh/option-pricing-using-reinforcement-learning-ad2ddca7735b