Forecasting Tesla's Stock Price Using Autoregression
Tesla has been making waves in financial markets over the last few months. Previously named the most shorted stock in the US [1], Tesla has since seen its stock price catapult the electric carmaker to a market capitalization of $278 billion [2]. Its latest quarterly results suggest that it is now eligible to be added to the S&P 500, of which it is not currently a member despite being the 12th largest company in the US [3].
Amid market volatility, various trading strategies and a sense of “FOMO” (fear of missing out), predicting the returns of Tesla’s stock is a difficult task. However, we are going to use Python to forecast Tesla’s stock price returns using autoregression.
Exploring the data
First, we need to import the data. We can use historical stock price data downloaded from Yahoo Finance. We're going to use the "Close" price for this analysis.
import pandas as pd
df = pd.read_csv("TSLA.csv", index_col=0, parse_dates=[0])
df.head()
Source: Yahoo Finance.
To determine the order of the ARMA model, we can first plot the partial autocorrelation function (PACF). It shows the correlation between the series and each of its own lags that is not explained by the correlations at all lower-order lags.
From the PACF below, we can see that the significance of the lags cuts off after lag 1, which suggests we should use an autoregressive (AR) model [4].
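# Plot PACF
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf
# nlags=40 returns 41 values (lags 0 to 40), matching the 41 bar positions
plt.bar(x=np.arange(0,41), height=pacf(df.Close, nlags=40))
plt.title("PACF")
Finite and cuts off after lag 1, so AR.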
When plotting the autocorrelation function, we get a slightly different result. The series is infinite and slowly damps out, which suggests an AR or ARMA model [4]. Taking both the PACF and the ACF into account, we are going to use an AR model.
# Plot ACF
# nlags=40 returns 41 values (lags 0 to 40), matching the 41 bar positions
plt.bar(x=np.arange(0,41), height=acf(df.Close, nlags=40))
plt.title("ACF")
Infinite and damps out, so AR/ARMA.
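To make the idea of a "significant" lag more concrete, here is a minimal sketch that computes a sample autocorrelation function by hand and compares each lag with the usual approximate 95% band of 1.96/sqrt(n). It runs on a synthetic AR(1) series rather than the Tesla data; the function name sample_acf and the coefficient 0.7 are assumptions made purely for illustration, and the partial autocorrelation is not covered here.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    variance = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - lag] for t in range(lag, n)) / variance
        for lag in range(max_lag + 1)
    ]


# Synthetic AR(1) series standing in for real data (0.7 is an arbitrary choice).
random.seed(0)
phi = 0.7
series = [0.0]
for _ in range(499):
    series.append(phi * series[-1] + random.gauss(0, 1))

acf_values = sample_acf(series, max_lag=10)

# Approximate 95% band for the autocorrelations of pure white noise.
band = 1.96 / math.sqrt(len(series))
significant_lags = [
    lag for lag, r in enumerate(acf_values) if lag > 0 and abs(r) > band
]
print([round(r, 2) for r in acf_values[:4]], round(band, 3), significant_lags)
```

For an AR(1) process the autocorrelations decay gradually, so several lags typically fall outside the band, which is the "slowly damps out" pattern described above.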
Pre-processing the data
Before we run the model, we must make sure we are using stationary data. A series is stationary when its statistical properties, such as its mean and variance, do not change over time. The raw stock price seen earlier in the article is clearly not stationary: it rises over time in a seemingly exponential manner.
Therefore, to make the series stationary we difference it, which means subtracting yesterday's value from today's. The differenced series fluctuates around a roughly constant mean close to 0 and gives us the daily change in the stock price, which this article treats as the stock's return, instead of the price level.
We are also going to lag the differenced series by 1, which brings yesterday’s value forward to today. This is so we can obtain our AR term (Yt-1).
After putting these values into the same DataFrame, we split the data into training and testing sets. In the code, the first 200 observations form the training set and the remainder form the test set, which is roughly an 80:20 split.
# Make the data stationary by differencing
tsla = df.Close.diff().fillna(0)
# Create lag
tsla_lag_1 = tsla.shift(1).fillna(0)
# Put all into one DataFrame
df_regression = pd.DataFrame(tsla)
df_regression["Lag1"] = tsla_lag_1
# Split into train and test data
df_regression_train = df_regression.iloc[0:200]
df_regression_test = df_regression.iloc[200:]
tsla.plot()
Differenced Series. Source: Yahoo Finance.
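The differencing, lagging and splitting steps can be traced by hand on a tiny series. The sketch below uses made-up prices (not Tesla data) and plain Python lists so that each intermediate value is easy to check; the variable names are illustrative assumptions. It also computes percentage returns, to show how they differ from the absolute daily changes used in this article.

```python
# Toy closing prices (made-up numbers, not Tesla data).
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]

# First difference: today's price minus yesterday's, with 0 for the first day
# (this mirrors diff().fillna(0)).
diff = [0.0] + [today - yesterday for yesterday, today in zip(prices, prices[1:])]

# Lag the differenced series by one step, again filling the gap with 0
# (this mirrors shift(1).fillna(0)).
lag1 = [0.0] + diff[:-1]

# Percentage returns, for comparison with the absolute changes above.
pct_return = [0.0] + [
    (today - yesterday) / yesterday for yesterday, today in zip(prices, prices[1:])
]

# Chronological split: earlier rows for training, later rows for testing.
split = int(len(prices) * 0.8)
train = list(zip(diff[:split], lag1[:split]))
test = list(zip(diff[split:], lag1[split:]))

print(diff)
print(lag1)
print(train, test)
```

Keeping the split chronological, rather than shuffling the rows, means the test set only contains observations that come after everything the model was fitted on.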
Forming the AR model
Now, how many values should we use to predict the next observation? Using all 200 past values may not give a good estimate. Intuitively, stock price activity from 200 days ago is unlikely to have a significant effect on today's value, because numerous factors may have changed since then, including earnings, competition, seasonality and more. Therefore, to find the optimal window of observations to use in the regression, one method is to run a regression with an expanding window. This method, detailed in the code below, starts with a very small window of the most recent observations, records the r-squared value (goodness-of-fit), and then repeats the process, expanding the window by one observation each time. To keep the window economically interpretable, I've capped its size at 30 days.
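# Run expanding window regression to find optimal window
import statsmodels.api as sm
rsquared = []
# Start at n = 2: iloc[-0:] would select the whole series, and a
# two-parameter regression needs at least two observations.
windows = range(2, 31)
for n in windows:
    y = df_regression_train["Close"].iloc[-n:]
    x = df_regression_train["Lag1"].iloc[-n:]
    x = sm.add_constant(x)
    model = sm.OLS(y, x)
    results = model.fit()
    rsquared.append(results.rsquared)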
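The expanding-window idea can also be sketched without any libraries. The example below is an illustration on synthetic data, not the author's notebook code: the helper r_squared is a hypothetical function that uses the closed-form r-squared of a one-regressor OLS fit with an intercept, and the series is random noise.

```python
import random


def r_squared(x, y):
    """R-squared of a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


random.seed(1)
changes = [random.gauss(0, 1) for _ in range(200)]  # synthetic daily changes
lagged = [0.0] + changes[:-1]                       # the same series lagged by 1

# R-squared for windows of the 2, 3, ..., 30 most recent observations.
scores = {n: r_squared(lagged[-n:], changes[-n:]) for n in range(2, 31)}
best_window = max(scores, key=scores.get)
print(best_window, round(scores[best_window], 3))
```

With only two observations a straight line fits perfectly, so the smallest window scores an r-squared of essentially 1 even on pure noise. This is one reason to be wary of the high values at very small window sizes.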
Looking at the r-squared plot of each iteration, we can see that it is high for around 1–5 past values and also has a peak at 13 past values. It may seem tempting to choose one of the values between 1 and 5. However, with such a small sample the regression estimates are likely to be unreliable, so this wouldn't give us the best result. Therefore, let's choose the second peak at 13 observations, as this is a more adequate sample size. It gives an r-squared of around 0.437 (i.e. the model explains about 43.7% of the variation in the data).
R-squared plot.
Running the AR model on the training data
The next step is to use our window of 13 past observations to fit the AR(1) model. We can do this using the OLS function in statsmodels. Code below:
# AR(1) model with static coefficients
import statsmodels.api as sm
y = df_regression_train["Close"].iloc[-13:]
x = df_regression_train["Lag1"].iloc[-13:]
x = sm.add_constant(x)
model = sm.OLS(y,x)
results = model.fit()
results.summary()
Regression output from the AR(1) model (training data).
As we can see in the statistical summary, both the constant and the first lag are significant at the 10% significance level. Looking at the signs of the coefficients, the positive constant suggests that, all else being equal, stock price returns should be positive. The negative coefficient on the first lag suggests that, ceteris paribus, a rise on one day tends to be followed by a smaller or negative change on the next day, and vice versa. In other words, returns show short-term mean reversion around the positive drift implied by the constant.
Great, now let’s use those coefficients to find the fitted value for Tesla’s stock returns so we can plot the model against the original data. Our model may now be specified as:
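For reference, a generic AR(1) specification with a constant is sketched here. The symbols c, \phi_1 and \varepsilon_t are notation assumed for illustration, Y_t denotes the differenced closing price, and the estimated values come from the regression output, which is not reproduced here.

```latex
Y_t = c + \phi_1 Y_{t-1} + \varepsilon_t, \qquad
\hat{Y}_t = \hat{c} + \hat{\phi}_1 Y_{t-1}, \qquad
e_t = Y_t - \hat{Y}_t
```

The second expression is the fitted value obtained by plugging in the estimated coefficients, and the third is the residual, i.e. actual minus fitted.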
Our AR(1) equation.
Plot Residuals (Actual - Fitted)
Residuals (training data)
The residuals suggest that the model performs better in 2019, but in 2020, as volatility increased, the model performed considerably worse (the residuals are larger). This is intuitive: the March 2020 sell-off had a large impact on US stocks, and the quick and sizeable rebound was felt particularly by tech stocks. This, along with increased betting on Tesla stock by retail traders on platforms such as Robinhood, has increased price volatility, making the stock harder to predict.
Given these factors, along with our previous r-squared of around 43%, we would not expect our AR(1) model to predict the exact stock return. Instead, we can test the model's accuracy by calculating its "hit rate", i.e. the share of observations for which the predicted value and the actual value have the same sign. Summing the true positives and true negatives and dividing by the number of observations, the accuracy of our model comes out at around 55%. That is modestly better than a coin flip, which is reasonable for such a simple model.
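As a rough sketch of the hit-rate calculation, the example below computes fitted values from a constant and a lag coefficient and then counts sign agreements. The coefficients and data are hypothetical numbers chosen only to illustrate the arithmetic; they are not the values estimated in this article, and hit_rate is an illustrative helper name.

```python
def hit_rate(fitted, actual):
    """Share of observations where fitted and actual values have the same sign."""
    true_pos = sum(1 for f, a in zip(fitted, actual) if f > 0 and a > 0)
    true_neg = sum(1 for f, a in zip(fitted, actual) if f < 0 and a < 0)
    return (true_pos + true_neg) / len(actual)


# Hypothetical coefficients and data, chosen only to illustrate the calculation.
const, phi = 0.5, -0.4
lagged_changes = [2.0, -1.0, 4.0, 5.0, -2.0]   # yesterday's change
actual_changes = [-1.0, 4.0, 5.0, -2.0, 3.0]   # today's change

# Fitted value for each day: constant plus coefficient times yesterday's change.
fitted_changes = [const + phi * lag for lag in lagged_changes]
accuracy = hit_rate(fitted_changes, actual_changes)
print(accuracy)
```

Note that an observation with a fitted or actual value of exactly zero counts as a miss under this definition, because it is neither a true positive nor a true negative.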
Fit model to the test data
Now, let’s apply the same methodology to the test data to see how our model performs out-of-sample.
Actual vs Fitted (test data).
# Calculate hit rate
# Note: df_2_test is not defined in this article
true_neg_test = np.sum((df_2_test["Fitted Value"] <0) & (df_2_test["Actual"] <0))
true_pos_test = np.sum((df_2_test["Fitted Value"] >0) & (df_2_test["Actual"] >0))
accuracy = (true_neg_test + true_pos_test)/len(df_2_test)
print(accuracy)
# Output: 0.6415
Our hit rate rises to 64% when the model is applied to the test data, which is a promising result! Next steps to improve its accuracy may include running a rolling regression, where the coefficients are re-estimated with each iteration, or perhaps incorporating a moving average (MA) element into the model.
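One way the rolling regression mentioned above might look is sketched here. This is an assumption about how such an extension could be built, not the author's implementation: it re-fits the AR(1) coefficients on the 13 most recent observations at every step and makes a one-step-ahead forecast, using a synthetic series and a hypothetical helper fit_ar1.

```python
import random


def fit_ar1(y, x):
    """OLS intercept and slope for y = c + phi * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


# Synthetic AR(1) changes (the coefficient -0.3 is an arbitrary choice).
random.seed(2)
changes = [0.0]
for _ in range(119):
    changes.append(-0.3 * changes[-1] + random.gauss(0, 1))

window = 13  # the window length the article settles on
forecasts = []
for t in range(window + 1, len(changes)):
    y = changes[t - window:t]          # the 13 most recent observed changes
    x = changes[t - window - 1:t - 1]  # each paired with the change one step earlier
    c, phi = fit_ar1(y, x)
    # One-step-ahead forecast for time t, using only data up to t - 1.
    forecasts.append(c + phi * changes[t - 1])

print(len(forecasts), round(forecasts[0], 3))
```

Because each fit uses only observations before time t, the forecasts involve no look-ahead, which makes their hit rate comparable to an out-of-sample test.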
Thanks for reading! Please feel free to leave any comments for any insights you may have. The full Jupyter Notebook which contains the source code I used to do this project can be found on my Github Repository.
Translated from: https://towardsdatascience.com/forecasting-teslas-stock-price-using-autoregression-52e7908d34b6