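How to Apply Machine Learning to Quantitative Investing, Series (Part 2)
Preface
Research on Deep Learning Techniques in Trading
Deep learning has received a lot of attention recently, particularly in image classification and speech recognition. However, it does not appear to have been widely applied to trading. This survey covers what the author (Greg Harris) has found so far that is relevant to systematic trading.
Some terms: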
DBN = Deep Belief Network
LSTM = Long Short-Term Memory, a type of recurrent neural network
MLP = Multi-layer Perceptron
RBM = Restricted Boltzmann Machine
ReLU = Rectified Linear Unit, an activation function
CNN = Convolutional Neural Network
Limit Order Book Models
Sirignano (2016) predicts changes in limit order books. He designs a “spatial neural network” that exploits local spatial structure; it can be used as a classifier and is more computationally efficient than a standard neural network. He models the joint distribution of the best bid and ask prices at the next state, and also how a change in one of them (bid or ask) affects the other.
Architecture – Each neural network has 4 layers. The standard neural network has 250 neurons per hidden layer, and the spatial neural network has 50. He uses the tanh activation function on the hidden layer neurons.
Training – He trained and tested on order books from 489 stocks from 2014 to 2015 (a separate model for each stock). He uses Level III limit order book data from the NASDAQ, with event times recorded to nanosecond precision. Training involved 50TB of data and used a cluster with 50 GPUs. He includes 200 features: the price and size of the limit order book across the first 50 non-zero bid and ask levels. He uses dropout to prevent overfitting and batch normalization between hidden layers to prevent internal covariate shift. Training is done with the RMSProp algorithm, which is similar to stochastic gradient descent with momentum but normalizes the gradient by a running average of the magnitudes of past gradients. He uses an adaptive learning rate, decreased by a constant factor whenever the training error increases over a training epoch. He uses early stopping imposed via a validation set and also includes an L2 penalty during training, both to reduce overfitting.
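The optimizer described above can be sketched in a few lines. This is a minimal illustration, assuming a toy quadratic objective and made-up hyperparameters (`lr`, `decay`, `factor`); it is not Sirignano's training code. It shows the RMSProp normalization by a running mean of squared gradients and the rule that cuts the learning rate whenever the epoch error rises.

```python
import math

def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update; returns new params and the updated running average."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = decay * c + (1.0 - decay) * g * g      # running mean of squared gradients
        new_cache.append(c)
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
    return new_params, new_cache

def decay_on_worse(lr, prev_error, error, factor=0.5):
    """Cut the learning rate by a constant factor when the epoch error went up."""
    return lr * factor if error > prev_error else lr

# Toy problem (an assumption for illustration): minimise f(w) = sum(w_i^2).
w = [1.0, -2.0]
cache = [0.0, 0.0]
lr = 0.1
prev_error = sum(x * x for x in w)
for epoch in range(200):
    grads = [2.0 * x for x in w]                   # gradient of sum(w_i^2)
    w, cache = rmsprop_step(w, grads, cache, lr=lr)
    error = sum(x * x for x in w)
    lr = decay_on_worse(lr, prev_error, error)
    prev_error = error
```

In a real setup the "error" would be the training loss over one epoch, and early stopping would watch a separate validation loss.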
Results – He shows that limit order books exhibit some degree of local spatial structure. He predicts the order book 1 second ahead and also at the time of the next bid/ask change. The spatial neural network outperforms the standard neural network and logistic regression with non-linear features. Both neural networks have 10% lower error than logistic regression.
Price-Based Classification Models
Dixon et al. (2016) use a deep neural network to predict the sign of the price change over the next 5 minutes, applied to 43 commodity and FX futures.
Architecture – Their input layer has 9,896 neurons for input features made up of lagged price differences and co-movements between contracts. There are 5 learned fully-connected layers. The first of the four hidden layers contains 1,000 neurons, and each subsequent layer tapers by 100 neurons. The output layer has 135 neurons (one for each of the three classes {-1, 0, 1}, times 43 contracts).
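To make the layer description concrete, the sketch below lists the layer widths and counts the weights between consecutive layers. The function name is hypothetical, the input and output widths are taken as reported in the survey, and bias terms are ignored for simplicity.

```python
def tapered_hidden_sizes(first=1000, step=100, n_hidden=4):
    """Hidden layer widths that shrink by a fixed step per layer."""
    return [first - i * step for i in range(n_hidden)]

hidden = tapered_hidden_sizes()          # [1000, 900, 800, 700]
layers = [9896] + hidden + [135]         # input and output widths as reported

# One weight matrix per pair of consecutive layers (biases not counted).
n_weight_matrices = len(layers) - 1
n_weights = sum(a * b for a, b in zip(layers, layers[1:]))
```

The four hidden layers plus the output layer give the five learned fully-connected layers mentioned above, and almost all of the roughly 12 million weights sit in the first layer because of the wide input.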
Training – They used standard back-propagation with stochastic gradient descent. They speed up training by using mini-batching (computing the gradient on several training examples at once rather than on individual examples). Rather than an nVidia GPU, they used an Intel Xeon Phi co-processor.
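Mini-batching can be illustrated on a much smaller problem. The sketch below, which assumes a one-variable linear model and synthetic noiseless data rather than the authors' network, averages the squared-error gradient over each batch before taking a step.

```python
def minibatches(samples, batch_size):
    """Yield consecutive slices of the sample list."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def sgd_epoch(w, b, samples, lr=0.05, batch_size=4):
    """One pass of mini-batch SGD on squared error for y = w * x + b."""
    for batch in minibatches(samples, batch_size):
        # Gradient averaged over the whole batch, then a single update.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (an assumption for illustration): y = 3x + 1 without noise.
data = [(x / 10.0, 3.0 * x / 10.0 + 1.0) for x in range(20)]
w, b = 0.0, 0.0
for _ in range(500):
    w, b = sgd_epoch(w, b, data)
```

In practice the samples would be shuffled each epoch; the fixed order here just keeps the example deterministic.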
Results – They report 42% accuracy overall for three-class classification. They do some walk-forward training instead of a traditional backtest. Their boxplot shows generally positive Sharpe ratios from the mini-backtests for each contract. They did not include transaction costs or the cost of crossing the bid-ask spread. All their predictions and features were based on the mid-price at the end of each 5-minute time period.
Takeuchi and Lee (2013) study how the momentum effect can be used to predict monthly stock returns.
Architecture – They use an auto-encoder composed of stacked RBMs to extract features from stock prices, which they then pass to a feed-forward neural network classifier. Each RBM consists of one layer of visible units and one layer of hidden units connected by symmetric links. The first layer has 33 units for input features from one stock at a time. For every month t, the features include the 12 monthly returns for month t-2 through t-13 and the 20 daily returns approximately corresponding to month t. They normalize each of the return features by calculating the z-score relative to the cross-section of all stocks for each month or day. The number of hidden units in the final layer of the encoder is sharply reduced, forcing dimensionality reduction. The output layer has 2 units, corresponding to whether the stock ended up above or below the median return for the month. Final layer sizes are 33-40-4-50-2.
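The normalization and labeling steps described above are easy to show on a single cross-section. The following is a minimal sketch with made-up returns for four stocks; the function names are hypothetical and the use of the population standard deviation is an assumption.

```python
from statistics import mean, median, pstdev

def cross_sectional_zscore(values):
    """Z-score one feature across all stocks for a single month or day."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def above_median_labels(next_returns):
    """Label 1 if a stock's return beats the cross-sectional median, else 0."""
    med = median(next_returns)
    return [1 if r > med else 0 for r in next_returns]

# Hypothetical numbers: one return feature and next-month returns for four stocks.
feature = [0.02, -0.01, 0.05, 0.00]
z = cross_sectional_zscore(feature)
labels = above_median_labels([0.03, -0.02, 0.01, 0.04])   # [1, 0, 0, 1]
```

Each of the return features would be normalized this way, one cross-section at a time, before being fed to the first RBM.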
Training – During pre-training, they split the dataset into smaller, non-overlapping mini-batches. Afterwards, they un-roll the RBMs to form an encoder-decoder, which is fine-tuned using back-propagation. They consider all stocks trading on the NYSE, AMEX, or NASDAQ with a price greater than $5. They train on data from 1965 to 1989 (848,000 stock-month samples) and test on data from 1990 to 2009 (924,300 stock-month samples). Some training data is held out for validation, to choose the number of layers and the number of units per layer.
Results – Their overall accuracy is around 53%. When they consider the difference between the top decile and the bottom decile predictions, they get 3.35% per month, or 45.93% annualized return.
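A top-decile minus bottom-decile spread of the kind reported above can be computed as follows. This is only a sketch on synthetic numbers: the function name, the equal weighting within each decile, and the sample data are assumptions, not the paper's exact procedure.

```python
def decile_spread(predicted_prob, realized_return):
    """Mean return of the top-decile predictions minus that of the bottom decile."""
    ranked = sorted(zip(predicted_prob, realized_return), key=lambda pair: pair[0])
    k = max(1, len(ranked) // 10)                 # stocks per decile
    bottom = [r for _, r in ranked[:k]]
    top = [r for _, r in ranked[-k:]]
    return sum(top) / k - sum(bottom) / k

# Synthetic month with 20 stocks whose returns rise with the predicted probability.
probs = [i / 20 for i in range(20)]
rets = [0.001 * i for i in range(20)]
spread = decile_spread(probs, rets)
```

Repeating this every month gives the monthly long-short return series behind a figure such as the one quoted above.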
Batres-Estrada (2015) predicts which S&P 500 stocks will have above-median returns on a given trading day. His work appears to be influenced by Takeuchi and Lee (2013).
Architecture – He uses a 3-layer DBN coupled to an MLP. He uses 400 neurons in each hidden layer, and he uses a sigmoid activation function. The output layer is a softmax layer with two output neurons for binary classification (above median or below). The DBN is composed of stacked RBMs, each trained sequentially.
Training – He first pre-trains the DBN module, then fine-tunes the entire DBN-MLP using back-propagation. The input includes 33 features: monthly log-returns for months t-2 to t-13, 20 daily log-returns for each stock at month t, and an indicator variable for the January effect. The features are normalized using the z-score for each time period. He uses S&P 500 constituent data from 1985 to 2006 with a 70-15-15 split for training-validation-test. He uses the validation data to choose the number of layers, the number of neurons, and the regularization parameters. He uses early stopping to prevent over-fitting.
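A 70-15-15 split can be sketched as below. The survey does not say how the samples were ordered, so splitting in time order (oldest data for training) is an assumption here, as is the function name.

```python
def chronological_split(samples, train_pct=70, valid_pct=15):
    """Split time-ordered samples into train, validation, and test segments."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    train = samples[:n_train]
    valid = samples[n_train:n_train + n_valid]
    test = samples[n_train + n_valid:]            # whatever remains
    return train, valid, test

# Hypothetical example: 100 time-ordered observations.
train, valid, test = chronological_split(list(range(100)))
```

Keeping the segments in time order avoids training on observations that come after the ones used for validation and testing.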
Results – His model has 53% accuracy, which outperforms regularized logistic regression and a few MLP baselines.
Sharang and Rao (2015) use a DBN trained on technical indicators to classify the direction of a portfolio.
Architecture – They use a DBN consisting of 2 stacked RBMs. The first RBM is Gaussian-Bernoulli (15 nodes), and the second RBM is Bernoulli (20 nodes). The DBN produces latent features which they try feeding into three different classifiers: regularized logistic regression, support vector machines, and a neural network with 2 hidden layers. They predict 1 if the portfolio goes up over 5 days, and -1 otherwise.
Training – They train the DBN using a contrastive divergence algorithm. They calculate signals based on open, high, low, close, open interest, and volume data, beginning in 1985, with some points removed during the 2008 financial crisis. They use 20 features: the “daily trend” calculated over different time frames and then normalized. All parameters are chosen using a validation dataset. When training the neural net classifier, they mention using a momentum parameter during mini-batch gradient descent training to shrink the coefficients by half during every update.
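Contrastive divergence is easiest to see on a tiny RBM. The sketch below implements a single-step variant (CD-1) for a Bernoulli-Bernoulli RBM on two made-up binary patterns. It is an assumption-laden illustration: the paper's first RBM is Gaussian-Bernoulli, and the sizes, learning rate, and data here are invented.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, b_h):
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, W, b_v):
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def cd1_update(v0, W, b_v, b_h, rng, lr=0.1):
    """One CD-1 step, updating W, b_v, b_h in place; returns reconstruction error."""
    ph0 = hidden_probs(v0, W, b_h)           # positive phase (data-driven)
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b_v)          # one Gibbs step back to the visibles
    ph1 = hidden_probs(pv1, W, b_h)          # negative phase (model-driven)
    for i in range(len(v0)):
        for j in range(len(ph0)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - pv1[i])
    for j in range(len(ph0)):
        b_h[j] += lr * (ph0[j] - ph1[j])
    return sum((a - b) ** 2 for a, b in zip(v0, pv1))

rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v, b_h = [0.0] * n_visible, [0.0] * n_hidden
patterns = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
for epoch in range(200):
    for v in patterns:
        err = cd1_update(v, W, b_v, b_h, rng)
```

In a DBN, the hidden activations of the trained first RBM become the visible data for the next RBM, which is trained the same way.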
Results – The portfolio is constructed using PCA to be neutral to the first principal component. The portfolio is an artificial spread of instruments, so actually trading it is done with a spread between the ZF and ZN contracts. All input prices are mid-prices, meaning the bid-ask spread is ignored. The results look profitable, with all three classification models performing 5-10% more accurately than a random predictor.
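Zhu et al. (2016) make trading decisions using oscillation box theory combined with deep belief networks. Oscillation box theory holds that a stock price oscillates within a certain range (the box); if the price moves outside that range, it enters an entirely new box. Their trading strategy buys when the price breaks through the top of the box and sells when it falls through the bottom.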
Architecture – They use a DBN made up of stacked RBMs and a final back-propagation layer.
Training – They used block Gibbs sampling to greedily train each layer from lowest to highest in an unsupervised way. They then train the back-propagation layer in a supervised way, which fine-tunes the whole model. They chose 400 stocks out of the S&P 500 for testing, and the test set covers 400 days from 2004 to 2005. They use open, high, low, close prices as well as technical analysis indicators, for a total of 14 model inputs. Some indicators are given more influence in the prediction through the use of “gray relation analysis” or “gray correlation degree.”
Results – In their trading strategy, they charge 0.5% transaction costs per trade and add a couple of parameters for stop-loss and “transaction rate.” I don’t fully understand the result tables, but they seem to be reporting significant profits.
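The box breakout rule from the oscillation box theory described above reduces to a simple comparison once the box boundaries are known. In the sketch below the boundaries are fixed, made-up numbers; in the paper they would come from the DBN's forecast, which is not reproduced here.

```python
def box_signal(price, box_bottom, box_top):
    """Return 1 to buy, -1 to sell, 0 to do nothing, given the current box."""
    if price > box_top:
        return 1      # breakout above the top of the box: buy
    if price < box_bottom:
        return -1     # fall through the bottom of the box: sell
    return 0          # still oscillating inside the box

# Hypothetical prices against a hypothetical box of [10.0, 11.0].
prices = [10.2, 10.8, 11.3, 10.9, 9.7]
signals = [box_signal(p, box_bottom=10.0, box_top=11.0) for p in prices]
```

A fuller version would re-estimate the box after each breakout and apply the stop-loss and transaction cost parameters mentioned in the results.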
Volatility Prediction
Xiong et al. (2015) predict the daily volatility of the S&P 500, as estimated from open, high, low, and close prices.
Architecture – They use a single LSTM hidden layer consisting of one LSTM block. For inputs they use daily S&P 500 returns and volatilities. They also include 25 domestic Google trends, covering sectors and major areas of the economy.
Training – They used the “Adam” method with 32 samples per batch and mean absolute percent error (MAPE) as the objective loss function. They set the maximum lag of the LSTM to include 10 successive observations.
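The loss function and the lag setting above can be sketched without any deep learning library. The function names and sample values below are illustrative assumptions; only the MAPE definition and the window length of 10 come from the text.

```python
def mape(actual, predicted):
    """Mean absolute percent error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def lagged_windows(series, lag=10):
    """Pair each run of `lag` successive observations with the value that follows."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]

# Made-up volatilities and forecasts.
loss = mape([0.10, 0.20], [0.11, 0.18])          # each forecast is off by 10%

# Twelve observations give two training windows of length 10.
windows = lagged_windows(list(range(12)), lag=10)
```

Each window would form one input sequence for the LSTM, and 32 such windows would make up one training batch.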
Results – They show their LSTM method outperforms GARCH, Ridge, and LASSO techniques.
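Text-Based Classification Models
Rönnqvist and Sarlin (2016) use news articles to predict bank distress. Specifically, they build a classifier that judges whether a sentence indicates a period of distress or a period of tranquility.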
Architecture – They use two neural networks in this paper. The first is for semantic pre-training to reduce dimensionality. For this, they run a sliding window over text, taking a sequence of 5 words and learning to predict the next word. They use a feed-forward topology where a projection layer in the middle provides the semantic vectors once the connection weights have been learned. They also include the sentence ID as an input to the model, to provide context and inform the prediction of the next word. They use binary Huffman coding to map sentence IDs and words to activation patterns in the input layer, which organizes the words roughly by frequency. They say feed-forward topologies with fixed context sizes are more efficient than recurrent neural networks for modeling text sequences. The second neural network is for classification. Instead of a million inputs (one for each word), they use 600 inputs from the learned semantic model. The first layer has 600 nodes, the middle layer has 50 rectified linear hidden nodes, and the output layer has 2 nodes (distress/tranquil).
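The sliding-window pre-training task described above amounts to generating (context, next word) pairs from each sentence. The sketch below shows only that data preparation step; the function name, the whitespace tokenization, and the example sentence are assumptions, and the Huffman coding and network itself are omitted.

```python
def context_target_pairs(tokens, sentence_id, window=5):
    """Build (sentence_id, 5-word context, next word) training examples."""
    return [(sentence_id, tuple(tokens[i:i + window]), tokens[i + window])
            for i in range(len(tokens) - window)]

# Hypothetical sentence with a hypothetical ID of 7.
tokens = "the bank said it would seek state aid".split()
pairs = context_target_pairs(tokens, sentence_id=7)
```

Here the eight tokens yield three examples, the first being the context ('the', 'bank', 'said', 'it', 'would') with target 'seek'. The sentence ID travels with every example so the model can learn a vector for the sentence as well as for the words.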
Training – They train it with 243 distress events over 101 banks observed during the financial crisis of 2007-2009. They use 716k sentences mentioning the banks, taken from 6.6m Reuters news articles published during and after the crisis.
Results – They evaluate their classification model using a custom “Usefulness” measure. The evaluation is done using cross-validation, leaving N banks out in each fold. They aggregate the distress counts into various time series but don’t go so far as to consider creating a trading strategy.
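Leaving whole banks out of each fold keeps sentences about the same bank from appearing in both training and test data. A minimal sketch of that grouping, with a hypothetical function name and toy bank labels, could look like this:

```python
def leave_banks_out_folds(bank_ids, n_out):
    """Yield-style list of (train_indices, test_indices), holding out n_out banks per fold."""
    banks = sorted(set(bank_ids))
    folds = []
    for start in range(0, len(banks), n_out):
        held_out = set(banks[start:start + n_out])
        test_idx = [i for i, b in enumerate(bank_ids) if b in held_out]
        train_idx = [i for i, b in enumerate(bank_ids) if b not in held_out]
        folds.append((train_idx, test_idx))
    return folds

# Toy data: the bank each of six sentences mentions.
bank_ids = ["A", "A", "B", "C", "C", "D"]
folds = leave_banks_out_folds(bank_ids, n_out=2)
```

With two banks held out per fold, the six toy sentences split into two folds, and no bank ever appears on both sides of a fold.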
Fehrer and Feuerriegel (2015) train a model on news headlines to predict German stock returns.
Architecture – They use a recursive autoencoder with an additional softmax layer in each autoencoder for estimating probabilities. They perform three-class prediction {-1, 0, 1} for the following day’s return of the stock associated with the headline.
Training – They initialize the weights with Gaussian noise, and then update through back-propagation. They use an English ad-hoc news announcement dataset (8,359 headlines) for the German market covering 2004 to 2011.
Results – Their recursive autoencoder has 56% accuracy, which is an improvement over a more traditional random forest modeling approach with 53% accuracy. They do not develop a trading strategy. They have made a Java implementation of their code publicly available.
Ding et al. (2015) use structured information extracted from news headlines to predict movements of the S&P 500. They process headlines with Open IE (Open Information Extraction, not “open Internet Explorer”) to obtain the information each news event conveys (who, what, to whom, when). Unlike ordinary networks, they use a neural tensor network to learn semantic composition.
Architecture – They combine short-term and long-term effects of events, using a CNN to perform semantic composition over the input event sequence. They use a max pooling layer on top of the convolutional layer, which makes the network retain only the most useful features produced by the convolutional layer. They have separate convolutional layers for long-term events and mid-term events. Both of these layers, along with an input layer for short-term events, feed into a hidden layer which then feeds into two output nodes.
Training – They extracted 10 million events from Reuters and Bloomberg news. For training, they corrupt events by replacing one event argument with a random argument. During training, they assume that the actual event should be given a higher score than the corrupted event. When it isn’t, model parameters get updated.
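The corruption-based training signal can be sketched as follows. The event tuple layout, the vocabulary, and the margin form of the loss are assumptions made for illustration; the survey only states that a real event should score higher than its corrupted copy, and the scoring network itself is left out.

```python
import random

def corrupt_event(event, vocabulary, rng):
    """Replace one randomly chosen argument of the event with a different random one."""
    slot = rng.randrange(len(event))
    choices = [w for w in vocabulary if w != event[slot]]
    corrupted = list(event)
    corrupted[slot] = rng.choice(choices)
    return tuple(corrupted)

def margin_loss(score_true, score_corrupt, margin=1.0):
    """Zero when the true event outscores the corrupted one by at least the margin."""
    return max(0.0, margin - score_true + score_corrupt)

rng = random.Random(1)
event = ("Microsoft", "sues", "Barnes & Noble")           # made-up example event
vocabulary = ["Microsoft", "Apple", "sues", "buys", "Barnes & Noble", "Nokia"]
corrupted = corrupt_event(event, vocabulary, rng)

no_update = margin_loss(score_true=2.0, score_corrupt=0.5)    # 0.0: ranking is fine
needs_update = margin_loss(score_true=0.2, score_corrupt=0.5) # positive: update
```

Parameters would only receive a gradient in the second case, which matches the description that the model is updated when the real event is not scored higher.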
Results – They find that structured events are better features than words for stock market prediction. Their approach outperforms baseline methods by 6%. They make predictions for the S&P 500 index and 15 individual stocks, and a table appears to show that they can predict the S&P 500 with 65% accuracy.
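Portfolio Models
Heaton et al. (2016) try to build a portfolio that outperforms the biotech index IBB. Their goal is to track the index with a small number of stocks while still beating it during large drawdowns. They use a fitted model that supports non-linear structure rather than modeling the covariance matrix directly.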
Architecture – They use auto-encoding with regularization and ReLUs. Their auto-encoder has one hidden layer with 5 neurons.
Training – They use weekly return data for the component stocks of IBB from 2012 to 2016. They auto-encode all stocks in the index and evaluate the difference between each stock and its auto-encoded version. They keep the 10 most “communal” stocks that are most similar to the auto-encoded version. They also keep a varying number of other stocks, where the number is chosen with cross-validation.
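Selecting the most “communal” stocks comes down to ranking stocks by how close each one is to its auto-encoded version. The sketch below assumes the reconstructions are already available and uses Euclidean distance between return series as the similarity measure, which is an assumption; tickers and numbers are made up.

```python
def most_communal(returns, reconstructed, k=10):
    """Return the k stock names whose returns are closest to their reconstructions."""
    def distance(name):
        pairs = zip(returns[name], reconstructed[name])
        return sum((a - b) ** 2 for a, b in pairs) ** 0.5
    return sorted(returns, key=distance)[:k]

# Hypothetical weekly returns and their auto-encoded versions for three stocks.
returns = {"AAA": [0.01, 0.02], "BBB": [0.03, -0.01], "CCC": [0.00, 0.05]}
reconstructed = {"AAA": [0.01, 0.02], "BBB": [0.00, 0.00], "CCC": [0.00, 0.04]}
selected = most_communal(returns, reconstructed, k=2)      # ['AAA', 'CCC']
```

The remaining stocks, those least well explained by the auto-encoder, are the pool from which the additional cross-validated picks would be drawn.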
Results – They show the tracking error as a function of the number of stocks included in the portfolio, but don’t seem to compare against traditional methods. They also replace index drawdowns with positive returns and find portfolios that track this modified index.
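Both ideas in the results can be sketched briefly. The drawdown threshold, the replacement value, and the root-mean-square definition of tracking error below are assumptions chosen for illustration; the survey does not give the exact values or formula.

```python
def replace_drawdowns(index_returns, threshold=-0.05, replacement=0.05):
    """Swap returns at or below the threshold for a positive replacement return."""
    return [replacement if r <= threshold else r for r in index_returns]

def tracking_error(portfolio_returns, target_returns):
    """Root-mean-square difference between portfolio and target returns."""
    diffs = [p - t for p, t in zip(portfolio_returns, target_returns)]
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5

# Made-up weekly index returns.
index_returns = [0.01, -0.07, 0.02, -0.05]
modified = replace_drawdowns(index_returns)        # [0.01, 0.05, 0.02, 0.05]
error = tracking_error([0.01, 0.04, 0.02, 0.05], modified)
```

A portfolio fitted to the modified series is rewarded for tracking the index in normal weeks while doing well in the weeks when the real index fell sharply.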
Reposted from: https://www.cnblogs.com/alan-blog-TsingHua/p/9951777.html