Skip to content

SunHao95/PHBS_TQFML-StockIndex-Wavelet-Transformation-ARIMA-ML-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

99 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PHBS_TQFML-Project

Methodology

  • Stock index data generally has much noise and is non-stationary, which is a huge challenge for us using ML(Machine Learning) methods to predict the index. However wavelet transformation, an upgraded version of fourier transformation, can serve as a very good filter to decrease the noise in stock index and smooth the data, thus helping us to focus more on the main trend of stock index.

    Figure 1. Filter Bank Scheme for DWT

    In the figure, H,L,and H’,L’ are the high-pass and low-pass filters for wavelet decomposition and reconstruction respectively. In the decomposition phase, the low-pass filter removes the higher frequency components of the signal and highpass filter picks up the remaining parts. Then, the filtered signals are downsampled by two and the results are called approximation coefficients and detail coefficients. The reconstruction is just a reversed process of the decomposition and for perfect reconstruction filter banks, we have x = x'. A signal can be further decomposed by cascade algorithm as shown in following equation:

Figure 2. Wavelet Decomposition Tree
  • Prediction Model.

  • ARMA-ML(Autoregressive Moving Average and Machine Learning) Model

After wavelet transformation, there are two types of stock index data, low-frequency and high-frequency. The ARMA-ML model is trying to using ARMA method to predict the high-frequency data,the detail coefficients, since high-frequency is stationary. While ML methods, such as SVR(Support Vector Regression) and GBR(Gradient Boosting Regression),are trying to predict the low-frequency data, the approximation coefficients. Finally, using the predicted data together to reconstruct the stock index. Generally speaking, ARMA-ML model is trying to complete prediction on the timing series perspective.

Finding appropriate values of p and q in the ARMA(p,q) model can be facilitated by plotting the partial autocorrelation functions for an estimate of p, and likewise using the autocorrelation functions for an estimate of q. Further information can be gleaned by considering the same functions for the residuals of a model fitted with an initial selection of p and q. Brockwell & Davis recommend using AICc for finding p and q

Support vector regression (SVR) is a version of SVM for regression. The model produced by support vector classification (as described above) depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction.

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Data & Prediction

The datas selected are the daily stock index data of 000300.SH representing the large-cap stocks and 000905.SH representing medium-and-small-cap stocks, including,

  • Open: Open daily price
  • High: Highest daily price
  • Low: Lowest daily price
  • Close: Close daily price
  • Volume: Trading volume
  • AMT: Trading amount
  • Time range: 2010-01-01 to 2018-03-30

Use the former 4 days' close price to predict the next day's close price. Using 150-day rolling windown to make prediction. Finally, try to make a prediction of 30-day close price.

Result

  • Wavelet Transformation

Figure 3. Approximation&Detail Components of Wavelet Decomposition
  • ARMA

Figure 4. ARMA Fit
  • GBR Prediction

Figure 5. GBR Prediction
  • SVR Prediction

Figure 6. SVR Prediction
  • SVR_GBR Prediction

Figure 7. SVR_GBR Prediction Prediction
  • Evaluation

Use common regression matrices(explained_variance, mean_absolute_error, mean_squared_error, r2_score)to evaluate the results.

Model ev mae mse r2
GBR_Model 0.084507 30.393337 1426.833774 0.046767
SVR_Model -0.246318 51.662584 4424.650770 -0.658574
GBR_SVR_Model -0.272351 31.929540 1403.899401 -0.441158

Motivation & References

Stock index, as time series, inspires a lot of research to implement the forecast both in academic area and financial departments. Generally speaking, the main methods used to do prediction are time-series analysis and machine learning models. Some of the research reports and papers have presented good ideas to predict stock index by means of combined_models, such as TS & ML models. Some even use some data processing methods like Wavelet Transformation to make the data properties more suitable to different predictin models. All the reference papers and research reports have been uploaded in the reference folder.

Conclusion

Unfortunately, it seems that none of the model has good prediction power, because the ev and r2 are so small and even negative, which indicate that stock prices cannot be predicted exactly! However, the "noisy" data processing methods and time-series analysis model as well as nonlinear machine learning regression model can serve as some useful tools to do further research in other fields.

  • GBR prediction seems as the lag of previous stock prices, just predicting like a martingale.
  • SVR performs badly in the begining of stock index prediction. As time goes by, it tends to predict the average(or expectation) price.
  • The mix GBR/SVR model is just the simple mean of GBR and SVR. Its performance lies between GBR and SVR

About

PHBS 2018 Machine Learning Class Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published