AnyTrading is a collection of OpenAI Gym environments for reinforcement learning-based trading algorithms.
It is based on the original gym-anytrading environment (available at https://github.com/AminHP/gym-anytrading), with many modifications made to improve on the original.
In our environment, TradingEnv is an abstract environment defined to support all kinds of trading environments. StocksEnv, which inherits from and extends TradingEnv, backtests on the trading data of Google stock from 2009 to 2018.
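For orientation, here is a minimal interaction sketch assuming the standard OpenAI Gym API. The environment id `stocks-v0` and the registration module come from the original gym-anytrading; the modified environment described here may expose different construction options.

```python
import gym
import gym_anytrading  # registers the 'stocks-v0' environment (original gym-anytrading)

env = gym.make('stocks-v0')

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random policy, for illustration only
    obs, reward, done, info = env.step(action)

env.render()  # plot the positions and profit of the episode (see the render section below)
```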
The original design of gym-anytrading is quite simple and aims at letting the agent learn quickly and efficiently. However, we found that several defects of the original environment make it difficult to train agents, and that the incomplete original environment does not describe a real trading environment well. Therefore, many modifications have been made. In the following subsections, I will explain why these modifications are meaningful.
We use a state machine to describe how TradingEnv interacts with the agent, as well as how the agent makes profits.
As shown below, the state machine uses three kinds of trading positions and five kinds of trading actions (the action "Hold" is not shown) to describe how transactions go over time.
- Short: If the current env is in the Short state, the agent has borrowed stocks from the securities company.
- Flat: If the current env is in the Flat state, the agent holds no shares.
- Long: If the current env is in the Long state, the agent has exchanged all of its funds for stocks.

- Double_Sell: sell all the stocks the agent holds, as well as the stocks it borrowed from the securities company.
- Sell: sell the stocks the agent holds.
- Hold: maintain the current position.
- Buy: buy stocks at the current close price.
- Double_Buy: return the borrowed shares to the securities company and exchange all funds on hand for stocks at the current close price.
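A minimal sketch of this state machine follows. The enum names mirror the lists above; the transition table is our reading of the described semantics (for instance, we assume Sell from the Flat state opens a Short position), not necessarily the exact implementation.

```python
from enum import Enum

class Position(Enum):
    SHORT = -1
    FLAT = 0
    LONG = 1

class Action(Enum):
    DOUBLE_SELL = 0
    SELL = 1
    HOLD = 2
    BUY = 3
    DOUBLE_BUY = 4

# Transition table inferred from the descriptions above; the actual
# environment may handle other (position, action) pairs differently.
TRANSITIONS = {
    (Position.FLAT, Action.BUY): Position.LONG,
    (Position.FLAT, Action.SELL): Position.SHORT,      # assumption: Sell from Flat opens a short
    (Position.LONG, Action.SELL): Position.FLAT,
    (Position.LONG, Action.DOUBLE_SELL): Position.SHORT,
    (Position.SHORT, Action.BUY): Position.FLAT,
    (Position.SHORT, Action.DOUBLE_BUY): Position.LONG,
}

def transition(position: Position, action: Action) -> Position:
    """Next position after taking `action`; Hold (and any pair not
    listed above) keeps the current position."""
    return TRANSITIONS.get((position, action), position)
```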
If a profit or loss occurs, it means that one of the following two cycles in the state machine has been completed (a worked example follows the list).
- buying long
  - Flat -> Long -> Flat
- short selling
  - Flat -> Short -> Flat
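As a concrete illustration, the profit factor of each cycle can be computed directly from the close prices. The prices and the cost factor here are made-up numbers; `cost` is assumed to be a multiplicative transaction-cost factor slightly below 1.

```python
cost = 0.999  # assumed multiplicative transaction-cost factor per step

# buying long: Flat -> Long -> Flat, close price rises from 100 to 110
long_factor = (110 / 100) * cost          # ~1.0989, a gain

# short selling: Flat -> Short -> Flat, close price falls from 100 to 90
short_factor = (2 - 90 / 100) * cost      # ~1.0989, also a gain
```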
According to the above definition, we can easily know that the formula of accumulative profit is:

$\prod_{buying\ long}(\frac{close_{curr}}{close_{pre}} \cdot cost) \cdot \prod_{short\ selling}((2 - \frac{close_{curr}}{close_{pre}}) \cdot cost)$

Comparing this with the objective function (maximize $\mathbb{E}_{\tau}\sum r$), we define the reward of each step as:
- buying long: $\log(close_{curr} / close_{pre}) + \log(cost)$
- short selling: $\log(2 - close_{curr} / close_{pre}) + \log(cost)$
- otherwise: $0$
so that maximizing $\mathbb{E}_{\tau}\sum r$ is equivalent to maximizing $\mathbb{E}_{\tau}[\prod_{buying\ long}(\frac{close_{curr}}{close_{pre}} \cdot cost) \cdot \prod_{short\ selling}((2 - \frac{close_{curr}}{close_{pre}}) \cdot cost)]$, i.e., the expected accumulative profit.
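A minimal sketch of this reward in Python; `cost` is again an assumed multiplicative transaction-cost factor, and the prices are illustrative:

```python
import math

def step_reward(position: str, close_curr: float, close_pre: float,
                cost: float = 0.999) -> float:
    """Per-step reward as defined above."""
    if position == "long":                      # buying long
        return math.log(close_curr / close_pre) + math.log(cost)
    if position == "short":                     # short selling
        return math.log(2 - close_curr / close_pre) + math.log(cost)
    return 0.0                                  # otherwise

# Exponentiating the episode return recovers the accumulative profit
# (the product form given above).
closes = [100.0, 102.0, 99.0, 103.0]
positions = ["long", "short", "long"]           # position held during each step
episode_return = sum(step_reward(p, c, cp)
                     for p, c, cp in zip(positions, closes[1:], closes[:-1]))
profit = math.exp(episode_return)
```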
Experimental results show that this definition works better than the accumulated reward function of the original gym-anytrading: $\sum(close_{curr} - close_{pre})$.
As shown below, you can use the `render` method to plot the positions and profit of one episode.
- The position figure:
- The x-axis of the position figure is trading days. In this case, it is 252 trading days.
- The y-axis of the position figure is the closing price of each day.
- In addition, the red inverted triangle, the green upright triangle, and the blue circle indicate the position of the agent on each trading day.
- The profit figure:
- Similarly, the x-axis of the profit figure is trading days; in this case, 252 trading days (the two figures share the same time interval).
- The y-axis of the profit figure is the accumulative profit of each day. For example, a value of 1.5 means the total assets are 150% of the initial assets.
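A minimal matplotlib sketch of what such a pair of figures might look like. The marker shapes follow the description above, while the data, the variable names, and the marker-to-position mapping are purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: 252 trading days of close prices, random
# positions (-1 short, 0 flat, 1 long), and an accumulative profit curve.
days = np.arange(252)
closes = 100 * np.exp(np.cumsum(np.random.normal(0, 0.01, 252)))
position = np.random.choice([-1, 0, 1], size=252)
profit = np.exp(np.cumsum(np.random.normal(0.001, 0.01, 252)))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

# Position figure: close price with one marker style per position
# (the mapping chosen here is an assumption).
ax1.plot(days, closes, color="gray", lw=1)
ax1.plot(days[position == -1], closes[position == -1], "rv", label="Short")
ax1.plot(days[position == 1], closes[position == 1], "g^", label="Long")
ax1.plot(days[position == 0], closes[position == 0], "bo", label="Flat")
ax1.set_ylabel("close price")
ax1.legend()

# Profit figure: accumulative profit; 1.5 means 150% of initial assets.
ax2.plot(days, profit)
ax2.set_xlabel("trading days")
ax2.set_ylabel("profit")

plt.show()
```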