
Normalize Binance futures orderbook data #27

Closed
quantitative-technologies opened this issue Jul 24, 2023 · 23 comments

Comments

@quantitative-technologies

I wanted to take advantage of the freely available historical futures order book level 2 data from Binance.

It should be possible, by combining this with historical trade data (also available from Binance, I believe), to obtain normalized data for hftbacktest.

But I couldn't find this in the repo examples. I wanted to check whether it has already been done, so I don't waste time redoing it.

@nkaz001
Owner

nkaz001 commented Jul 25, 2023

There isn't one yet. If you provide an example file so I can look into its format, I'll add an example converter.

@nkaz001
Owner

nkaz001 commented Jul 25, 2023

By the way, without a local timestamp indicating when you received the feed, accurate backtesting is not possible, as there is no feed latency information. While you can artificially generate a local timestamp by assuming feed latency, it is preferable to collect the data yourself for more reliable results.
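The artificial local timestamp described above can be sketched as follows; the event timestamps and the 10 ms latency figure are illustrative assumptions, not measured values:

```python
import numpy as np

# Hypothetical exchange timestamps in microseconds for a few feed events.
exch_ts = np.array([1_656_633_600_000_000, 1_656_633_600_050_000, 1_656_633_600_120_000])

# Assume a constant feed latency (10 ms = 10_000 us). This is an assumed,
# not measured, latency, so backtest results carry that caveat.
ASSUMED_FEED_LATENCY_US = 10_000
local_ts = exch_ts + ASSUMED_FEED_LATENCY_US
```

A constant offset is the simplest model; collecting your own data with real receive timestamps remains preferable, as the comment notes.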

@quantitative-technologies
Author

> There isn't. If you provide an example file for me to look into its format, I would add an example converter.

Here is example LOB data for a single day: https://drive.google.com/file/d/1rVaDblmYJL0aPpgvdJ-fU9QFhMDga6f_/view?usp=sharing

By the way, I would also be happy to write it myself, but I wanted to make sure I wasn't "reinventing the wheel".

@quantitative-technologies
Author

Yes, good point about the local timestamp. Thanks for the tip.

The artificial local timestamps are fine for my purposes at the moment.

@nkaz001
Owner

nkaz001 commented Jul 26, 2023

Trade data is also required. It's still possible to backtest based on depth data alone, but that is meaningless, especially in high-frequency backtesting.

@quantitative-technologies
Author

Right. I was not suggesting trying to use OB data alone. Actually, I found your repo while looking for an implementation for inventory models, which of course need trade data to fit them.

The trade data is available from Binance Public Data:

wget https://data.binance.vision/data/futures/um/daily/trades/BTCUSDT/BTCUSDT-trades-2020-07-01.zip

Here is the trade data corresponding to the above depth data.

@nkaz001
Owner

nkaz001 commented Jul 27, 2023

I added the converter: hftbacktest/data/utils/binancehistmktdata.py (a5d3f91)

Could you check if it works as expected? Again, in my experience, backtest results can exhibit significant discrepancies unless precise feed latency and order latency are used.
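For readers unfamiliar with what such a converter does, here is a minimal, API-independent sketch of the normalization step: merging depth updates and trades into one event stream ordered by exchange timestamp. The tuple layout and event codes below are illustrative, not hftbacktest's actual format.

```python
# Illustrative event codes (not hftbacktest's actual constants).
EVENT_DEPTH, EVENT_TRADE = 1, 2

# Hypothetical events as (exch_ts_us, event_type, price, qty).
depth = [(1000, EVENT_DEPTH, 100.0, 5.0), (1003, EVENT_DEPTH, 100.1, 2.0)]
trades = [(1001, EVENT_TRADE, 100.0, 1.0)]

# Normalization merges both feeds into one timestamp-ordered stream.
merged = sorted(depth + trades, key=lambda ev: ev[0])
```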

@quantitative-technologies
Author

Excellent!

My plan was to look into the inventory MM model (which you gave an example of). I will report back if anything unexpected shows up.

I think you mean significant discrepancies between backtest and live trading results, but I am not doing any live trading at the moment. If you want me to try out one of your other examples with the Binance historical data, please let me know.

@nkaz001 nkaz001 closed this as completed Jul 28, 2023
@quantitative-technologies
Author

I am getting an error using the following trade data, for ETHUSDT on 2022-10-03, as in your example notebook.

I think it is because the first row contains the column names, unlike the previous example. My guess is that the format has changed with newer data.
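A tolerant loader can sniff for an optional header row, as a minimal sketch; the file contents and column names below are illustrative, loosely following the Binance trade CSV layout:

```python
import csv
import io

# Newer Binance dumps include a header row while older ones do not.
# If the first field of the first row is not numeric, treat it as a header.
data = "id,price,qty,quote_qty,time,is_buyer_maker\n1,100.0,0.5,50.0,1664755200000,True\n"

reader = csv.reader(io.StringIO(data))
first = next(reader)
rows = list(reader) if not first[0].isdigit() else [first] + list(reader)
```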

@nkaz001
Owner

nkaz001 commented Aug 10, 2023

Thanks for the report. Please see the latest commit. 740feee

@quantitative-technologies
Author

Thanks for updating the code.

Now I can successfully run the data preparation notebook.

However, when I use the prepared data from Binance in the Guéant–Lehalle–Fernandez-Tapia Market Making Model and Grid Trading notebook, the fitted trading intensity is off by a factor of about 2 from your calculated results. For example:
[screenshot: fitted trading-intensity curves, hftbacktest_fit_2023-08-15_14-49-17]

It's as if there are only half as many trades in the data files obtained from Binance. To be safe, I added a 10 ms feed latency, but as expected that does not affect the fitted model parameters.

Note that I had to adjust for the fact that the Binance data is timestamped in milliseconds rather than microseconds.
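The unit adjustment mentioned above amounts to a simple scaling; the timestamp values here are illustrative:

```python
# Binance historical files carry millisecond timestamps, while the
# notebooks in this thread work in microseconds; scaling by 1_000
# aligns the units before any latency is applied.
ts_ms = [1664755200123, 1664755200456]
ts_us = [t * 1_000 for t in ts_ms]
```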

Would it be possible to share your collected data for ETHUSDT futures on 2022-10-03 (e.g. on Google drive)? That way people could reproduce your results, and also I could directly compare the trade data to binance.

@nkaz001
Owner

nkaz001 commented Aug 18, 2023

For your information, I used the trade stream instead of the aggTrade stream, which is the one currently officially documented but is aggregated.

@quantitative-technologies
Author

I'm not sure I understand, since I also used the trade data from Binance, rather than aggTrade. In fact, your converter does not even work on the Binance historical aggTrade data, though I don't see a need for it.

Unless you are suggesting that the trade data from Binance is in fact still aggregated?

Anyhow, my plan is to collect my own data from the stream, and then I can compare it with the historical data from Binance.

@nkaz001
Owner

nkaz001 commented Aug 22, 2023

No. But the trade stream functions as expected, just as described in the official spot API documentation, even though it is not outlined in the official futures API documents. So I guess Binance's historical data also came from aggTrade. Comparison is the most effective way to figure things out.
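One way to carry out such a comparison is to check whether fills collapse under aggTrade-style grouping: trades sharing the same timestamp, price, and side would be merged into one row by an aggregated feed. The trade records below are illustrative, not real Binance data.

```python
from collections import Counter

# Hypothetical raw fills as (timestamp_ms, price, is_buyer_maker).
trades = [
    (1664755200123, 1315.0, True),
    (1664755200123, 1315.0, True),   # same ms, price, side: one taker order
    (1664755200200, 1315.1, False),
]

# Group fills that an aggTrade-style feed would collapse into one row.
groups = Counter(trades)
n_raw = len(trades)          # rows a raw trade feed would contain
n_aggregated = len(groups)   # rows an aggregated feed would contain
```

A large gap between the two counts on a historical file would suggest the file is aggregated, which is one candidate explanation for the factor-of-two intensity discrepancy reported earlier.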

@quantitative-technologies
Author

Another issue showed up: I was working with more recent data, and it has an additional undocumented field, trans_id. This shifts the offsets of the other fields and breaks the converter.

Here is an example of the recent snapshot data: https://drive.google.com/file/d/1y-9nt9V-eB_OV3uSq4-dzBe-eOsQDt4S/view?usp=sharing
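A parser that reads columns by name rather than by position survives this kind of schema change, which is the approach the fix below ends up taking. A minimal sketch; the header and values here are illustrative, not the exact Binance schema:

```python
import csv
import io

# Reading by column name keeps the parser working when Binance inserts a
# new field such as trans_id ahead of the existing columns.
data = ("symbol,trans_id,timestamp,side,price,qty\n"
        "ETHUSDT,42,1664755200000,bid,1315.0,3.2\n")

rows = list(csv.DictReader(io.StringIO(data)))
price = float(rows[0]["price"])  # unaffected by the extra trans_id column
```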

nkaz001 added a commit that referenced this issue Aug 28, 2023
@nkaz001
Owner

nkaz001 commented Aug 28, 2023

See 2b3137c and let me know if it works as expected.

@quantitative-technologies
Author

Code looks much better now without hard-coded indices, and it processes the snapshot fine.

But now it fails on the convert function call in the validation step with an exception.

Here is the lob data and trade data to reproduce this.

@nkaz001
Owner

nkaz001 commented Aug 30, 2023

See 7299d9a. I fixed the mingled timestamp issue, but since the data doesn't have a local timestamp, there is no option but to sort by exchange timestamp. That can cause another discrepancy; beware of that.
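The sorting fallback mentioned here can be sketched as a stable sort on the exchange timestamp, which at least preserves the original file order for events sharing a timestamp; the events below are illustrative (timestamp, kind) pairs:

```python
# Hypothetical events whose file order mingles timestamps.
events = [(1001, "trade"), (1000, "depth"), (1001, "depth")]

# Python's sort is stable: events with equal timestamps keep their
# original relative order, limiting (but not removing) the reordering
# discrepancy that a real local timestamp would avoid.
events.sort(key=lambda ev: ev[0])
```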

@quantitative-technologies
Author

Thanks! I tested it out and there were no more errors.

I'm not sure exactly what discrepancy you mean, but perhaps it will become more clear as I continue working on it.

@nkaz001
Owner

nkaz001 commented Sep 3, 2023

What I meant by that is that any difference from the live trading environment can cause a discrepancy.

@phybrain

phybrain commented Dec 5, 2023

> However, when I use the prepared data from binance in the Guéant–Lehalle–Fernandez-Tapia Market Making Model and Grid Trading notebook, it is off by a factor of about 2 in trading intensity from your calculated results. […]

Could you provide the code for the Guéant–Lehalle–Fernandez-Tapia Market Making Model? :)

@nkaz001
Owner

nkaz001 commented Dec 5, 2023

You can find it on the tutorials page or in the examples directory.

@phybrain

phybrain commented Dec 6, 2023

> you can find it on tutorials page or examples directory.

thanks

nkaz001 added a commit that referenced this issue Mar 8, 2024