ENH: TrueFX Tick DataReader #153

Open
femtotrader opened this issue Dec 22, 2015 · 3 comments
@femtotrader

TrueFX (http://www.truefx.com/) provides free tick data.

It would be nice to add this data to DataReader.

See PR #152.

I'm still facing some issues, such as very long processing times.

Any help is welcome.

@femtotrader

femtotrader commented Dec 24, 2015

I wonder if anyone here knows why it takes so long to process!

I'm trying to read 1 month of AUD/USD tick data.
Sample data can be found here:
https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing

AUDUSD-2014-01.zip is 11M and contains AUDUSD-2014-01.csv, which is 85M.

That is not very big!

In [26]: %time df=pd.read_csv("AUDUSD-2014-01.csv", names=['Symbol', 'Date', 'Bid', 'Ask'])
CPU times: user 3.31 s, sys: 481 ms, total: 3.79 s
Wall time: 4.13 s

In [27]: df
Out[27]:
          Symbol                   Date      Bid      Ask
0        AUD/USD  20140101 21:55:34.404  0.88796  0.88922
1        AUD/USD  20140101 21:55:34.444  0.88805  0.88914
2        AUD/USD  20140101 21:55:34.475  0.88809  0.88910
3        AUD/USD  20140101 21:55:48.962  0.88811  0.88908
4        AUD/USD  20140101 21:56:38.293  0.88808  0.88887
...          ...                    ...      ...      ...
1947101  AUD/USD  20140131 21:59:48.048  0.87525  0.87589
1947102  AUD/USD  20140131 21:59:54.599  0.87527  0.87589
1947103  AUD/USD  20140131 21:59:56.927  0.87531  0.87588
1947104  AUD/USD  20140131 21:59:59.365  0.87531  0.87574
1947105  AUD/USD  20140131 22:00:00.038  0.87531  0.87574

[1947106 rows x 4 columns]

In [28]: %time df['Date'] = pd.to_datetime(df['Date'])
CPU times: user 6min 27s, sys: 3.46 s, total: 6min 30s
Wall time: 6min 39s

Passing parse_dates to read_csv is even worse:

In [13]: %time df=pd.read_csv("AUDUSD-2014-01.csv", names=['Symbol', 'Date', 'Bid', 'Ask'], parse_dates=['Date'])
CPU times: user 7min 54s, sys: 4.65 s, total: 7min 59s
Wall time: 8min 32s

This is odd, because once the data is loaded, calculations are very quick:

In [48]: del df['Spread']

In [49]: %time df['Spread']=df['Ask']-df['Bid']
CPU times: user 17.9 ms, sys: 31.7 ms, total: 49.6 ms
Wall time: 28.1 ms

In [51]: %time df.resample(how='ohlc', rule='1D')
CPU times: user 67.2 ms, sys: 4.27 ms, total: 71.4 ms
Wall time: 70.5 ms
Out[51]:
                Bid                                 Ask                              Spread
               open     high      low    close     open     high      low    close     open     high low    close
Date
2014-01-01  0.88796  0.88979  0.88755  0.88928  0.88922  0.88997  0.88825  0.88942  0.00126  0.00126   0  0.00014
2014-01-02  0.88913  0.89434  0.88427  0.89050  0.88949  0.89441  0.88436  0.89057  0.00036  0.00044   0  0.00007
2014-01-03  0.89046  0.90043  0.88846  0.89465  0.89057  0.90053  0.88857  0.89468  0.00011  0.00049   0  0.00003
2014-01-04      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN NaN      NaN
2014-01-05      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN NaN      NaN
...             ...      ...      ...      ...      ...      ...      ...      ...      ...      ...  ..      ...
2014-01-27  0.86945  0.87595  0.86773  0.87307  0.86958  0.87598  0.86780  0.87316  0.00013  0.00102   0  0.00009
2014-01-28  0.87304  0.88205  0.87299  0.87853  0.87316  0.88209  0.87308  0.87853  0.00012  0.00162   0  0.00000
2014-01-29  0.87844  0.88257  0.87246  0.87464  0.87877  0.88269  0.87253  0.87473  0.00033  0.00273   0  0.00009
2014-01-30  0.87462  0.88044  0.87102  0.87955  0.87473  0.88050  0.87110  0.87962  0.00011  0.00047   0  0.00007
2014-01-31  0.87952  0.88232  0.86944  0.87531  0.87962  0.88239  0.86948  0.87574  0.00010  0.00081   0  0.00043

[31 rows x 12 columns]
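
For readers on a recent pandas release: the `how=` keyword of `resample` used in the session above was deprecated and later removed, so the same daily OHLC aggregation would now be written as a method chain. This is only a minimal sketch. It assumes the frame is indexed by the parsed `Date` column (the session does not show that step), and it uses a tiny stand-in frame whose third timestamp is made up for illustration.

```python
import pandas as pd

# Small stand-in for the tick frame; the real file has about 1.9 million rows.
# The first two timestamps come from the sample output, the third is invented.
index = pd.to_datetime(
    ["20140101 21:55:34.404", "20140101 21:55:34.444", "20140102 00:00:01.000"],
    format="%Y%m%d %H:%M:%S.%f",
)
ticks = pd.DataFrame(
    {"Bid": [0.88796, 0.88805, 0.88913], "Ask": [0.88922, 0.88914, 0.88949]},
    index=index,
)
ticks["Spread"] = ticks["Ask"] - ticks["Bid"]

# Method-chain spelling of df.resample(how='ohlc', rule='1D')
daily = ticks.resample("1D").ohlc()
print(daily)
```

The result has one row per calendar day and a two-level column index (`Bid`, `Ask`, `Spread`, each with `open`, `high`, `low`, `close`), matching the layout of the output above.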

@femtotrader

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d %H:%M:%S.%f')

speeds up the conversion, but the whole process of downloading, reading, and converting is still very slow!

Here are the results with 2 months of tick data for AUDUSD (using nose-timer https://github.com/mahmoudimus/nose-timer ):

$ nosetests -s -v pandas_datareader/tests/test_truefx.py --with-timer
test_filename_csv (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (0.1024s)
test_get_truefx_datareader (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (186.2332s)
test_url (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (0.0111s)

pandas_datareader.tests.test_truefx.TestTrueFX.test_get_truefx_datareader: 186.2332s
pandas_datareader.tests.test_truefx.TestTrueFX.test_filename_csv: 0.1024s
pandas_datareader.tests.test_truefx.TestTrueFX.test_url: 0.0111s

When the data has been downloaded previously and stored in the SQLite cache:

$ nosetests -s -v pandas_datareader/tests/test_truefx.py --with-timer
test_filename_csv (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (0.0963s)
test_get_truefx_datareader (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (60.3628s)
test_url (pandas_datareader.tests.test_truefx.TestTrueFX) ... ok (0.0091s)

pandas_datareader.tests.test_truefx.TestTrueFX.test_get_truefx_datareader: 60.3628s
pandas_datareader.tests.test_truefx.TestTrueFX.test_filename_csv: 0.0963s
pandas_datareader.tests.test_truefx.TestTrueFX.test_url: 0.0091s
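
The explicit-format conversion mentioned in this comment can be sketched end to end as follows. It is a rough illustration rather than the code from the PR: `read_ticks` and `TICK_FORMAT` are hypothetical names, and the inline sample assumes the CSV is comma-separated with no header row, which is what the `read_csv(..., names=...)` calls earlier in the thread suggest. Passing `format=` lets pandas skip per-value format guessing, which is a plausible (but here unverified) explanation for the multi-minute conversions seen without it.

```python
import io

import pandas as pd

# Three rows copied from the sample output earlier in the thread.
# Assumption: the raw file is comma-separated and has no header line.
SAMPLE = """AUD/USD,20140101 21:55:34.404,0.88796,0.88922
AUD/USD,20140101 21:55:34.444,0.88805,0.88914
AUD/USD,20140101 21:55:34.475,0.88809,0.88910
"""

# Timestamp layout of the tick files, as given in the comment above.
TICK_FORMAT = "%Y%m%d %H:%M:%S.%f"


def read_ticks(source):
    """Hypothetical helper: read tick rows, then parse dates with a fixed format."""
    df = pd.read_csv(source, names=["Symbol", "Date", "Bid", "Ask"])
    # Parse after loading, with an explicit format, instead of parse_dates=...
    df["Date"] = pd.to_datetime(df["Date"], format=TICK_FORMAT)
    return df.set_index("Date")


ticks = read_ticks(io.StringIO(SAMPLE))
print(ticks)
```

`read_ticks` accepts anything `pd.read_csv` accepts, so a path such as "AUDUSD-2014-01.csv" can be passed in place of the in-memory sample.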

@femtotrader femtotrader changed the title TrueFX tick DataReader Tick DataReader Feb 29, 2016
@femtotrader femtotrader changed the title Tick DataReader ENH: Tick DataReader Mar 1, 2016
@femtotrader femtotrader changed the title ENH: Tick DataReader ENH: TrueFX Tick DataReader Sep 8, 2016
@femtotrader

Direct requests to URLs like

http://www.truefx.com/dev/data/2014/JANUARY-2014/AUDUSD-2014-01.zip

now redirect to

http://www.truefx.com/

Users now need to be registered.

I see (at least) 2 solutions:

  • manually download the file(s)
  • have an API key to allow this (I've just sent a message to their support team)
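
As a rough sketch of the first option, a manually downloaded archive can be read straight from disk without unpacking it. `read_truefx_zip` is a hypothetical helper, not part of pandas-datareader, and the example assumes each archive holds exactly one headerless, comma-separated CSV (as AUDUSD-2014-01.zip does, according to the earlier comment). To stay self-contained, the demo builds a one-row stand-in archive in a temporary directory instead of using a real download.

```python
import os
import tempfile
import zipfile

import pandas as pd


def read_truefx_zip(path):
    """Hypothetical helper: load one manually downloaded tick archive."""
    # pandas can read a zip directly when it contains a single file.
    df = pd.read_csv(
        path, compression="zip", names=["Symbol", "Date", "Bid", "Ask"]
    )
    df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S.%f")
    return df.set_index("Date")


# Demo only: create a tiny stand-in archive, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "AUDUSD-2014-01.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(
            "AUDUSD-2014-01.csv",
            "AUD/USD,20140101 21:55:34.404,0.88796,0.88922\n",
        )
    ticks = read_truefx_zip(archive)

print(ticks)
```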
