tickstore query slowly #69

Closed
zoe0316 opened this issue Dec 24, 2015 · 5 comments
@zoe0316

zoe0316 commented Dec 24, 2015

Arctic claims it can query millions of rows per second per client, but when our team tried it, we only got thousands of rows per second. Here is the code; has anyone hit the same problem, or am I using it the wrong way?

    @property
    def arctic(self):
        if not self._arctic:
            log.info("init arctic")
            mongo_conn = MongoDB()
            self._arctic = Arctic(mongo_host=mongo_conn.client)
            library = self._arctic.list_libraries()
            if self.tick_db not in library:
                self._arctic.initialize_library(self.tick_db, lib_type=arctic.TICK_STORE)
            if self.bar_db not in library:
                self._arctic.initialize_library(self.bar_db, lib_type=arctic.TICK_STORE)
        return self._arctic

...
# res is a dict of tick data
index = self.int_to_date(tick_time)
data = pd.DataFrame(res, [index])
self.arctic[self.tick_db].write(symbol, data)

...

>>> now = time.time(); ac['tick'].read('IF1601', date_range=dr); print(time.time() - now)
Output:
[4021 rows x 26 columns]
3.56284999847

thanks.

@femtotrader
Contributor

For a performance comparison with "pure" pymongo, see:

In [234]: %time df_retrieved = pd.DataFrame(list(db.ticks.find()))
CPU times: user 39.6 s, sys: 27.1 s, total: 1min 6s
Wall time: 1min 21s

In [236]: df_retrieved
Out[236]:
             Ask      Bid   Spread  Volume                       _id
0        0.88922  0.88796  0.00126       1  567c324fcc9915206eb18cc8
1        0.88914  0.88805  0.00109       1  567c324fcc9915206eb18cc9
2        0.88910  0.88809  0.00101       1  567c324fcc9915206eb18cca
3        0.88908  0.88811  0.00097       1  567c324fcc9915206eb18ccb
4        0.88887  0.88808  0.00079       1  567c324fcc9915206eb18ccc
...          ...      ...      ...     ...                       ...
1913358  0.87589  0.87525  0.00064       1  567c32b1cc9915206ecebed6
1913359  0.87589  0.87527  0.00062       1  567c32b1cc9915206ecebed7
1913360  0.87588  0.87531  0.00057       1  567c32b1cc9915206ecebed8
1913361  0.87574  0.87531  0.00043       1  567c32b1cc9915206ecebed9
1913362  0.87574  0.87531  0.00043       1  567c32b1cc9915206ecebeda

[1913363 rows x 5 columns]

@cityhunterok

We should use it to store many ticks of data in one record as a pandas DataFrame, right?

@femtotrader
Contributor

Let's use same file for benchmarking https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing

see also pydata/pandas-datareader#153

I wonder if they (the Man AHL Arctic dev team) shouldn't use Monary instead of pymongo:
https://github.com/ksuarz/monary https://monary.readthedocs.org/

Read this https://pypi.python.org/pypi/Monary/0.4.0.post2

It is possible to get (much) more speed from the query if we bypass the PyMongo
driver. To demonstrate this, I've developed *monary*, a simple C library and
accompanying Python wrapper which make use of MongoDB C driver. 

see https://bitbucket.org/djcbeach/monary/issues/19/use-pandas-series-dataframe-and-panel-with

@jamesblackburn
Contributor

I think there's quite a lot of overlap between what Monary does and Arctic.

Monary makes it fast to marshall primitive types (numpy int, floats, etc) into and out of MongoDB. We do something similar, except we do compression and batching on the client side. A lot of the win (in network and disk I/O terms) comes from financial data being highly compressible. Because we batch in the client, we end up performing few pymongo operations relative to the number of ticks/rows.

For profiling perhaps try: %prun in ipython
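Outside IPython, the same idea can be approximated with the stdlib `cProfile`/`pstats` modules. This is a hypothetical sketch: `per_row_insert` is a stand-in for a loop of single-row Arctic writes, not a function from the thread.

```python
import cProfile
import io
import pstats

def per_row_insert(n):
    # Stand-in for a loop of single-row Arctic write() calls;
    # replace the body with the real per-row operation being profiled.
    acc = []
    for i in range(n):
        acc.append(i)
    return len(acc)

prof = cProfile.Profile()
prof.enable()
count = per_row_insert(50000)
prof.disable()

# Print only the lines of the report that mention our function.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats("per_row_insert")
report = out.getvalue()
print(report)
```

In IPython, `%prun per_row_insert(50000)` gives the same breakdown interactively; a profile dominated by per-call overhead rather than I/O is a strong hint that batching will help.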

@zoe0316
Author

zoe0316 commented Dec 31, 2015

Thanks for your comments. I made a mistake: I should not insert single rows into Arctic, but write in batches instead. Happy new year. XD
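The fix described here, accumulating ticks and writing them in one batch rather than one DataFrame row per `write()` call, might look like the following sketch. The tick fields, timestamps, and `build_batch` helper are illustrative assumptions, not code from the thread.

```python
import pandas as pd

def build_batch(ticks):
    """Build one DataFrame from a list of (timestamp_string, fields_dict) ticks."""
    index = pd.to_datetime([t for t, _ in ticks])
    return pd.DataFrame([fields for _, fields in ticks], index=index)

# Accumulate ticks in memory first...
batch = build_batch([
    ("2015-12-24 09:30:00", {"bid": 100.1, "ask": 100.3}),
    ("2015-12-24 09:30:01", {"bid": 100.2, "ask": 100.4}),
    ("2015-12-24 09:30:02", {"bid": 100.2, "ask": 100.5}),
])

# ...then issue a single write for the whole batch instead of one per row,
# e.g. (using the names from the issue code above):
# self.arctic[self.tick_db].write(symbol, batch)
print(batch.shape)
```

Batching is what lets Arctic compress and amortize the per-operation pymongo overhead that James describes above.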

@zoe0316 zoe0316 closed this as completed Dec 31, 2015