use socketfile to improve perf #441

Merged
5 commits merged on Apr 9, 2022


alonme
Contributor

@alonme alonme commented Mar 31, 2022

@CLAassistant

CLAassistant commented Mar 31, 2022

CLA assistant check
All committers have signed the CLA.

@alonme
Contributor Author

alonme commented Mar 31, 2022

Inspired by @alippai's suggestion in #398.

I see a ~30% performance boost on queries.
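The diff itself isn't reproduced on this page, so the following is only a sketch of the general technique the PR title describes, under stated assumptions: the old `read_bytes` path is assumed to loop over `socket.recv`, and the new `read_bytes_using_file` path is assumed to read from a file object created once with `socket.makefile('rb')`. The function names `read_exact_recv` and `read_exact_file` are hypothetical and are not vertica_python's API.

```python
import socket
from typing import BinaryIO


def read_exact_recv(sock: socket.socket, n: int) -> bytes:
    """Assumed 'before' version: read exactly n bytes with a manual recv() loop."""
    buf = b""
    while len(buf) < n:
        data = sock.recv(n - len(buf))  # may return fewer bytes than requested
        if not data:  # an empty result means the peer closed the connection
            raise EOFError(f"connection closed after {len(buf)} of {n} bytes")
        buf += data
    return buf


def read_exact_file(sock_file: BinaryIO, n: int) -> bytes:
    """Assumed 'after' version: read n bytes from a file made with sock.makefile('rb')."""
    data = sock_file.read(n)  # the buffered reader loops internally until n bytes or EOF
    if len(data) < n:
        raise EOFError(f"connection closed after {len(data)} of {n} bytes")
    return data


if __name__ == "__main__":
    # Separate socket pairs on purpose: never mix recv() and a buffered file on one
    # socket, because the file may already have read ahead into its own buffer.
    a1, b1 = socket.socketpair()
    a1.sendall(b"hello world")
    print(read_exact_recv(b1, 5), read_exact_recv(b1, 6))

    a2, b2 = socket.socketpair()
    a2.sendall(b"hello world")
    f = b2.makefile("rb")  # create once per connection and reuse it for every read
    print(read_exact_file(f, 5), read_exact_file(f, 6))
```

The key design choice in this sketch is that the file object is created once and kept for the lifetime of the connection; creating a new one per read would throw away any bytes it had buffered.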

@alonme
Contributor Author

alonme commented Apr 2, 2022

The benchmark I used runs against the vertica-ce Docker image with the VMart database.

I switched the source code between the new read_bytes_using_file implementation and the current read_bytes; autoreload reloads the module before each command runs.

In this example query time drops from 1.95 s to 960 ms, roughly a 50% reduction (about 2x faster).

In [1]: # Use auto-reload to switch between `read_bytes` and `read_bytes_using_file`
   ...: %load_ext autoreload
   ...: %autoreload 2

In [2]: import vertica_python
   ...:
   ...: conn_info = {
   ...:     "host": "127.0.0.1",
   ...:     "port": 5433,
   ...:     "user": "dbadmin",
   ...:     #'password': 'some_password',
   ...:     "database": "VMart",
   ...:     "session_label": "some_label",
   ...:     "unicode_error": "strict",
   ...:     "ssl": False,
   ...:     "autocommit": True,
   ...:     "use_prepared_statements": False,
   ...:     "connection_timeout": 5,
   ...: }
   ...:
   ...: query = """
   ...: SELECT sales_quantity, sales_dollar_amount, transaction_type
   ...: FROM online_sales.online_sales_fact
   ...: LIMIT 100000
   ...: """
   ...:
   ...: def run_query():
   ...:     with vertica_python.connect(**conn_info) as conn:
   ...:         cur = conn.cursor()
   ...:         cur.execute(query)
   ...:         cur.fetchall()
   ...:

In [3]: # using read_bytes_using_file

In [4]: %timeit run_query()
960 ms ± 6.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: # Using read_bytes

In [6]: %timeit run_query()
1.95 s ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
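One plausible explanation for the speedup (an assumption, not something measured in this thread): the client reads many small pieces from the socket, and `io.BufferedReader` serves them from an in-memory buffer, so the underlying raw stream (for a socket file, `socket.SocketIO.readinto`) is called far less often than a recv-per-read loop calls `recv`. The sketch below replaces the socket with a hypothetical in-memory `CountingRaw` stream so the number of raw reads can be counted; the 5-byte read size is an illustrative choice.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts calls from the buffering layer."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = memoryview(data)
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Stand-in for a recv_into() system call: copy as much as fits into b.
        self.calls += 1
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def count_raw_reads(total_bytes: int = 100_000, chunk: int = 5, buffer_size: int = 8192) -> int:
    """Do total_bytes // chunk small reads through a BufferedReader; return raw call count."""
    raw = CountingRaw(b"x" * total_bytes)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    for _ in range(total_bytes // chunk):
        if len(reader.read(chunk)) != chunk:
            raise EOFError("short read")
    return raw.calls


if __name__ == "__main__":
    calls = count_raw_reads()
    print(f"{100_000 // 5} buffered reads -> {calls} raw reads")
```

A manual loop would issue at least one `recv` per small read (20,000 here), while the buffered reader refills its buffer in `buffer_size` chunks, keeping most of the per-read work in C rather than in Python.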

@alonme
Contributor Author

alonme commented Apr 2, 2022

@sitingren - Hey, do you need any additional information regarding this change?
My last commits removed the previous implementation, which I had left in the code to make benchmarking easy.

@ni-todo-spot

Wow! Impressive work 🥇
Our company would definitely benefit from such improvement!

@sitingren sitingren self-requested a review April 6, 2022 07:37
Review comments on vertica_python/vertica/connection.py (4 threads, outdated and resolved)
@alonme
Contributor Author

alonme commented Apr 7, 2022

@sitingren - Thanks for the review.
I believe I have fixed or answered all the issues.

@alonme alonme requested a review from sitingren April 7, 2022 15:57
@sitingren sitingren merged commit 3572f00 into vertica:master Apr 9, 2022
@alonme
Contributor Author

alonme commented Apr 9, 2022

Thanks @sitingren!

When can we expect a release?

@ni-todo-spot

Very exciting news guys :)
We'd love to use this feature!

@sitingren
Member

This goes into release v1.0.5.

Contributor

@alippai alippai left a comment


buf += data might not be needed at all: the read will return exactly the amount requested. As far as I remember, this interface differs from the original read in that it returns only full results.

Edit: both CPython and PyPy return an io.BufferedReader, which handles this for you.
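A small sketch of the point above, assuming a blocking socket: `socket.makefile("rb")` returns an `io.BufferedReader`, and its `read(n)` keeps reading until it has `n` bytes or reaches EOF, so a manual `buf += data` loop is redundant and only a short read at EOF needs handling. The helper `send_one_byte_at_a_time` is hypothetical; it simulates a peer whose data arrives in many tiny pieces.

```python
import io
import socket
import threading
import time


def send_one_byte_at_a_time(sock: socket.socket, data: bytes) -> None:
    """Simulate a slow peer: each byte goes out in its own send call, then close."""
    try:
        for i in range(len(data)):
            sock.sendall(data[i:i + 1])
            time.sleep(0.001)  # give the reader a chance to see partial data
    finally:
        sock.close()


def demo() -> tuple:
    writer, reader_sock = socket.socketpair()
    sender = threading.Thread(target=send_one_byte_at_a_time, args=(writer, b"0123456789"))
    sender.start()
    with reader_sock, reader_sock.makefile("rb") as f:
        is_buffered = isinstance(f, io.BufferedReader)
        first = f.read(10)     # blocks until all 10 bytes have arrived, no manual loop
        after_eof = f.read(1)  # peer has closed: returns b"" rather than raising
    sender.join()
    return is_buffered, first, after_eof


if __name__ == "__main__":
    print(demo())
```

One caveat from the Python documentation for `socket.makefile()`: the socket may have a timeout, but if a timeout occurs the file object's internal buffer can be left in an inconsistent state.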

5 participants