datastore: very slow queries, single record by key #4925
Comments
Hi @natasha-aleksandrova. It seems you're creating a new datastore client every time. This has quite a bit of overhead. Can you move the datastore client creation outside of your timing loop and see what you get?
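As an illustration of that suggestion (not code from this thread), a minimal sketch of creating the client once and reusing it, assuming the google-cloud-datastore Python client and a hypothetical "User" kind:

```python
from google.cloud import datastore

# Create the client once, at startup; it holds credentials and network
# channels, so constructing it per request adds noticeable overhead.
client = datastore.Client()

def get_user(user_id):
    # Reuse the shared client for every lookup.
    key = client.key("User", user_id)  # hypothetical kind
    return client.get(key)
```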
There is no loop in the code I posted. Moving client creation before
Hrm, on my end:
Can you tell me a bit more about your environment?
Certainly, could you tell me what specifically you're looking for? By the way, a 67ms query time for a lookup by key doesn't seem like good performance compared to comparable databases (PostgreSQL, Cassandra, etc.).
The above is from running the script with multiple get queries after creating the client first. The results seem pretty inconsistent. We are using Datastore behind REST endpoints that create a client and then do a single query to return the data. It seems that the first query is always slow, and, as I said earlier, ~50ms still seems a little slow.
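The script and its output being discussed are not preserved in this capture. A rough sketch of timing several sequential gets with a single client, to compare the first (cold) call against later ones, could look like this; the "User" kind and ID are placeholders:

```python
import time
from google.cloud import datastore

client = datastore.Client()      # created once, outside the timed section
key = client.key("User", 1234)   # placeholder kind and ID

for i in range(10):
    start = time.perf_counter()
    client.get(key)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"get #{i + 1}: {elapsed_ms:.1f} ms")
```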
@dmcgrath could you let me know if this is within the expected latency for Datastore?
I am happy to hear that someone is raising this point. We experience the same slowness with the Java client. First of all, Datastore is slower than Cloud SQL. Secondly, there are fluctuations between 50ms and a couple of hundred ms. This forces us to use intermediate caches instead of querying Datastore directly.
PM for Datastore here. Some quick insights.
Code:
Output:
Thank you for the information.
Your results look much better than mine. What could account for such a drastic difference?
An average of ~20ms is much better and something we could work with (although, coming from PostgreSQL with 1-2ms queries by primary key, it is still a little high).
Also, a little bit about our use case: we are working on REST endpoints and using Datastore to store our core entities such as User, Driver, and Company. They are fairly simple data structures, and most commonly our endpoints do a single query by key to get the entity. So the code snippet I provided in the issue I reported is pretty much the basis of the code: initialize the Datastore client, then do a get by key.
Given our use case, would SQL be more suitable for us? Or is there anything we can do to optimize Datastore for our needs?
Thanks!
Natasha
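As an illustration of the use case described above (not code from this thread), a REST endpoint that reuses a module-level Datastore client for the get-by-key lookup might look roughly like this, using a hypothetical Flask app and "User" kind:

```python
from flask import Flask, abort, jsonify
from google.cloud import datastore

app = Flask(__name__)
client = datastore.Client()  # shared across requests, not created per request

@app.route("/users/<int:user_id>")
def get_user(user_id):
    entity = client.get(client.key("User", user_id))
    if entity is None:
        abort(404)
    # Datastore entities behave like dicts (assuming JSON-serializable values).
    return jsonify(dict(entity))
```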
I'm going to go ahead and close this, but by all means please continue discussing. If there's an actionable issue for this library, we can re-open or start a new issue. Thank you @dmcgrath for giving thorough answers here. :)
@natasha-aleksandrova I would recommend running the version of the test I posted and then comparing your numbers against my results. I strongly suspect you'll see that they then match.
These are the results:
Just to close this out, it sounds like:
@natasha-aleksandrova What I learned from using Datastore is to use Google Cloud SQL if you don't need to store a huge amount of data or handle a massive number of queries per second. Google Cloud SQL is just awesome, plus you are not locked into a proprietary solution that you cannot change afterwards. Bonus: you can use cool third-party libraries like SQLAlchemy.
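For context on that suggestion, a primary-key lookup with SQLAlchemy against a Cloud SQL (PostgreSQL) instance might look roughly like the sketch below; the connection string, the User model, and the SQLAlchemy 1.4+ API are assumptions, not details from this thread:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Placeholder connection string; real Cloud SQL setups usually connect
# through the Cloud SQL proxy or a unix socket.
engine = create_engine("postgresql+psycopg2://user:password@127.0.0.1/mydb")

with Session(engine) as session:
    user = session.get(User, 1234)  # lookup by primary key
```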
@david-gang Thanks for sharing your insights! They are certainly helpful, and we are looking at trying out Cloud SQL.
Ubuntu 16.04.3 LTS
Python 3.6.3
google-cloud-datastore>=1.5.0
repro code:
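The original repro snippet did not survive in this capture. Based on the description below (a single get by key, with client creation included in the timed span), a reconstruction might look roughly like this; the "User" kind and ID are placeholders:

```python
import time
from google.cloud import datastore

start = time.perf_counter()
client = datastore.Client()
entity = client.get(client.key("User", 1234))  # placeholder kind and ID
print(f"query took {(time.perf_counter() - start) * 1000:.1f} ms")
```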
We are just starting out with Google Cloud Platform and chose Datastore as the option for storing data because it seemed the simplest; however, we are noticing very slow performance on queries. Our most common query is by primary key (the repro code above) or an equality (=) filter on one of the fields. We are getting roughly 200-300ms on the above query, whether there is a single entity or many entities in the database. The queries are run either from a Compute Engine instance or from Kubernetes clusters (both show the same performance).
We are not sure what to do from here, short of switching to a different database option on GCP. Any help would be greatly appreciated. Thanks!