
Queries not finishing when partition count on db increased. #88

akshaysyaduvanshi opened this issue May 14, 2024 · 1 comment

akshaysyaduvanshi commented May 14, 2024

Earlier setup:
- 32 partitions on the DB
- User pool resource limits: CPU limit 60, queue depth 80
- Connector version: 4.1.6 (AWS Marketplace SingleStore connector)

I am using AWS Glue with 32 executors. With the above setup I could fetch 1 billion records within 30 minutes.

The one change we have made is increasing the partition count on the database to 150. Now what I see is that some queries run, some get queued, and the running queries never finish.

Any idea what could be causing this? Do all 150 queries need to execute in parallel?

@AdalbertMemSQL (Collaborator)

If you are using the ReadFromAggregators parallel read feature, then yes: all reading tasks must start at the same time.
In the latest version, the connector tries to estimate how many resources the Spark cluster has and, if needed, runs several reading tasks inside a single Spark task. But generally, it is recommended to have a sufficiently large Spark cluster.

If you don't want to depend on the number of database partitions in this way, you can use the ReadFromAggregatorsMaterialized feature (it will use more memory on the database side) or disable parallel read entirely.
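A minimal sketch of how this might look from PySpark on Glue. The option names follow the SingleStore Spark connector documentation, but the endpoint, credentials, and table name are placeholders — verify the exact option spellings against the connector version you are running:

```python
# Hedged sketch: switching from ReadFromAggregators to the
# materialized variant so read tasks do not all have to start
# simultaneously (at the cost of extra memory on the database side).
options = {
    "ddlEndpoint": "svchost:3306",   # placeholder endpoint
    "user": "admin",                 # placeholder credentials
    "password": "secret",
    "database": "mydb",
    # Prefer the materialized parallel read feature:
    "parallelRead.Features": "ReadFromAggregatorsMaterialized",
    # Alternatively, disable parallel read altogether:
    # "enableParallelRead": "disabled",
}

# With a live SparkSession and a reachable cluster, the read
# would then be (commented out since it needs a real database):
# df = (spark.read
#       .format("singlestore")
#       .options(**options)
#       .load("mydb.my_table"))
```

With `ReadFromAggregatorsMaterialized`, the result set is materialized on the database side, so Spark tasks can consume their partitions one at a time instead of requiring all 150 readers to be running concurrently.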
