
Number of CPUs does not take into account #4341

Open
asfaltboy opened this issue Oct 8, 2023 · 6 comments

@asfaltboy

asfaltboy commented Oct 8, 2023

My company deploys Prisma on Kubernetes (managed k8s, i.e. GKE, on Google Cloud), and we have observed an issue with the default connection_limit being set by Prisma to be higher than expected, and also that this seems to depend on our underlying Kubernetes node type.

We've noticed that the current approach of using num_cpus::get_physical() when computing the default connection limit does not consider cgroups. I was wondering whether it would be better to use num_cpus::get() to detect the CPU cores actually available in the execution environment; it appears to take care of parsing both cgroup v1 and v2 limits.

As in the docs:

This will also check cgroups, frequently used in containers to constrain CPU usage.
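A minimal sketch of the difference, assuming the num_cpus crate as a dependency (illustrative only, not Prisma's actual code):

```rust
fn main() {
    // Physical cores of the host machine; this ignores cgroup CPU limits,
    // so inside a constrained container it still reports the node's cores.
    let physical = num_cpus::get_physical();

    // Logical cores; per the docs quoted above, this also checks cgroup
    // v1/v2 limits, so a pod restricted to 1 CPU would typically see 1 here
    // even when running on an 8-core node.
    let logical = num_cpus::get();

    println!("physical: {physical}, logical (cgroup-aware): {logical}");
}
```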

We did notice that Prisma mentions only "physical cores" in the docs, and I wonder if that is intentional.

For what it's worth, I imagine that using (already opened) connections from a DB client is mostly an I/O-heavy operation and may not introduce significant CPU load on the physical host, at least in most common cases (perhaps some benchmark profiling would prove me wrong). That said, automatically defaulting to a smaller pool size when a container is constrained would be more in line with the expected behavior.

Refs:

@janpio
Contributor

janpio commented Oct 8, 2023

have observed an issue with the default connection_limit being set by Prisma to be higher than expected

Is the "higher than expected" limit causing any problems? Or is it just "higher than could have anticipated when understanding what kind of cores these are"?

@asfaltboy
Author

Good point, I should have expanded on the impact, let me try:

In our k8s setup, we run a bunch of small (micro-)services in "pods" with limited resources on a smaller number of nodes (VMs). So, for example, an 8-core node (8 logical, 4 physical, btw) can at any moment run 32 pods, each of which is allowed to use a small portion of CPU and memory as needed for its workload.

We start off by defaulting services to relatively limited CPU/memory resources (say, 1 core at most), and we'd expect them to default to e.g. 3 connections (however, they default to 9 = 4*2+1). We also start off with modest DB resources, such that the DB configuration sets max_connections to a small number. This misalignment forces us to adjust the pool/DB configuration for many of the default workloads, as the pool size is too large for the default database instance.
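To make the arithmetic concrete, here is a rough sketch of the default described above (the formula physical_cores * 2 + 1 is inferred from the numbers in this thread; the function name is hypothetical, for illustration only):

```rust
// Hypothetical helper mirroring the "cores * 2 + 1" default discussed above.
fn default_pool_size(cores: usize) -> usize {
    cores * 2 + 1
}

fn main() {
    // Based on the node's 4 physical cores: 4 * 2 + 1 = 9 connections per pod.
    println!("node-based default: {}", default_pool_size(4));
    // Based on the pod's cgroup limit of ~1 CPU: 1 * 2 + 1 = 3 connections.
    println!("cgroup-aware default: {}", default_pool_size(1));
}
```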

We can of course adjust the pool config in our default service, but I guess that's the point: the automatic value is meant to represent a sane default, which in our case ends up being based not on the workload but on the hosting node in the containerized environment.

I hope that makes sense, and I've not yammered your ears off :)

@janpio
Contributor

janpio commented Oct 9, 2023

Ok, so you are basically saying that using logical instead of physical CPUs would be a more convenient default that will probably be more correct in more cases?

@asfaltboy
Author

Thanks for summarizing it so well :) yes, that's exactly right

This is my understanding, at least. I'm by no means an expert, so I'll be happy to learn of other reasons this should not use logical cores.

@janpio
Contributor

janpio commented Oct 13, 2023

That was most probably an oversight.

But to be honest, the number of cores - whether logical or physical - actually says very little about how many connections an application should be able to open. It is based on a very old misunderstanding of an even older rule of thumb about how many connections should be opened on bare-metal servers. But until we figure that out, it is probably better to at least use the logical cores.

Are you up to doing a PR?

@aqrln
Member

aqrln commented Oct 18, 2023

FWIW, you don't need an external crate for this in modern Rust; std::thread::available_parallelism is part of the standard library now.
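A minimal usage sketch, assuming a Rust toolchain where this API is stable (1.59+):

```rust
use std::thread;

fn main() -> std::io::Result<()> {
    // Estimate of the parallelism available to this process; on Linux it
    // takes cgroup CPU quotas into account, similar in spirit to num_cpus::get().
    let n = thread::available_parallelism()?;
    println!("available parallelism: {n}");
    Ok(())
}
```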
