A way to set recommended/reasonable value for concurrency #24
What's reasonable really depends on how you're using it. I don't think we can easily come up with something that works in all cases. Many use … From experience, a concurrency limit matters most for networking, the file system, and spawning child processes. For most other things, … See sindresorhus/cpy#69 (comment) for more. // @stroncium
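To make "a concurrency limit" concrete: at most `limit` tasks are in flight at once, and the rest wait in a queue. The sketch below is a minimal illustration of that idea; `createLimiter` is a hypothetical helper written for this example, not part of p-map's API (p-map and p-limit implement the same idea with more features).

```javascript
// Hypothetical helper: run at most `limit` async tasks at the same time.
function createLimiter(limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new TypeError('limit must be a positive integer');
  }
  let active = 0;   // tasks currently running
  const queue = []; // start functions waiting for a free slot (FIFO)

  // Start the next queued task if a slot is free.
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const start = queue.shift();
    start();
  };

  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push(() => {
        Promise.resolve()
          .then(task) // also turns a synchronous throw into a rejection
          .then(resolve, reject)
          .finally(() => {
            active--; // release the slot and let a waiting task start
            next();
          });
      });
      next();
    });
  };
}
```

Sharing one limiter per resource (for example one for the database, one for the file system) is what keeps unrelated call sites from overwhelming that resource together.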
In the case of nyc, p-map will be reading/writing JSON files, and this will happen in the main process before and after the actual test process is spawned, so it won't be competing with the tests for any resources.
In general, concurrency limits are important for any kind of resource, including cache space, memory bandwidth, memory space, processes/threads, file descriptors, I/O device bandwidth and space, and probably a few more categories. There is no way to fine-tune it for all cases. I am considering writing a lib for this, but at the moment I'm still struggling to find a good approach. Problems/ideas I'm working with at the moment are:
Another thing I'm really interested in is auto-tuning concurrency at runtime based on observed performance. That might be a way forward, as it doesn't rely on gathering metrics or predicting which ones are important, and it can adapt to a changing workload, but it has its own downsides (it might need some time to catch up to the environment's specifics, especially if the environment changes).

As for working with files, the most effective optimizations relate to seek time in the case of HDDs, cache flush+fill time in the case of SSDs, and the number of descriptors. As a rule of thumb, if you're streaming big files, you want to work with 1-4 at a time per device; if you're working with a lot of small files, most of the time you want to stay somewhere below 256 descriptors, in some cases up to 1024 (above that it's impossible to predict without knowing the details). If the underlying device is slow, you may reach peak performance even at 8-16; if it's fast, more like the 64-128 range, but the gains from reducing concurrency too much are usually minor.
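The rules of thumb above can be encoded as a small helper. This is a sketch under stated assumptions: `suggestFileConcurrency` and its options are hypothetical names, the per-device values are simply the upper ends of the ranges quoted in the comment (4 for big files, 16 for slow devices, 128 for fast ones), and the 256 cap is the "below 256 descriptors" guideline treated as a hard ceiling. They are not measurements.

```javascript
// Hypothetical helper encoding the file-I/O rules of thumb quoted above.
// Values are the upper ends of the quoted ranges, not benchmarked numbers.
function suggestFileConcurrency({ bigFiles = false, fastDevice = true, devices = 1 } = {}) {
  if (!Number.isInteger(devices) || devices < 1) {
    throw new TypeError('devices must be a positive integer');
  }
  // Streaming big files: 1-4 at a time per device.
  // Many small files: 8-16 on a slow device, 64-128 on a fast one.
  const perDevice = bigFiles ? 4 : fastDevice ? 128 : 16;
  // Keep the total number of open descriptors around 256 or fewer.
  const DESCRIPTOR_CAP = 256;
  return Math.min(perDevice * devices, DESCRIPTOR_CAP);
}
```

For example, `suggestFileConcurrency({ bigFiles: true, devices: 2 })` gives 8, while four fast devices with small files hit the descriptor cap of 256.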
I did a simplistic guess for …:

```javascript
const os = require('node:os');

// Educated guess: values of 16 to 64 seem to be optimal for modern SSDs; 8-16 and 64-128 can be a bit slower.
// We should be OK as long as ssdCount <= cpuCount <= ssdCount * 16.
// For slower media this is not as important, and we rely on the OS handling it for us.
const concurrency = os.cpus().length * 8;
```

It was for …
@stroncium The library you mentioned sounds promising if it works well in the end :) (It's really complicated, since you have to make sure the overhead is very low, or it won't be worth using...) Just wondering: auto-tuning isn't an uncommon term. Have you found any related library? I could possibly help with the library if you'd like :D
@yxliang01 I googled around a bit, but I think there is nothing out there for such a generic task except implementations of specific algorithms (which aren't that hard once you've picked an algorithm anyway). If you have some libs in mind, let me know. As for writing my own, there are still a lot of unsolved questions, so for the moment I'm just considering various approaches and modelling them.
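As an illustration of the kind of algorithm meant here, the sketch below swaps in AIMD (additive increase, multiplicative decrease, the scheme TCP congestion control uses) to auto-tune a concurrency limit at runtime. It is one plausible choice, not the approach discussed in the thread; `AimdTuner`, the latency target, and the default bounds are all assumptions. It also exhibits the downside mentioned earlier: additive increase needs several measurement windows to climb to a good value.

```javascript
// Hypothetical AIMD tuner: after each measurement window, raise the limit by 1
// while average latency stays at or under the target, and halve it when the
// target is exceeded. Bounds and target are illustrative defaults.
class AimdTuner {
  constructor({ min = 1, max = 256, initial = 8, latencyTargetMs = 50 } = {}) {
    if (!(min >= 1 && min <= initial && initial <= max)) {
      throw new RangeError('expected 1 <= min <= initial <= max');
    }
    this.min = min;
    this.max = max;
    this.limit = initial;
    this.latencyTargetMs = latencyTargetMs;
  }

  // Feed the average task latency observed during the last window;
  // returns the concurrency limit to use for the next window.
  update(avgLatencyMs) {
    if (avgLatencyMs > this.latencyTargetMs) {
      // Multiplicative decrease: back off quickly when the resource is saturated.
      this.limit = Math.max(this.min, Math.floor(this.limit / 2));
    } else {
      // Additive increase: probe for more throughput one slot at a time.
      this.limit = Math.min(this.max, this.limit + 1);
    }
    return this.limit;
  }
}
```

The returned limit would then be fed into whatever limiter or scheduler runs the tasks before the next window starts.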
@stroncium Right. Unfortunately, I'm not aware of any such libraries. But I do have experience dealing with these kinds of "dynamic schedulers" that claim to make computers faster and more responsive (I've already forgotten their names; anyway, I didn't notice much difference).
Thanks for the responses (sorry for my slow reply). I've been trying to think of a way that …

```javascript
import fs from 'node:fs';
import path from 'node:path';
import pMap from 'p-map';

// `scheduler` is the proposed shared-scheduler option, not an existing p-map option;
// `processFile` and `directories` are placeholders.
async function processDirectory(directory) {
  await pMap(
    await fs.promises.readdir(directory),
    filename => processFile(path.join(directory, filename)),
    {scheduler}
  );
}

await pMap(directories, processDirectory, {scheduler});
```

If the shared scheduler were configured to allow 8 concurrent tasks, in theory all of them could be taken up by calls to processDirectory, but processDirectory cannot finish unless the inner pMap is allowed to run processFile. My idea is that the scheduler would internally decrease the max concurrency by one, then allow each active pMap one free task. That way, if 8 calls to processDirectory were active, each one would still be allowed to run 1 call to processFile without counting against the concurrency limit. I'm not sure whether this could introduce edge cases where promises never resolve.
@coreyfarrell A p-map scheduler would go one step further than where our discussion was. IIRC (it has been 3 months now), the problem is that we aren't aware of any implementations of such a scheduler. It also seems hard to build one.
Love this library, but our team regularly ran into issues where we would accidentally overwhelm the database if we forgot to set the concurrency option. We regularly have to specify … If any future folks end up at this issue, our solution was to implement an ESLint rule that enforces passing the options argument with the concurrency setting. I'll leave this here just in case!
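The team's actual rule isn't shown, so here is a hedged sketch of what such a custom ESLint rule could look like. The rule name `require-pmap-concurrency` is hypothetical, and it assumes p-map is always called through an identifier named `pMap`. It reports calls whose third argument is missing or is an object literal without a `concurrency` key; variables and object spreads cannot be checked statically, so they are let through.

```javascript
// Hypothetical custom ESLint rule: require `concurrency` in pMap(input, mapper, options).
const requirePMapConcurrency = {
  meta: {
    type: 'problem',
    docs: { description: 'require an explicit concurrency option for pMap calls' },
    schema: [],
    messages: {
      missing: 'pMap() calls must pass an options object with `concurrency`.',
    },
  },
  create(context) {
    // Matches `concurrency: n`, shorthand `concurrency`, and `'concurrency': n`.
    const isConcurrencyKey = (property) =>
      property.type === 'Property' &&
      !property.computed &&
      ((property.key.type === 'Identifier' && property.key.name === 'concurrency') ||
        (property.key.type === 'Literal' && property.key.value === 'concurrency'));

    return {
      CallExpression(node) {
        if (node.callee.type !== 'Identifier' || node.callee.name !== 'pMap') return;
        const options = node.arguments[2];
        if (!options) {
          context.report({ node, messageId: 'missing' });
          return;
        }
        // A variable or a spread might supply `concurrency`; give it the benefit of the doubt.
        if (options.type !== 'ObjectExpression') return;
        if (options.properties.some((p) => p.type === 'SpreadElement')) return;
        if (!options.properties.some(isConcurrencyKey)) {
          context.report({ node: options, messageId: 'missing' });
        }
      },
    };
  },
};
```

In a flat config, it could be registered as a local plugin, e.g. `plugins: { local: { rules: { 'require-pmap-concurrency': requirePMapConcurrency } } }`, and enabled with `'local/require-pmap-concurrency': 'error'`.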
Do you have a module that calculates and exports a reasonable default value for concurrency? For example, what is done in sindresorhus/got#393. I'm planning to use p-map in some parts of nyc that can be async, so I want to set a reasonable concurrency limit. I'd like to avoid calculating `os.cpus().length` multiple times and avoid duplicating the CI limiting logic.