Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvements using Worker Threads #15813

Closed
DonJayamanne opened this issue Mar 30, 2021 · 7 comments
Closed

Performance Improvements using Worker Threads #15813

DonJayamanne opened this issue Mar 30, 2021 · 7 comments
Assignees
Labels
area-editor-* User-facing catch-all bug Issue identified by VS Code Team member as probable bug meta Issue that is tracking an overall project

Comments

@DonJayamanne
Copy link

I've been looking into a performance issue related to opening notebook.
TLDR - Opening a notebook in widows for Chris Dias and me takes 20-30 seconds (Jupyter extension is not doing anything).
If we disable the Python extension, it opens almost immediately (~5 seconds).

I think the delay is in Python extension spawning n processes. I know we're making changes to environment discovery.
However I think we can improve this even further (or, even with the current changes, it might not be sufficient).

I believe the problem lies with the spawning of processes, if we spawn proceses, it takes a few ms to just spawn the process, e.g. i know on Windows, spawning a conda process takes a few seconds, & that's blocking (spawning processes in node is synchronous) - but communication is async.

This blocks the main thread, hence blocks all other extensions (kinda explains why it takes so many seconds to just open a notebook when Jupyter isn't doing anything).

Suggestion - Why not spawn the processes in a worker thread (completely non-blocking)
I did look into this last year, & we can do this today.

What can we move to these worker threads:
Spawning processes just to get their environment information
Spawning processes to run code e.g. python -c "import ipykernel"
& other simple processes

@DonJayamanne DonJayamanne added bug Issue identified by VS Code Team member as probable bug triage-needed Needs assignment to the proper sub-team labels Mar 30, 2021
@karthiknadig
Copy link
Member

@DonJayamanne we should turn this into a meta issue an create work items for each particular case we move into worker. This is great if we can do this where ever possible.

@karthiknadig karthiknadig self-assigned this Mar 30, 2021
@karthiknadig karthiknadig added triage area-editor-* User-facing catch-all meta Issue that is tracking an overall project and removed triage-needed Needs assignment to the proper sub-team labels Mar 30, 2021
@DonJayamanne
Copy link
Author

FYI - We have managed to resolve the performance issue in Jupyter notebooks by delaying the loading of the Python extension. Now Jupyter Notebooks open in around 3 seconds, basically the Python extension gets loaded/activated AFTER the notebook is opened.
Hence this isn't a priority anymore, but if we had this, we'd ensure the Jupyter extension doesn't regress & we don't need to worry about other extensions blocking us. (or this extenison not blocking others)

@karthiknadig karthiknadig removed their assignment Apr 8, 2021
@karthiknadig karthiknadig added needs proposal Need to make some design decisions and removed triage labels Apr 8, 2021
@karrtikr
Copy link

karrtikr commented Aug 4, 2023

@Tyriar
Copy link
Member

Tyriar commented Sep 15, 2023

Looking briefly at how the locators work I believe you could make this much faster if you performed this and other similar functions in parallel:

public async *iterEnvs(query?: PythonLocatorQuery): IPythonEnvsIterator<BasicEnvInfo> {
await this.activate();
const iterator = this.doIterEnvs(query);
if (query?.envPath) {
let result = await iterator.next();
while (!result.done) {
const currEnv = result.value;
const { path } = getEnvPath(currEnv.executablePath, currEnv.envPath);
if (arePathsSame(path, query.envPath)) {
yield currEnv;
break;
}
result = await iterator.next();
}
} else {
yield* iterator;
}
}

I may just not be understanding things but from what I hear, environment discovery can take 20 seconds which seems like an outrageous amount of time to me. Have we done a perf trace to see where time is actually being spent when the environment activation is happening?

I believe the problem lies with the spawning of processes, if we spawn proceses, it takes a few ms to just spawn the process, e.g. i know on Windows, spawning a conda process takes a few seconds, & that's blocking (spawning processes in node is synchronous) - but communication is async.

Processes should block main for milliseconds, spawning a single conda process certainly should not be blocking the extension host for seconds.

@DonJayamanne
Copy link
Author

Processes should block main for milliseconds, spawning a single conda process certainly should not be blocking the extension host for seconds.

yes that’s correct
conda takes 20 seconds but that doesn’t block node, I should have been more clear
I found using worker threads to be an improvement, at least that’s when I prototyped the code

@Tyriar
Copy link
Member

Tyriar commented Sep 16, 2023

I had a chat with @karrtikr yesterday and it looks like there's some pretty nice gains to be made by making locators run more in parallel.

@karrtikr
Copy link

Closing in favor of #22146 (comment).

@karrtikr karrtikr closed this as not planned Won't fix, can't repro, duplicate, stale Nov 15, 2023
@github-actions github-actions bot removed the needs proposal Need to make some design decisions label Nov 15, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-editor-* User-facing catch-all bug Issue identified by VS Code Team member as probable bug meta Issue that is tracking an overall project
Projects
None yet
Development

No branches or pull requests

4 participants