Data Explorer: Use concurrent.futures background job queue to return get_column_profiles results asynchronously #4326
Conversation
LGTM!
LGTM! Works as advertised.
Does Ark have this same issue? If not, I wonder if we should keep the old `get_column_profiles` method and make this PR's implementation a new method. That way we don't need to change anything in Ark, and when we support concurrency in the Python kernel we'll use the original method too.
```python
with self.lock:
    futures = list(self.pending_futures)

for future in futures:
```
There's also `concurrent.futures.wait` (docs) if we need more control in future.
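For reference, `concurrent.futures.wait` returns `(done, not_done)` sets and supports a timeout and `return_when` policies, which gives finer control than iterating over futures one by one. A minimal sketch (the `compute_profile` helper is hypothetical, standing in for the per-column profile work):

```python
import concurrent.futures
import time

# Hypothetical slow computation standing in for get_column_profiles work.
def compute_profile(column: int) -> str:
    time.sleep(0.01 * column)
    return f"profile-{column}"

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(compute_profile, i) for i in range(4)]

    # wait() returns (done, not_done) sets; return_when can also be
    # FIRST_COMPLETED or FIRST_EXCEPTION if partial results are useful.
    done, not_done = concurrent.futures.wait(
        futures, timeout=5, return_when=concurrent.futures.ALL_COMPLETED
    )

    results = sorted(f.result() for f in done)
```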
```typescript
const callbackId = generateUuid();
const promise = new DeferredPromise<Array<ColumnProfileResult>>();
this._asyncTasks.set(callbackId, promise);
await this._positronDataExplorerComm.getColumnProfiles(callbackId, profiles, this._profileFormatOptions);
```
Should we also catch errors here and reject the deferred promise immediately? It will time out after 10 seconds in any case, but it could be helpful for the user to have immediate feedback.
Tracked here: #4559
Currently the expected behavior is to catch all errors and return empty results instead. We will need additional UI to somehow display these errors.
(Resolved, outdated comment on `src/vs/workbench/services/languageRuntime/common/languageRuntimeDataExplorerClient.ts`)
```typescript
setTimeout(() => {
    // If the promise has already been resolved, do nothing.
    if (promise.isSettled) {
        return;
```
Do we also need to clean up `_asyncTasks` here?
In principle the key should have been deleted from `_asyncTasks` in the resolution of the promise.

I'm not sure if this issue affects Ark or not. My understanding from what @lionel- told me is that they also want to serve requests in order in Ark, so the asynchronous API here may be the preferred way. @seeM could you take charge of pushing this over the finish line (with whatever changes are necessary in Ark) since I'm OOO for the next couple weeks?
```json
"params": [
    {
        "name": "callback_id",
```
It makes me a little itchy to have to complicate the contract b/c of the limitations of the backend, but I don't see a better way than what you've done here.
Force-pushed from 47d1883 to 9cc889a.
Force-pushed from c6f113a to d206ce3.
(Resolved, outdated comment on `extensions/positron-python/python_files/positron/positron_ipykernel/tests/test_data_explorer.py`)
Force-pushed from 7e504d4 to 6b5c206.
Co-authored-by: Wasim Lorgat <[email protected]>
Signed-off-by: Daniel Falbel <[email protected]>
Fixes an issue when viewing a small polars data frame. In this case, the background tasks that return the null counts would be processed faster than the actual RPC message, leading to an issue where `send_event` is sent before `send_result`, but with the message id from `send_result`, which caused Positron to interpret the event as the return value of the RPC message. This PR makes sure `send_event` waits for the RPC message to be handled before sending the event results.

```python
import polars as pl
from datetime import date

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6.0, 7.0, 8.0],
        "ham": [date(2020, 1, 2), date(2021, 3, 4), date(2022, 5, 6)],
        "a": [None, 2, 3],
        "b": [0.5, None, 2.5],
        "c": [True, None, False],
    }
)
%view df
```
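The ordering fix described above can be sketched with a `threading.Event`: the background task blocks until the RPC reply has gone out, then emits its event. (Names like `send_result` and `background_task` are illustrative here, not the actual positron_ipykernel API.)

```python
import threading
import time

rpc_handled = threading.Event()
messages = []  # records the order messages go out in

def send_result(value):
    messages.append(("send_result", value))
    rpc_handled.set()  # the RPC reply is out; events may follow now

def background_task():
    null_counts = [1, 1, 1]  # pretend this was computed in the background
    # Without this wait, a fast task could emit its event before the RPC
    # reply, and the frontend would misread the event as the result.
    rpc_handled.wait(timeout=5)
    messages.append(("send_event", null_counts))

worker = threading.Thread(target=background_task)
worker.start()
time.sleep(0.05)   # simulate the RPC handler still running
send_result("ok")  # reply to the RPC first...
worker.join()      # ...then the event is allowed out
```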
Force-pushed from ef72f5a to 913deb3.
Thanks for pushing this through!! This is a big UX improvement in the data explorer.
As discussed in #4300, comm requests have to be served in the order they are received. Since the data explorer has some request types, like `get_column_profiles`, that can grow very expensive for large datasets, this blocks "fast" requests like `get_data_values` from executing while these "slow" requests are pending.

This change adds a `return_column_profiles` frontend method that allows these requests to be fulfilled in the background, allowing fast requests to be served unimpeded. This resolves the immediate performance issue that the data explorer is facing, and the pattern can be refined over time as we need to do other asynchronous/background request handling in the kernels.

Here's what the "blocked request" behavior looks like on the main branch:

Screencast.from.2024-08-08.14-55-18.mp4

And here is with this change:

Screencast.from.2024-08-12.13-13-10.mp4

The main difference is that the grid values load immediately, and then the null percentages load later when the `return_column_profiles` event is fired.

This change can't be merged until the corresponding changes are implemented in Ark.
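The overall shape of the background job queue can be sketched as follows: the RPC handler registers the `callback_id`, submits the slow work to an executor, and returns immediately; a done-callback later delivers the results to the frontend. This is a minimal sketch under stated assumptions, with illustrative names (`ProfileJobQueue`, `send_event`), not the actual comm implementation.

```python
import concurrent.futures

class ProfileJobQueue:
    """Minimal sketch of the background job queue pattern."""

    def __init__(self, send_event):
        # A single worker preserves the order profile jobs were submitted in.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._send_event = send_event  # e.g. fires return_column_profiles

    def get_column_profiles(self, callback_id, compute):
        # Return immediately; the slow computation runs in the background,
        # so fast requests like get_data_values are served unimpeded.
        future = self._executor.submit(compute)
        future.add_done_callback(
            lambda f: self._send_event(callback_id, f.result())
        )
        return future

    def shutdown(self):
        # Done-callbacks run in the worker thread right after the result is
        # set, so waiting here also waits for the events to be delivered.
        self._executor.shutdown(wait=True)

events = []
queue = ProfileJobQueue(lambda cid, result: events.append((cid, result)))
queue.get_column_profiles("cb-1", lambda: [0.0, 33.3, 33.3])
queue.shutdown()
```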