Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async callbacks for processing GetColumnProfile RPC requests #473

Merged
merged 14 commits into from
Sep 9, 2024

Conversation

dfalbel
Copy link
Contributor

@dfalbel dfalbel commented Aug 16, 2024

This PR pairs with posit-dev/positron#4326
And should be merged after #458

This makes GetColumnProfile requests asycronous giving the data explorer execution thread oportunities to compute other RPC methods, such as GetDataValues which should make the data explorer UX a little smoother.

@lionel- lionel- changed the base branch from main to feature/histograms-frequency-tables August 21, 2024 11:54
@lionel-
Copy link
Contributor

lionel- commented Aug 21, 2024

@dfalbel I changed the base to feature/histograms-frequency-tables but I still get unrelated diffs. I'll review nonetheless using this commit as a guideline since it seems that's the only relevant one here: Use async callbacks allowing other RPC methods to execute between get…

@dfalbel dfalbel force-pushed the feature/histograms-frequency-tables branch from 77117df to c09500e Compare August 21, 2024 12:57
@dfalbel
Copy link
Contributor Author

dfalbel commented Aug 21, 2024

@lionel- I rebased on the histograms branch and It's now correct.
But yes, 985a7d9 is the only relevant commit, the other commit is just fixing tests.

Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice first stab!

I think we could make it simpler and safer by using the new idle task mechanism r_task::spawn_idle():

  • These tasks run on the R thread when R is idle instead of at interrupt time, which is the direction we'd like to take. It's also consistent with the data explorer being an idle-only feature (you can't use it when the kernel is busy on a shiny app or computing an expensive model).

  • These tasks take rust async functions, which make it easy to yield for a tick, fail on timeout etc.

  • They could own the callback_id and send the result back to the data explorer thread in the form of a response event. This way we won't need to spawn any threads or create any temporary channels. We just need a new permanent channel to transmit events to the data explorer thread, the receiving side of which would be handled by the data explorer event loop.

  • The tasks may get delayed by an incoming execute request. This means there's also a possibility of sending outdated summaries if the data was updated, but I guess the frontend should handle that case. There should be no possibility of piling up idle tasks while R is busy since the shell socket is busy as well and can't take any new data explorer requests.

crates/ark/src/data_explorer/async_channels.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/async_channels.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/async_channels.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/async_channels.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/async_channels.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
Base automatically changed from feature/histograms-frequency-tables to main August 27, 2024 15:31
@dfalbel dfalbel force-pushed the feature/async-callbacks branch 3 times, most recently from c7cca12 to 414a160 Compare August 29, 2024 17:56
@dfalbel dfalbel requested a review from lionel- August 30, 2024 14:32
@dfalbel
Copy link
Contributor Author

dfalbel commented Aug 30, 2024

Updated to use spawn_idle:

The most challending thing was that the task in spawn_idle can live longer than the data.frame we are currently visualizing, so the Rust compiler won't allows us to move that object to a different thread. It's also not safe to clone the table, in the rpc handler thread because it requires calling the R API. So similarly to the VDOCS table, we now store references to the currently visualized table in a global DATA_EXPLORER_TABLES DashMap and we created a wrapper Table object that abstracts away how the access to the RObject. It keep the requirements from RThreadSafe<> and can only be accessed from the main R thread.

Moved all profile handling functions to a separate file as they can no longer depend on self, since they are called from a thread that might outlive self.

I also had to change tests, because since there's no spawned thread in the tests, the tasks are now running synchronously and thus the messages get out of order.

Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress! Some more changes needed as this is tricky.

crates/ark/src/data_explorer/column_profile.rs Outdated Show resolved Hide resolved
@@ -253,7 +253,7 @@ where
Fut: Future<Output = ()> + 'static,
{
// Idle tasks are always run from the read-console loop
if !only_idle && unsafe { R_TASK_BYPASS } {
if unsafe { R_TASK_BYPASS } {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmmm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:P

I forgot I had to change that to make tests pass. In theory this only affects tests because of R_TASK_BYPASS and all tests seem to be passing normally.

If I don't add this, then the tests get deadlock as there's no R main thread to actually watch the task queue and execute them.

crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/column_profile.rs Outdated Show resolved Hide resolved
Comment on lines +48 to +49
let json_event = serde_json::to_value(event)?;
comm.outgoing_tx.send(CommMsg::Data(json_event))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused that we don't reply in case of errors, since there is a callback ID I would expect the frontend is keeping track of it. But I guess this is cleaned up when it no longer needs the answer because the data has changed? (I haven't looked into the frontend code yet)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, currently the RPC contract does not implement any error handling, so if it fails and we never respond, all the front-end will see is a timeout.

I added some handling here, so basically, when it fails we will return an empty result set:

.await
.unwrap_or_else(|e| {
// In case something goes wrong whhile computing the profiles, we send
// an empty response. Ideally, we would have a way to comunicate an that
// an error happened but it's not implemented yet.
log::error!("Error while producing profiles: {e}");
std::iter::repeat(empty_column_profile_result())
.take(n_profiles)
.collect()
});

We already return empty profiles if we fail to compute profiles, thus it seems reasonable to also do that if everything fails.

We can add error handling too, we would need make ReturnColumnsProfile an enum so we would return either the actual profiles or an error message, but I'd rather add this in another PR because it will also require changes in the python backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created posit-dev/positron#4559 to track this issue.

crates/ark/src/data_explorer/r_data_explorer.rs Outdated Show resolved Hide resolved
Comment on lines 17 to 20
// This allows for background threads to easilly access the
// instance related data without having to rely on the lifetime
// of data explorer execution thread.
// Since this is a DashMap, it's safe to access it's underlying data from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a dashmap stored in a global, could we use an Arc<Mutex<HashMap>>? This should solve the lifetime concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or passing around Arc<Mutex<Option<RThreadSafeObject<RObject>>>> as we discussed on Slack, so that an instance can set it to None when it's dropped, and getting a None would be a cancellation point for the task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked nice! So got rid of the global dashmap, and lifetime is now managed with Arc. Failing to get() from the table means that the table has been deleted and so it can be used to cancel tasks in case the data explorer is closed, although we don't really use it for now.

We would probably also want to implement a cancelling mechanism based on the callback_id, so the front-end can cancel tasks it will no longer need. But we can leave this for another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I meant cancelling the idle task, which does happen already via the propagated get() error, e.g. your comment

        // In case something goes wrong whhile computing the profiles, we send
        // an empty response. Ideally, we would have a way to comunicate an that
        // an error happened but it's not implemented yet.

@dfalbel dfalbel requested a review from lionel- August 30, 2024 22:42
Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

crates/ark/src/data_explorer/column_profile.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/column_profile.rs Outdated Show resolved Hide resolved
crates/ark/src/data_explorer/column_profile.rs Outdated Show resolved Hide resolved
Co-authored-by: Lionel Henry <[email protected]>
@dfalbel dfalbel merged commit 9f864b3 into main Sep 9, 2024
3 checks passed
@dfalbel dfalbel deleted the feature/async-callbacks branch September 9, 2024 14:40
@github-actions github-actions bot locked and limited conversation to collaborators Sep 9, 2024
@dfalbel dfalbel restored the feature/async-callbacks branch September 10, 2024 10:20
@dfalbel dfalbel deleted the feature/async-callbacks branch September 10, 2024 14:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants