Async callbacks for processing GetColumnProfile RPC requests #473
Conversation
Nice first stab!

I think we could make it simpler and safer by using the new idle task mechanism `r_task::spawn_idle()`:

- These tasks run on the R thread when R is idle instead of at interrupt time, which is the direction we'd like to take. It's also consistent with the data explorer being an idle-only feature (you can't use it while the kernel is busy with a Shiny app or with computing an expensive model).
- These tasks take Rust async functions, which makes it easy to yield for a tick, fail on a timeout, etc.
- They could own the `callback_id` and send the result back to the data explorer thread in the form of a response event (see the sketch after this list). This way we won't need to spawn any threads or create any temporary channels; we'd just need one new permanent channel to transmit events to the data explorer thread, the receiving side of which would be handled by the data explorer event loop.
- The tasks may get delayed by an incoming execute request. This means there's also a possibility of sending outdated summaries if the data was updated in the meantime, but I guess the frontend should handle that case. There should be no possibility of idle tasks piling up while R is busy, since the shell socket is busy as well and can't take any new data explorer requests.
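For illustration, a rough sketch of that shape; `DataExplorerEvent`, `ColumnProfileResult`, and the exact `spawn_idle` signature are assumptions here, not ark's actual definitions:

```rust
use std::sync::mpsc::Sender;

struct ColumnProfileResult; // stand-in for the real profile type

enum DataExplorerEvent {
    // Response event handled by the data explorer event loop.
    ReturnColumnProfiles {
        callback_id: String,
        profiles: Vec<ColumnProfileResult>,
    },
}

fn handle_get_column_profiles(callback_id: String, events_tx: Sender<DataExplorerEvent>) {
    // Assumes `spawn_idle` takes a closure returning a `Future<Output = ()>`.
    r_task::spawn_idle(move || async move {
        // Runs on the R thread when R is idle; the async body is free to
        // yield for a tick or fail on a timeout.
        let profiles = vec![ColumnProfileResult];
        let _ = events_tx.send(DataExplorerEvent::ReturnColumnProfiles {
            callback_id,
            profiles,
        });
    });
}
```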
Updated to use `r_task::spawn_idle()`. The most challenging thing was that the task passed to `spawn_idle()` must be `'static`.

Moved all profile handling functions to a separate file, as they can no longer depend on the data explorer instance.

I also had to change the tests: since there's no spawned thread in the tests, the tasks now run synchronously and thus the messages arrive in a different order.
Good progress! Some more changes needed as this is tricky.
```diff
@@ -253,7 +253,7 @@ where
     Fut: Future<Output = ()> + 'static,
 {
     // Idle tasks are always run from the read-console loop
-    if !only_idle && unsafe { R_TASK_BYPASS } {
+    if unsafe { R_TASK_BYPASS } {
```
hmmmmm
:P
I forgot I had to change that to make tests pass. In theory this only affects tests, because of `R_TASK_BYPASS`, and all tests seem to pass normally.

If I don't add this, the tests deadlock, as there's no R main thread to actually watch the task queue and execute the tasks.
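For context, a rough sketch of what that bypass path might look like; the function shape, the `cfg!(test)` check, and the `futures::executor::block_on` call are assumptions, not ark's actual implementation:

```rust
use std::future::Future;

fn spawn_idle_sketch<F, Fut>(fun: F)
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = ()> + 'static,
{
    // Stand-in for the real `unsafe { R_TASK_BYPASS }` check. In tests there
    // is no R main thread draining the task queue, so queuing the task would
    // deadlock; instead, run the future to completion inline.
    if cfg!(test) {
        futures::executor::block_on(fun());
        return;
    }
    // ... otherwise, queue the task for the read-console loop ...
}
```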
```rust
let json_event = serde_json::to_value(event)?;
comm.outgoing_tx.send(CommMsg::Data(json_event))?;
```
I'm still confused that we don't reply in case of errors; since there is a callback ID, I would expect the frontend to keep track of it. But I guess this is cleaned up when the frontend no longer needs the answer because the data has changed? (I haven't looked into the frontend code yet.)
Yeah, currently the RPC contract doesn't implement any error handling, so if it fails and we never respond, all the front-end will see is a timeout.

I added some handling here: basically, when it fails we return an empty result set:
ark/crates/ark/src/data_explorer/column_profile.rs, lines 57 to 66 in 56726b7:

```rust
.await
.unwrap_or_else(|e| {
    // In case something goes wrong while computing the profiles, we send
    // an empty response. Ideally we would have a way to communicate that
    // an error happened, but it's not implemented yet.
    log::error!("Error while producing profiles: {e}");
    std::iter::repeat(empty_column_profile_result())
        .take(n_profiles)
        .collect()
});
```
We already return empty profiles if we fail to compute a profile, so it seems reasonable to also do that if everything fails.

We can add error handling too; we'd need to make `ReturnColumnsProfile` an enum so we can return either the actual profiles or an error message. But I'd rather add this in another PR, because it will also require changes in the Python backend.
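For concreteness, a hypothetical sketch of that enum (deferred to a follow-up PR); `ColumnProfileResult` stands in for the real profile type:

```rust
struct ColumnProfileResult; // stand-in for the real profile type

// The reply would carry either the computed profiles or an error message.
enum ReturnColumnsProfile {
    Profiles(Vec<ColumnProfileResult>),
    Error(String),
}
```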
I created posit-dev/positron#4559 to track this issue.
```rust
// This allows background threads to easily access the
// instance-related data without having to rely on the lifetime
// of the data explorer execution thread.
// Since this is a DashMap, it's safe to access its underlying data from
// multiple threads.
```
Instead of a dashmap stored in a global, could we use an `Arc<Mutex<HashMap>>`? That should solve the lifetime concerns?
Or passing around `Arc<Mutex<Option<RThreadSafeObject<RObject>>>>`, as we discussed on Slack, so that an instance can set it to `None` when it's dropped, and getting a `None` would be a cancellation point for the task.
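A minimal sketch of that pattern, with a plain `String` standing in for ark's `RThreadSafeObject<RObject>`:

```rust
use std::sync::{Arc, Mutex};

// Stand-in for Arc<Mutex<Option<RThreadSafeObject<RObject>>>>.
type SharedTable = Arc<Mutex<Option<String>>>;

// The idle task bails out when the slot has been cleared.
fn profile_task(table: SharedTable) -> Result<(), &'static str> {
    let guard = table.lock().unwrap();
    let data = guard
        .as_ref()
        .ok_or("data explorer instance dropped; cancelling task")?;
    // ... compute profiles from `data` ...
    let _ = data;
    Ok(())
}

// The owning instance clears the slot when it shuts down, which cancels
// any pending tasks at their next access.
fn on_instance_drop(table: &SharedTable) {
    *table.lock().unwrap() = None;
}
```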
This worked nicely! I got rid of the global dashmap, and the lifetime is now managed with `Arc`. Failing to `get()` from the table means the table has been deleted, so this can be used to cancel tasks when the data explorer is closed, although we don't really use it for now.

We would probably also want to implement a cancellation mechanism based on the `callback_id`, so the front-end can cancel tasks it no longer needs. But we can leave this for another PR.
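A hypothetical sketch of what `callback_id`-based cancellation could look like (not part of this PR): a shared set of cancelled ids that the idle task checks before sending its reply.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Shared between the request handler (which cancels) and idle tasks
// (which check before replying).
#[derive(Clone, Default)]
struct CancelledCallbacks(Arc<Mutex<HashSet<String>>>);

impl CancelledCallbacks {
    fn cancel(&self, callback_id: &str) {
        self.0.lock().unwrap().insert(callback_id.to_string());
    }

    fn is_cancelled(&self, callback_id: &str) -> bool {
        self.0.lock().unwrap().contains(callback_id)
    }
}
```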
Oh, I meant cancelling the idle task, which does already happen via the propagated `get()` error, e.g. your comment:

```rust
// In case something goes wrong while computing the profiles, we send
// an empty response. Ideally we would have a way to communicate that
// an error happened, but it's not implemented yet.
```
Looks good!
Co-authored-by: Lionel Henry <[email protected]>
This PR pairs with posit-dev/positron#4326 and should be merged after #458.

This makes `GetColumnProfile` requests asynchronous, giving the data explorer execution thread opportunities to compute other RPC methods, such as `GetDataValues`, which should make the data explorer UX a little smoother.