
Add pending_requests #227

Open: wants to merge 3 commits into base: main

Conversation

@krassowski (Collaborator)

Part of a proposed solution to jupyter-server/jupyter_server#990

Supersedes #197

@davidbrochart (Collaborator) left a comment:

Just to understand better, could you describe what you intend to write in a pending request?

"pending_requests": YArray[
    YMap[
        "id": str,
        "type": str
    ]
]

(outdated, resolved review comment on jupyter_ydoc/ynotebook.py)
@davidbrochart added the "enhancement" (New feature or request) label on Apr 11, 2024
Co-authored-by: David Brochart <[email protected]>
@krassowski (Collaborator, Author)

When a user executes a long-running cell twice, it could be:

{
   "pending_requests": [
      {
        # msg_id of a `shell.execute_request` (allows identifying subsequent "iopub.status" messages by parent_header)
        "id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
        "type": "execute_request"
      },
      {
        # if the user scheduled two executions of the same cell, we will need to wait until both are cleared
        "id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
        "type": "execute_request"
      }
   ]
}
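Side note on mechanics: the bookkeeping for such a queue could be sketched in plain Python as below (dicts and a list as stand-ins for YMap/YArray; the helper names are hypothetical, not part of this PR):

```python
# Plain-Python stand-ins for the proposed YArray[YMap] "pending_requests"
# field; a real implementation would operate on Y types instead.

def add_request(pending, msg_id, msg_type):
    """Record a new kernel request when it is sent."""
    pending.append({"id": msg_id, "type": msg_type})

def clear_request(pending, msg_id):
    """Drop a request once its final reply (e.g. idle status) arrives."""
    pending[:] = [r for r in pending if r["id"] != msg_id]
```

Scheduling the same cell twice would simply append two entries with different msg_ids, matching the example above.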

When a user runs a cell with input(), it could be:

{
   "pending_requests": [
      {
        # the request for execution of a cell with `input()`
        "id": "6fd7fbd4-1c1d-4824-9c96-a57972b4be62",
        "type": "execute_request"
      },
      {
        # the response from the server
        "id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
        "type": "input_request"
      }
   ]
}
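For illustration, a frontend could derive the cell's displayed state from this list with something like the following (hypothetical helper; the state names are made up and not part of the proposal):

```python
def cell_state(pending_requests):
    """Map the pending request types to a display state."""
    types = {r["type"] for r in pending_requests}
    if "input_request" in types:
        return "awaiting-input"   # show the stdin input box
    if "execute_request" in types:
        return "busy"             # at least one execution still queued/running
    return "idle"
```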

Here is how an execution request and the subsequent iopub.status response look on the wire:

(screenshot omitted)

And an execution followed by an input_request; note that there is no status reply yet because the input_request is pending:

(screenshot omitted)

Here are some thoughts:

  • do we want to prefix these properties with msg_ to align with msg_id and msg_type?
  • do we want to also store the session ID from the header? In practice, incoming messages would need to be compared against pending_requests, and if msg_id is only unique within a given session then maybe we should store the session too
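If msg_id is indeed only unique per session, the comparison would have to key on the pair; a sketch under that assumption (the session field is not in the current proposal):

```python
def matches_pending(pending, parent_header):
    """Check whether an incoming message answers a pending request,
    comparing both msg_id and session from its parent_header."""
    return any(
        r["id"] == parent_header["msg_id"]
        and r.get("session") == parent_header.get("session")
        for r in pending
    )
```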

@davidbrochart (Collaborator)

I'm not sure we should go down the road of allowing concurrent cell execution. Aren't executions queued anyway?

@krassowski (Collaborator, Author)

This is not about concurrent executions but about queuing. We need to record the queue to resolve the status properly.

@davidbrochart (Collaborator)

In #197 there is an execution_state field which is equivalent to the status, why isn't that enough?

@krassowski (Collaborator, Author)

To illustrate: if I have a cell:

from time import sleep
sleep(60)

I execute it a first time; the pending_requests would change to:

[
  {
    "id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
    "type": "execute_request"
  }
]

then, while it is still running (say 5 seconds in), I execute it again; there are now two pending requests:

[
  {
    "id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
    "type": "execute_request"
  },
  {
    "id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
    "type": "execute_request"
  }
]

after another 55 seconds the first execution would finish and emit an "idle" status; however, the cell should still display the "busy" status, as there will be another 60 seconds until the second execution request completes. In the meantime the pending_requests will be:

[
  {
    "id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
    "type": "execute_request"
  }
]
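The reconciliation described above, i.e. pop a request when its idle status arrives and keep showing "busy" while any remain, could be sketched as (hypothetical handler, illustrative only):

```python
def on_idle_status(pending, parent_msg_id):
    """Handle an iopub.status "idle" message for the given parent msg_id."""
    pending[:] = [r for r in pending if r["id"] != parent_msg_id]
    # The cell is only really idle once no requests remain queued.
    return "busy" if pending else "idle"
```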

@davidbrochart (Collaborator)

But if we don't process new execution requests until the cell's execution state is idle, we don't need to record pending executions?

@krassowski (Collaborator, Author)

> In #197 there is an execution_state field which is equivalent to the status, why isn't that enough?

Good question. There are at least two use cases which cannot be resolved with a simple execution_state field alone, unless we move all execution logic from the frontend to the server:

  • recording timing
  • stdin input boxes

Currently, the frontend records cell execution timing in the cell metadata. If we only keep a "status" flag, there is no way for the frontend to pick up the new message after reloading. Say I execute a cell with sleep(60) twice and then reload the page at the 30th second. If all I get is a "busy" status from the model, then when the iopub.status message with "idle" comes at the 60th second, I cannot reconcile it with the cell metadata. I can still stop the timer at the 120th second, because the model will then switch to "idle", but the timer will show an execution time of 120 seconds (rather than 60 seconds).

Similarly for input boxes: the frontend needs to know whether an input request is still pending. If it does not know, it can get into a kernel deadlock state.

Now, the current proposal is not ideal for the timing scenario either: if I run a cell with sleep(60), close the browser tab, and reopen it after 70 seconds, the frontend cannot set the elapsed time properly; all it can infer is that the cell is no longer running (because pending_requests would be empty).

So maybe we should populate the cell execution timing metadata on the server. For reference, the timing code is here in JupyterLab. However, note that this is not only about setting the right timings, but also about notifying extensions via a signal. Maybe it would be fine to populate the metadata on the server, and in the scenario described earlier (a single cell executed twice, with the browser tab refreshed in the middle of the first execution) the frontend would watch for changes to that metadata field and emit the signal accordingly.
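To make the reload scenario concrete: if the server wrote timestamps into the cell's execution metadata, a reloading frontend could recompute the elapsed time instead of restarting a timer from zero. A minimal sketch, assuming JupyterLab-style timestamp keys (the exact key names here are an assumption):

```python
from datetime import datetime, timezone

def elapsed_seconds(execution_meta):
    """Elapsed execution time from (assumed) server-populated metadata."""
    start = datetime.fromisoformat(execution_meta["iopub.status.busy"])
    idle = execution_meta.get("iopub.status.idle")
    # If the cell is still running, measure against the current time.
    end = datetime.fromisoformat(idle) if idle else datetime.now(timezone.utc)
    return (end - start).total_seconds()
```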

> But if we don't process new execution requests until the cell's execution state is idle, we don't need to record pending executions?

See above. Also, I don't think we have any control over when the kernel processes execution requests. We could do as you say (not process the next request until the previous one returned idle), but I think the messaging protocol says that queuing is handled by the kernel and is its implementation detail. Changing it to what you propose should not be a problem unless there is substantial latency between the server and the kernel, so it might matter for remote kernels.

@davidbrochart (Collaborator)

I think this underlines the need to move to server-side execution.
With these changes, I feel that the kernel protocol is leaking into the shared model, but with server-side execution the kernel protocol is exactly what we want to get rid of (in frontends).
I agree that timings should be recorded server-side.
For input requests, my vision is that they should be collaborative anyway, maybe using ypywidgets. That way, every client will see the input box with live changes from other peers.
I agree that what I'm proposing needs a lot of changes in different parts, but I think this should be the direction. And I fear that if we don't take it, we will end up in a mixed state where the CRDT-based architecture is "polluted" by low-level details of the non-CRDT system that we are precisely trying to get rid of.

@krassowski (Collaborator, Author)

> For input requests, my vision is that they should be collaborative anyway, maybe using ypywidgets. That way, every client will see the input box with live changes from other peers.

I can see that for plain-text input, but password boxes (from getpass import getpass; getpass()) would need special treatment?

@davidbrochart (Collaborator) commented on Apr 11, 2024

We could imagine a special widget that would not show the password, but would still allow any client to enter it?
