Use jupyter-collaboration to get the full notebook content for completions #708
Users have repeatedly asked for LLMs to consider the full notebook, rather than only the current cell, when generating completions; see, for example, jupyterlab/jupyterlab#15532.
However, sending the full notebook in each completion request would be slow, especially for large notebooks; completions are supposed to be nearly instant. Sending only the text of all cells might help a little (though not for very long notebooks), but it would be restrictive: what if the model needs the cell ID to generate a link, or needs the cell outputs? Future multi-modal AIs could use much more than just the cell source (e.g. analysing the cell outputs or attachments).
When designing the inline completer implementation in `jupyter-ai` we included the file path. This helps a little because one could open the file on disk to read the notebook. However, this has two limitations.
This is a proof of concept for using the shared notebook model retrieved via `jupyter-collaboration` to populate the prefix/suffix with the content of the previous/following cells. Because RTC synchronises the notebook state in delta updates as changes happen, it would not add any latency. The shared model might be a few characters behind at times, but this is not a problem: we only use it for the previous/following cells, which would have synchronised already, and we use the text of the current cell as provided by the frontend.
This PR is not intended to be merged in its current form, but to serve as a reference for discussion.
In particular:
- `jupyter-collaboration` extension
- `YNotebook` to `jupyter-collaboration` (or whatever the right package would be), which would offer a public API for getting `YNotebook` (and other documents) from a `jupyter-server` extension (edit: removed implementation-detail questions, as this is now tracked in "Public API to get a view of the shared document model", jupyter-collaboration#270)
- `DefaultInlineCompletionHandler` could hold a utility method for retrieving the shared notebook model; advanced users should be allowed to adjust the way the notebook document gets used (how many cells get extracted, concatenated, etc.), possibly by swapping the `DefaultInlineCompletionHandler` - see "Allow to swap the `DefaultInlineCompletionHandler`", #702
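To make the prefix/suffix idea concrete, here is a minimal sketch of how the sources of the previous/following cells could be joined with the text of the current cell. It is not the code in this PR: `build_context`, `CompletionContext`, `max_cells` and `separator` are hypothetical names, and the cell sources are passed in as plain strings rather than read from a shared notebook model, so the example runs without `jupyter-collaboration` installed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompletionContext:
    """Hypothetical container for the text sent to the completion model."""
    prefix: str
    suffix: str


def build_context(
    cell_sources: List[str],
    current_index: int,
    current_prefix: str,
    current_suffix: str,
    max_cells: Optional[int] = None,
    separator: str = "\n\n",
) -> CompletionContext:
    """Extend the current cell's prefix/suffix with neighbouring cells.

    cell_sources   -- source of every cell (assumed to come from the shared model)
    current_index  -- position of the cell being edited
    current_prefix -- text before the cursor, as provided by the frontend
    current_suffix -- text after the cursor, as provided by the frontend
    max_cells      -- optional cap on how many cells to take on each side
    """
    if not 0 <= current_index < len(cell_sources):
        raise IndexError("current_index is outside the notebook")

    # The (possibly stale) shared-model text of the current cell is skipped;
    # only the neighbouring cells are taken from cell_sources.
    before = cell_sources[:current_index]
    after = cell_sources[current_index + 1:]

    if max_cells is not None:
        # A slice of [-0:] would return the whole list, so handle 0 explicitly.
        before = before[-max_cells:] if max_cells > 0 else []
        after = after[:max_cells]

    prefix = separator.join([*before, current_prefix])
    suffix = separator.join([current_suffix, *after])
    return CompletionContext(prefix=prefix, suffix=suffix)


cells = ["import math", "x = 1", "y = ", "print(y)"]
context = build_context(cells, 2, current_prefix="y = ma", current_suffix="")
```

Exposing something like `max_cells` and `separator` is one way the extraction could stay adjustable for advanced users; a swapped-in handler could equally replace the whole function.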