Minor improvements in the IPython and Jupyter Notebook workflows #1075
Comments
(Comment copied over, originally written by @lorenabalan) I like this! [...] I have just two comments:
|
(Comment copied over, originally written by @AntonyMilneQB) I'm a bit out of the loop here so might have missed some things, but as a frequent Jupyter (ex-?)user myself this is something that I'm very interested in so am going to throw in my opinion anyway... 😬 General reaction is that this sounds super awesome and like a huge improvement 🎉 😀 👍
Jupyter-kedro viz integration
@lorenabalan Not speaking for Ivan here since I haven't discussed it with him, but for a long time I've thought that editing node code through Jupyter in kedro viz would be a killer feature. At the moment a very common workflow (myself included) is:
A much better version of this would be:
A while ago @limdauto had a couple of examples of how this could work in practice (Jupyter accessing node code in kedro viz).
A couple of comments for @idanov
|
To @lorenabalan:
It is indeed polluting the environment, but unfortunately I couldn't find an "environment aware" IPython configuration that doesn't require changing the command that starts IPython. And when Jupyter starts IPython, we have very little control over how it does so: which environment variables it sets or which command-line arguments it passes to IPython. Your suggestion to have a counter-command to
Sure, will provide some mockup video recordings in a following comment.
To @AntonyMilneQB: For your second point about the usefulness of editing a node, in order to make this possible for Jupyter, we should make it possible for IPython first, because Jupyter is in a way just a frontend to IPython sessions. As for authoring pipelines in Kedro Viz, I think we will eventually get there, but it will require quite a bit of time investment. The suggestions here are meant to be low-hanging fruit, or at least things that will not break the current flow too much. |
Here's what I meant by creating a custom
pipeline-explorer-viz.mov
And here's what we could have if we create a custom
pipeline-explorer-jupyter.mov
As you can see, instead of folders, we can show pipeline namespaces, and instead of files, we can show node names and directly edit them. Making this work will get us very close to enabling the same directly in Kedro Viz, which will be a very nice addition and make Data Science workflows much easier than what we have currently. |
This looks really nice! I will take a closer look at it later; for now I just have some comments about the current startup script for IPython (and notebooks). Will these issues be solved by the refactoring?
The first problem I have had is that if my catalog file contains an invalid dataset specifier and I start the IPython terminal, everything looks fine (except that the Kedro-specific variables are not listed in the help text). Trying to use e.g.
The same thing happens if I have defined my own datasets and an exception is raised in the dataset code (e.g. an import of a package which is not installed).
An even more confusing thing that I experienced was that the file |
I'm encouraged by the direction of this work! We can create a seamless Kedro/Jupyter notebook workflow, from getting started through to supporting our users' debugging workflow. I have left some comments. I also have a particular question that I will ask upfront: "How do we get around polluting users' IPython environments with Kedro?" Starting a Kedro-instance of a Jupyter notebook with
|
As a way to progress forward on this one, we should look into the following steps:
Those commands provide very little to the user anyway, since they are wrappers around calling
Instead of having those commands, we should make sure that loading the Kedro extension is the only widely known alternative, and that it takes a very small number of steps. So a set of other tasks needs to be completed:
Some of those changes will be breaking changes, so it is probably worth trying to implement them for Kedro 0.18 (to be discussed, since that might require us to add deprecation warnings in a small 0.17.8 release, which is not ideal). @AntonyMilneQB and I will turn those steps into issues and put them on our backlog, and once we complete them all, we'll revisit this discussion and see how we can build on that to provide an even better Jupyter experience. |
Thanks for writing all this up Ivan! Very excited by where this is going. Just a few comments:
Not sure I agree with this. According to the telemetry data,
I do think that exposing
I think this is too strong a statement and not quite what I meant. For a start, the
Not sure about this - as per Slack, I don't much like the idea of releasing 0.17.8 just before 0.18.0 purely to add some deprecation warnings. I know one of the main motivations here is "there's only one way to work with Jupyter/IPython". But given that the only breaking changes are removing some commands, I don't see the need to actually do that so long as their functionality is identical to our new workflow. Here's what I'd propose:
A couple of questions to check my understanding:
|
@AntonyMilneQB Totally makes sense, we can postpone the removal of those commands for 0.19 (if we still think that's needed at that time) and then the only breaking change left is the
To your questions:
|
I was asked to add a few things here for ideas: if a pipeline or node is running in notebook mode (we use Databricks, so I don't want to say Jupyter specifically), it would be fantastic if, in addition to the normally specified output of a node/pipeline run (defined by the configuration files), the last node run also retained a memory dataset that could be used for debugging. Additionally, Kedro Viz needs to be changed to work with partial functions. Right now we cannot use Kedro Viz for a lot of projects because we have various needs for partial functions, and a partial function stops Kedro Viz from rendering anything. |
@WolVecz thanks for the comments. On the kedro-viz issue, I think this may actually now be fixed in kedro-org/kedro-viz#692 (not released yet). |
The conversation here is old and long, but I see there are a few pending tasks in #1075 (comment), plus some ideas on how to add rich integrations of Kedro for Jupyter. For the former, do we want to evaluate any of that for 0.19 @merelcht ? And for the latter, should we consider opening separate issues or browsing the existing ones to see if they capture these ideas? |
@astrojuanlu I think we can close this issue. We've done a lot of work here already and have some more specific issues open about debugging the |
Thanks, closing this as Done then! |
Context
From our experience in supporting our users as well as from simply reading our guide on the integration with IPython and Jupyter, we know that there are a number of challenges for users to work with Kedro from notebooks.
- The `.ipython/` folder in our projects makes our templates more cluttered and incomprehensible
- `.ipython/` [...]
- `kedro ipython`, `kedro jupyter lab/notebook` helpers don't work for managed Jupyter instances
- `kedro jupyter notebook convert` CLI command [...]

These challenges are not exhaustive, but they arguably present a significant barrier for Jupyter Notebook users interacting with Kedro and make for an unpleasant experience.
Proposal
In order to improve the experience without major changes in Kedro, we recently started developing a Kedro IPython extension meant to replace the startup script in the `.ipython/` directory. The extension already has full feature parity with the startup script for IPython sessions, and after 7613dec it will be the primary way our IPython/Jupyter users interact with Kedro.
As next steps, I suggest that we aim for the following unified workflow based entirely on our IPython extension:
IPython
If the user can start the session themselves:
If the user is in an existing IPython session they cannot or do not want to restart:
Jupyter
For Jupyter, there will be only one way to load the extension and that will happen per notebook:
This should work for both local Jupyter setup and managed Jupyter instances.
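For readers less familiar with how an IPython extension hooks into a session: IPython imports the named module and calls its `load_ipython_extension(ipython)` function, which can push variables into the user's namespace and register magics. Below is a minimal sketch of that mechanism only, not Kedro's actual implementation. The variable names, the placeholder values and the `reload_project` magic are hypothetical stand-ins chosen for illustration.

```python
"""Sketch of an IPython extension module (illustrative, not Kedro's code)."""


def _load_project_variables():
    # Hypothetical stand-in: a real extension would build these objects
    # from the project on disk instead of returning placeholders.
    return {"context": "<project context>", "catalog": "<data catalog>"}


def load_ipython_extension(ipython):
    """Entry point that IPython calls when the module is loaded as an extension."""

    def reload_project(line=""):
        # Hypothetical line magic: refresh the variables in the session.
        ipython.push(_load_project_variables())

    # Make the variables available in the interactive namespace right away.
    ipython.push(_load_project_variables())
    # Register the function as a line magic under the name "reload_project".
    ipython.register_magic_function(
        reload_project, magic_kind="line", magic_name="reload_project"
    )
```

Because the entry point only relies on `push` and `register_magic_function`, the same module behaves identically whether the session was started from a terminal or by a Jupyter kernel.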
IPython and Jupyter with preloaded Kedro extension
A new Kedro command should be created which is meant to be run once and enable Kedro's extension in the user's `~/.ipython/` folder. All Jupyter and IPython sessions started after this will have the Kedro IPython extension preloaded.
The command will be a top-level command that does not require an existing Kedro project. The name of the command is up for debate.
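One way such a command could work, sketched under assumptions: it appends a line to the default profile's `ipython_config.py` so that IPython loads the extension at startup through its `InteractiveShellApp.extensions` setting. The function name `enable_extension` is hypothetical, and the dotted module path in `EXTENSION` is an assumption that may differ between Kedro versions.

```python
from pathlib import Path

# Assumption: dotted path of the extension module; adjust to the installed version.
EXTENSION = "kedro.extras.extensions.ipython"


def enable_extension(ipython_dir=None, extension=EXTENSION):
    """Register the extension in the default IPython profile (idempotent)."""
    base = Path(ipython_dir) if ipython_dir else Path.home() / ".ipython"
    config = base / "profile_default" / "ipython_config.py"
    # IPython makes the config object available as `c` inside this file.
    line = f'c.InteractiveShellApp.extensions.append("{extension}")'

    config.parent.mkdir(parents=True, exist_ok=True)
    existing = config.read_text() if config.exists() else ""
    if line in existing:
        return config  # already enabled, nothing to do

    # Keep any existing configuration and add our line on a fresh row.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with config.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    return config
```

Writing to the user-level profile rather than to the project is what removes the need for a per-project folder, at the cost of affecting every IPython session for that user.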
After this, Kedro projects will no longer need to have an `.ipython/` folder in them.
Future
Once we have successfully migrated the community away from the old way of interacting with Kedro from IPython and Jupyter, we can continue the development of the plugin and add the following capabilities:
Running an IPython session with preloaded datasets for a node
After running this in a Kedro project
we can preload the datasets which are inputs to this node, thus allowing the user to debug their pipeline at a particular node. This functionality is something already in use by internal teams, although they have their own scripts to facilitate it.
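The core of that preloading step can be sketched in a few lines. This is an illustration rather than a proposed implementation: it only assumes a pipeline object exposing `nodes` (each with `name` and `inputs`) and a catalog exposing `load(name)`. The `DictCatalog` class, the demo pipeline and the dataset names are hypothetical stand-ins so the sketch runs without a Kedro project.

```python
from types import SimpleNamespace


def preload_node_inputs(pipeline, catalog, node_name):
    """Return {dataset_name: loaded_data} for the inputs of the named node."""
    for node in pipeline.nodes:
        if node.name == node_name:
            return {name: catalog.load(name) for name in node.inputs}
    raise KeyError(f"No node named {node_name!r} in the pipeline")


class DictCatalog:
    """Hypothetical stand-in for a data catalog backed by a plain dict."""

    def __init__(self, data):
        self._data = data

    def load(self, name):
        return self._data[name]


# Hypothetical one-node pipeline and matching catalog for demonstration.
demo_pipeline = SimpleNamespace(
    nodes=[SimpleNamespace(name="train_model", inputs=["features", "labels"])]
)
demo_catalog = DictCatalog({"features": [[1, 2], [3, 4]], "labels": [0, 1]})

preloaded = preload_node_inputs(demo_pipeline, demo_catalog, "train_model")
```

The resulting dictionary could then be pushed into the interactive namespace so the user can call the node's function on exactly the data it would receive in a run.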
Jupyter extension to allow node editing
Jupyter provides an API for custom content loading. We can use this API and develop a Kedro Jupyter Notebook Server extension, which will allow us to edit nodes from Jupyter and browse them through their Kedro node name rather than their filename. This is what will enable us to integrate Jupyter notebooks in Kedro Lab.
This extension is contingent on the existence of the IPython session with preloaded datasets for a node, which will make for a seamless experience.
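To make "browse them through their Kedro node name" concrete, here is a rough sketch of the listing such an extension might present: namespaces shown as folders and node names as files. It assumes namespaced node names are dot-separated, and the function name `build_node_tree` and the example names are hypothetical. It does not touch the Jupyter contents API itself.

```python
def build_node_tree(node_names):
    """Group dotted node names into nested dicts: namespaces as folders, nodes as leaves."""
    tree = {}
    for full_name in node_names:
        *namespaces, leaf = full_name.split(".")
        folder = tree
        for namespace in namespaces:
            # Create the "folder" for this namespace on first use.
            folder = folder.setdefault(namespace, {})
        # The leaf keeps the full name so the node can be looked up later.
        folder[leaf] = full_name
    return tree


tree = build_node_tree(["data_science.train", "data_science.evaluate", "report"])
```

The sketch ignores the corner case where a node and a namespace share the same name at the same level, which a real implementation would have to handle.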