Technical design decision record for `KedroSession` #1335

merelcht · 2022-03-09T14:07:01Z

The `KedroSession` ✨

The KedroSession is the object responsible for managing the lifecycle of a Kedro run. It has two main functions:

- Run execution: It makes sure that all core components needed by Kedro to execute a run are instantiated and the run is executed properly.
- Persisting run data: KedroSession offers a way to persist run data through the session store. The following data gets saved in the session store:
  - package_name
  - project_path
  - session_id
  - CLI info: command run, run parameters
  - Git info: git sha, git branch, is branch dirty or not

Usage within Kedro 🏗

The KedroSession is a relatively new component within Kedro and, at the time of writing, is mainly used to manage run lifecycles and for experiment tracking. The experiment tracking feature makes use of a session store implementation called the SQLiteStore, which uses SQLite to persist data. Other implementations of the session store available in Kedro are:

- BaseSessionStore: the base class for all session stores; it doesn’t persist any data itself
- ShelveStore: an implementation that uses the shelve package to persist data
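To make the idea of a shelve-backed store more concrete, here is a minimal sketch that uses only Python's standard-library `shelve` module. It is not Kedro's `ShelveStore` code: the helper names `save_run_data` and `load_run_data`, the nested `cli` and `git` keys, and all values are hypothetical and only mirror the kinds of run data listed earlier.

```python
import shelve
import tempfile
from pathlib import Path


def save_run_data(store_path: str, session_id: str, run_data: dict) -> None:
    """Persist one session's run data under its session_id (hypothetical helper)."""
    with shelve.open(store_path) as db:
        db[session_id] = run_data


def load_run_data(store_path: str, session_id: str) -> dict:
    """Read back the run data saved for a session_id (hypothetical helper)."""
    with shelve.open(store_path) as db:
        return db[session_id]


# A throwaway location for the shelf file(s); a real project would pick its own path.
store_path = str(Path(tempfile.mkdtemp()) / "session_store")

# Illustrative record only: key layout and values are assumptions, not Kedro's schema.
record = {
    "package_name": "my_project",
    "project_path": "/path/to/my_project",
    "session_id": "example-session-1",
    "cli": {"command": "kedro run", "params": {}},
    "git": {"sha": "abc1234", "branch": "main", "dirty": False},
}

save_run_data(store_path, record["session_id"], record)
loaded = load_run_data(store_path, record["session_id"])
```

Keying the shelf by `session_id` means each session's data can be looked up independently, which fits the one-session-one-run relationship described in the next section.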

Relation of a `run` and a `session` 🧑‍🤝‍🧑

While working on #1273 it was decided that Kedro session and Kedro run have a 1-1 mapping. This means that when a session gets created it will only ever be possible to kick off one full pipeline run during that specific session’s existence. In practice, Kedro manages this for you under the hood when kedro run is executed.
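The 1-1 rule can be illustrated with a toy class. This is a sketch of the behaviour described above, not Kedro's implementation: the class name `OneRunSession`, the error type, and the message are all assumptions made for illustration.

```python
class OneRunSession:
    """Toy session that allows exactly one run (not Kedro's implementation)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False

    def run(self, pipeline_name: str = "__default__") -> str:
        # Refuse a second run in the same session.
        if self._run_called:
            raise RuntimeError(
                f"Session {self.session_id!r} has already been used for a run; "
                "create a new session for the next run."
            )
        self._run_called = True
        return f"ran {pipeline_name} in session {self.session_id}"


first = OneRunSession("session-1")
first_result = first.run()

# A second run on the same session is rejected.
try:
    first.run()
    second_attempt_error = None
except RuntimeError as exc:
    second_attempt_error = str(exc)

# A fresh session is needed for each further run.
second = OneRunSession("session-2")
second_result = second.run()
```

With this design, every stored record corresponds to exactly one run, so a `session_id` can double as an identifier for that run.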

FAQ ❓

How does a Kedro user use KedroSession?
As a Kedro user you don’t need to access the session directly. When you execute the kedro run command, a new session gets created automatically. This session will then kick off the pipeline run and when that process finishes, the session will be closed again persisting any run data if the project is configured with a persistent session store.

What about using KedroSession in an interactive workflow?
When using Jupyter or IPython you can access the active session object or create a new one. You can then retrieve the session_id, the run data that will be stored, load the context, and execute a run. However, we do not encourage users to use the session other than for checking the session_id and run data.

Related Github issues and PRs:


datajoely · 2022-03-09T15:24:11Z

A question that I think myself and others will ask is - if I want to access the data catalog as a live object, do I need to create a session for that? Is that the right way?

merelcht · 2022-03-09T15:47:07Z

> A question that I think myself and others will ask is - if I want to access the data catalog as a live object, do I need to create a session for that? Is that the right way?

The catalog is provided as a variable just like the session, context and pipelines: https://kedro.readthedocs.io/en/stable/11_tools_integration/02_ipython.html#load-datacatalog-in-ipython

datajoely · 2022-03-09T16:00:00Z

@MerelTheisenQB I get that - but users will need to access it in other contexts such as plug-ins and (although not recommended) dynamic contexts. Is there scope to make the catalog importable like the pipelines object is?

antonymilne · 2022-03-09T17:44:12Z

@datajoely Personally I would like this (unless there's some strong arguments against it that I've forgotten), but I think it's outside the scope for now at least. When we talked about it before it didn't seem as easy to do as it is for pipelines unfortunately.

Just one comment on kedro session in the interactive workflow: eventually I wonder whether we should stop exposing session in ipython/jupyter at all, i.e. should we remove this line.

My immediate concern is that someone could end up saving sessions to the session store when they are not even doing session.run but just doing some data exploration (although it takes some effort to do so since you need to call session.close explicitly), and then the experiment tracking has empty runs in it. We could prevent this already by passing save_on_close=False here so that even calling session.close wouldn't save to the session store.

More generally though, I wonder whether there will be any good uses of session in the interactive workflow in the future. Once we're working on this scheme, it seems like a bit of an anti-pattern so maybe not something we should have available at all for users. I mentioned this to @idanov today and he seemed to be in favour of not exposing it. Interested to hear what others (@noklam?) think though, and whether it's important to be able to do session.run (or other session) stuff from a notebook.

noklam · 2022-03-09T19:06:50Z

@AntonyMilneQB For me, it's the ability to do checkpoint debugging in an interactive environment that matters. It may be that I am not doing it the right way, but I am interested in how others are using the Kedro IPython/notebook workflow other than for EDA.

Just to recap, this is the workflow that I adopted in the past for development.

1. Run a partial pipeline and stop at the point of interest.
2. Do whatever I need in a notebook environment, e.g. changing the definition of a node, or injecting/overwriting some of the data in the catalog.
3. Continue to run the pipeline until I get my desired output.

lorenabalan · 2022-03-10T09:40:10Z

@AntonyMilneQB I think not being able to run anything in the jupyter notebook / ipython takes away a lot from jupyter users we're trying to convert to Python and Kedro. If we do that we need to seriously consider the consequences and clearly draw the boundaries of our target audience, because it sounds like they would be very different.

merelcht · 2023-12-13T15:30:29Z

Closing this as there's no immediate actions remaining for this issue.

merelcht changed the title ~~KedroSession technical design decision based on https://github.com/kedro-org/kedro/issues/1273~~ Technical design doc for KedroSession Mar 9, 2022

merelcht added the Type: Technical DR 💾 Decision Records (technical decisions made) label Mar 9, 2022

merelcht linked a pull request Mar 9, 2022 that will close this issue

Enforce 1 session = 1 run #1329

Merged

5 tasks

merelcht mentioned this issue Mar 9, 2022

[KED-2629] Users can't provide custom run_id or save_version to KedroSession #1273

Closed

3 tasks

antonymilne mentioned this issue Mar 10, 2022

Minor improvements in the IPython and Jupyter Notebook workflows #1075

Closed

antonymilne added the Component: Jupyter/IPython Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro label Apr 7, 2022

antonymilne mentioned this issue Apr 7, 2022

Improve kedro run as a package #1423

Closed

10 tasks

antonymilne removed the Component: Jupyter/IPython Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro label Jun 8, 2022

antonymilne added this to the Interactive workflow improvements milestone Jun 8, 2022

yetudada modified the milestones: Improve the Interactive Jupyter notebook workflow, Something about the session Jun 30, 2023

stichbury changed the title ~~Technical design doc for KedroSession~~ Technical design decision record for KedroSession Jul 5, 2023

merelcht closed this as completed Dec 13, 2023
