
Enable Kedro-Viz functionality through a notebook, without Kedro Framework. #1459

Closed

NeroOkwa opened this issue Jul 21, 2023 · 7 comments

@NeroOkwa
Contributor

NeroOkwa commented Jul 21, 2023

Description

Make it possible to use Kedro-Viz (pipeline visualisation and experiment tracking) from a notebook, without the Kedro framework.

For example, I would be able to build a pipeline in a notebook with nodes that output metrics, then call %run_viz, and Kedro-Viz would open with a view of my pipeline and experiments.

Context

Currently, Kedro-Viz is tightly coupled to the Kedro framework, making it impossible for non-Kedro users to use Kedro-Viz. This was highlighted as a pain point in the experiment tracking user research:

"In this case if I really like experiment tracking I might not consider using it if it isn't a kedro project... I am not sure it is a good direction to go with it being completely integrated, especially if there is a new thing like Mlflow"

Secondly, from the non-technical user research in #1280 we discovered a group of 'low-code' users who only use notebooks (e.g. data analysts, junior data scientists, researchers). This is a sizeable group (estimated at 70%) within data teams. Providing notebook access to Kedro-Viz would make it easier for these users to adopt it.

What's happening?

If I wanted to use Kedro-Viz in a notebook without the Kedro framework, this would not be possible. So if I had a setup like this:

my-project
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
├── parameters.yml
├── catalog.yml
└── requirements.txt

Then I’d never be able to see a pipeline visualisation, even if I had:
requirements.txt

kedro==0.18.11
kedro-viz==6.3.3
kedro-datasets[pandas.CSVDataSet]~=1.1

my-notebook.ipynb

from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog
from kedro.pipeline import node, pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from typing import Dict
import logging
import pandas as pd


### Insert something new to load catalog.yml and parameters.yml
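# --- A possible way to fill this gap today, sketched with existing Kedro APIs ---
# (DataCatalog.from_config and add_feed_dict). The file names match the project
# layout above; everything else here is illustrative rather than an agreed design.
import yaml

with open("catalog.yml") as f:
    catalog_conf = yaml.safe_load(f)
with open("parameters.yml") as f:
    params = yaml.safe_load(f)

catalog = DataCatalog.from_config(catalog_conf)
# Mirror how the framework exposes parameters: one "params:<name>" entry per key,
# plus the full dictionary under "parameters".
catalog.add_feed_dict({f"params:{k}": v for k, v in params.items()}, replace=True)
catalog.add_feed_dict({"parameters": params}, replace=True)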


def preprocess_data(data: pd.DataFrame) -> pd.DataFrame:
    data = data.drop(columns=['RowNumber', 'CustomerId', 'Surname'])
    le = LabelEncoder()
    data['Gender'] = le.fit_transform(data['Gender'])
    data = pd.get_dummies(data, columns=['Geography', 'Card Type'])
    return data


def split_data(data: pd.DataFrame, test_size: float, random_state: int) -> Dict:
    X = data.drop(columns='Exited')
    y = data['Exited']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
    return dict(train=(X_train, y_train), test=(X_test, y_test))


def train_model(train: Dict, random_state: int) -> RandomForestClassifier:
    X_train, y_train = train['train']
    rf_clf = RandomForestClassifier(random_state=random_state)
    rf_clf.fit(X_train, y_train)
    return rf_clf


def evaluate_model(model: RandomForestClassifier, test: Dict) -> None:
    X_test, y_test = test['test']
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    confusion_mat = confusion_matrix(y_test, y_pred)
    class_report = classification_report(y_test, y_pred)
    log = logging.getLogger(__name__)
    log.info("Model Accuracy: %s", accuracy)
    log.info("Confusion Matrix: \n%s", confusion_mat)
    log.info("Classification Report: \n%s", class_report)


my_pipeline = pipeline([
    node(preprocess_data, "customers", "preprocessed_customers"),
    node(split_data, ["preprocessed_customers", "params:test_size", "params:random_state"], "split_data"),
    node(train_model, ["split_data", "params:random_state"], "rf_model"),
    node(evaluate_model, ["rf_model", "split_data"], None),
])

%run_viz my_pipeline

It should be possible to see the following in another cell in my Jupyter notebook, with the option to open it up in another tab:

(Screenshot: the expected Kedro-Viz flowchart of the pipeline, rendered inline in the notebook.)

Outcome

A user will be able to use Kedro-Viz from a notebook, without needing to set up a Kedro project.


@datajoely
Contributor

I love this!

@yetudada
Contributor

> I love this!

What do you love about this? 😄

@datajoely
Contributor

I have two thoughts:

  1. This is a neat way of making Kedro Viz useful to people who don't want the complexity of the IDE and may be a stepping stone to getting people into that space.

  2. The second point is something I know others have mentioned before: it annoys me that we need to load a valid Kedro project, with all of its imports and dependencies, just to visualise the pipeline flow. Kedro-Viz (in my mind) should load instantly; you shouldn't have to wait for Spark to spin up, especially because you can't run the pipeline anyway. I've long thought Kedro should be able to create a session lazily, so you can read the pipeline structure for Viz cheaply without incurring the other costs.

@astrojuanlu
Member

Idea: a kedro-openlineage plugin that emits static OpenLineage metadata events, either in ndjson format or to an HTTP endpoint, which are then consumed by Kedro Viz. This is possible with openlineage-python 1.0, released yesterday.
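A rough sketch of what such a plugin might emit, assuming the openlineage-python 1.0 client classes (RunEvent, Run, Job, Dataset, Serde); the namespace, producer string and output file name below are illustrative, not an agreed design:

from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState
from openlineage.client.serde import Serde


def node_to_event(node, namespace="my-notebook"):
    # One static COMPLETE event per Kedro node, describing only its inputs and outputs.
    return RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid4())),
        job=Job(namespace=namespace, name=node.name),
        producer="kedro-openlineage",
        inputs=[Dataset(namespace=namespace, name=i) for i in node.inputs],
        outputs=[Dataset(namespace=namespace, name=o) for o in node.outputs],
    )


# Write ndjson that Kedro-Viz (or any other OpenLineage consumer) could later read.
with open("lineage.ndjson", "w") as f:
    for n in my_pipeline.nodes:
        f.write(Serde.to_json(node_to_event(n)) + "\n")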

@datajoely
Contributor

100000%. There are also lots of LF AI projects there; we should definitely do this.

@datajoely
Contributor

This thread on Slack shows a user wanting to merge Viz from three different Kedro projects that can't exist side by side because they have conflicting dependencies. Kedro-Viz doesn't need to run these projects; it just needs to visualise the pipeline structure:
https://linen-slack.kedro.org/t/14142730/hi-everyone-is-it-possible-to-combine-multiple-kedro-project#d84d8f45-eecc-4c1b-b639-4556c1edcd76

@noklam
Contributor

noklam commented Mar 25, 2024

I realised I didn't leave a comment here. I created this last year: https://github.com/noklam/kedro-viz-lite. I don't actually remember whether I succeeded in the end; the logic is mostly in https://github.com/noklam/kedro-viz-lite/blob/main/kedro_viz_lite/core.py.

This led to my subsequent proposals for kedro viz build and the Kedro-Viz GitHub Pages deployment.

My use case for this is exploring pipeline structure, particularly when I need to confirm that a pipeline works as expected with namespaces. The alternative is creating a full-blown Kedro project, which is a lot of boilerplate. All I care about is the DAG, and that should be enough as long as I have the DataCatalog and the Pipeline. It's also because kedro viz is quite slow to start up, which makes it hard when I just want to debug quickly (--reload sometimes breaks completely if I have an incomplete Kedro project).
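For illustration, a minimal sketch of the kind of namespace check I mean, using the kedro.pipeline helper (the dataset and namespace names are made up):

from kedro.pipeline import node, pipeline


def clean(df):
    return df.dropna()


base = pipeline([node(clean, "raw", "cleaned", name="clean")])

# Re-using the same structure under a namespace prefixes the free inputs and
# outputs; checking the resulting dataset names in Viz is exactly the point.
namespaced = pipeline(base, namespace="churn")
print(namespaced.all_inputs())   # expected: {'churn.raw'}
print(namespaced.all_outputs())  # expected: {'churn.cleaned'}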

If this add a bit context, I was writing https://noklam.github.io/blog/posts/understand_namespace/2023-09-26-understand-kedro-namespace-pipeline.html when I think about this.

@kedro-org kedro-org locked and limited conversation to collaborators Mar 27, 2024
@rashidakanchwala rashidakanchwala converted this issue into discussion #1833 Mar 27, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
