This repository contains example projects for MLflow Recipes (previously known as MLflow Pipelines). To learn about a specific recipe, follow the installation instructions below to install all necessary packages, then check out the relevant example projects listed here.
Note: This example repo is intended for first-time MLflow Recipes users to learn its fundamental concepts and workflows. Users already familiar with MLflow Recipes should instead use a template repository tailored to their specific ML problem; for example, for a regression problem, use recipes-regression-template.
Note: MLflow Recipes is an experimental feature in MLflow. If you observe any issues, please report them here. For suggestions on improvements, please file a discussion topic here. Your contribution to MLflow Recipes is greatly appreciated by the community!
To use MLflow Recipes in this example repository, simply install the packages listed in the `requirements.txt` file. Note that Python 3.8 or above is required.

```shell
pip install -r requirements.txt
```
You may need to install additional libraries for extra features:
- Hyperopt is required for hyperparameter tuning.
- PySpark is required for distributed training or to ingest Spark tables.
- Delta is required to ingest Delta tables.

These libraries are available natively in the Databricks Runtime for Machine Learning.
To log recipe runs to a particular MLflow experiment:

- Open `profiles/databricks.yaml` or `profiles/local.yaml`, depending on your environment.
- Edit (and uncomment, if necessary) the `experiment` section, specifying the name of the desired experiment for logging.
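For reference, the `experiment` section of a profile typically looks like the following sketch; the experiment name and URIs below are placeholders, not values taken from this repository, so adjust them to your setup:

```yaml
experiment:
  # Name of the MLflow experiment to log recipe runs under
  name: "sklearn_regression_experiment"
  # Where run metadata is stored (a local SQLite file in this sketch)
  tracking_uri: "sqlite:///metadata/mlflow/mlruns.db"
  # Where run artifacts (models, step cards) are written
  artifact_location: "./metadata/mlflow/mlartifacts"
```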
Sync this repository with Databricks Repos and run the `notebooks/databricks` notebook on a Databricks cluster running version 11.0 or greater of the Databricks Runtime or the Databricks Runtime for Machine Learning, with workspace files support enabled.
Note: When making changes to recipes on Databricks, it is recommended that you edit files on your local machine and use dbx to sync them to Databricks Repos, as demonstrated here.
Note: data profiles displayed in step cards are not visually compatible with the dark theme. Please avoid using the dark theme if possible.
You can find MLflow Experiments and MLflow Runs created by the recipe on the Databricks ML Experiments page.
- Launch the Jupyter Notebook environment via the `jupyter notebook` command.
- Open and run the `notebooks/jupyter.ipynb` notebook in the Jupyter environment.
Note: data profiles displayed in step cards are not visually compatible with the dark theme. Please avoid using the dark theme if possible.
First, enter the corresponding example root directory and set the profile via an environment variable. For example, for the regression example project:

```shell
cd regression
export MLFLOW_RECIPES_PROFILE=local
```
Then, try running the following MLflow Recipes CLI commands to get started. Note that the `--step` argument is optional; recipe commands without a `--step` specified act on the entire recipe instead. Available step names are: `ingest`, `split`, `transform`, `train`, `evaluate`, and `register`.
- Display the help message:

  ```shell
  mlflow recipes --help
  ```

- Run a recipe step or the entire recipe:

  ```shell
  mlflow recipes run --step step_name
  ```

- Inspect a step card or the recipe dependency graph:

  ```shell
  mlflow recipes inspect --step step_name
  ```

- Clean a step cache or all step caches:

  ```shell
  mlflow recipes clean --step step_name
  ```
To view MLflow Experiments and MLflow Runs created by the recipe:
- Enter the example root directory, for example:

  ```shell
  cd regression
  ```

- Start the MLflow UI:

  ```shell
  mlflow ui \
    --backend-store-uri sqlite:///metadata/mlflow/mlruns.db \
    --default-artifact-root ./metadata/mlflow/mlartifacts \
    --host localhost
  ```

Then open the MLflow UI in your browser at the address it prints (http://localhost:5000 by default).