[Evaluation][9] Add log_evaluations() API #12530
Conversation
Signed-off-by: dbczumar <[email protected]>
from mlflow.tracking.client import MlflowClient

def get_evaluation(*, run_id: str, evaluation_id: str) -> EvaluationEntity:
This is just a test utility to verify that evaluations are logged properly. In the future, when we support read APIs in the MLflow client, we will move the test utility into the MLflow fluent / client APIs.
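As a rough illustration of what such a read utility does, here is a dependency-free sketch that looks up one logged evaluation row by id. The real utility reads artifacts through MlflowClient; the function name and row layout below are assumptions for illustration, not the PR's actual code:

```python
def get_evaluation_row(eval_rows, evaluation_id):
    """Return the logged evaluation row matching evaluation_id.

    eval_rows stands in for the parsed contents of the _evaluations.json
    artifact; each row is assumed to carry an "evaluation_id" key.
    """
    for row in eval_rows:
        if row["evaluation_id"] == evaluation_id:
            return row
    raise KeyError(f"No evaluation with id {evaluation_id!r}")

rows = [
    {"evaluation_id": "ev-1", "inputs": {"q": "2+2"}, "outputs": {"a": "4"}},
    {"evaluation_id": "ev-2", "inputs": {"q": "3+3"}, "outputs": {"a": "6"}},
]
```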
from mlflow.tracking.fluent import _get_or_start_run

def log_evaluations(
This is a fluent API, hence we call the file mlflow/evaluation/fluent.py. We just aren't importing this within mlflow/__init__.py yet or documenting it.
@@ -1,4 +1,12 @@
from mlflow.entities.assessment_source import AssessmentSource, AssessmentSourceType
The Databricks Mosaic AI Agent framework depends on these entities being part of the mlflow.evaluation module. Hence, we import them here. I also added a test case for this: test_evaluation_module_exposes_relevant_apis_for_logging
tests/evaluate/logging/utils.py
Outdated
def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    return (
        any(file.path == "_evaluations.json" for file in client.list_artifacts(run_id))
        and any(file.path == "_metrics.json" for file in client.list_artifacts(run_id))
        and any(file.path == "_assessments.json" for file in client.list_artifacts(run_id))
    )
/ai rewrite this using set
Here's how you can rewrite the code using set:
def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    artifact_paths_set = {file.path for file in client.list_artifacts(run_id)}
    required_files = {"_evaluations.json", "_metrics.json", "_assessments.json"}
    return required_files.issubset(artifact_paths_set)
The method set.issubset() checks whether every element of required_files is present in artifact_paths_set. This is more efficient because the list of artifacts is fetched and scanned once, not three times as in the original version.
/ai cool, make it one-liner
Sure, here is the one-liner modification of the function:
def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    return {"_evaluations.json", "_metrics.json", "_assessments.json"}.issubset({file.path for file in client.list_artifacts(run_id)})
Thanks for the suggestion! I've applied it :)
tests/evaluate/logging/utils.py
Outdated
        List[EvaluationEntity]: A list of Evaluation entities created from the DataFrames.
    """

    def group_by_evaluation_id(df: pd.DataFrame):
can this function be moved to the top level?
Why? It isn't used anywhere else and is very specific to the implementation of the method that contains it.
It's just my habit: moving it out avoids redefining the function every time the enclosing function is called, avoids unintended variable shadowing, and improves code readability. Not a blocker at all.
Got it. I've moved it to the top level and fixed docstrings :)
    evaluations_df, metrics_df, assessments_df, tags_df = evaluations_to_dataframes(
        evaluation_entities
    )
    client.log_table(run_id=run_id, data=evaluations_df, artifact_file="_evaluations.json")
Q: What is the behavior when the DF is empty?
The table is still logged, but we probably shouldn't log empty tables if no evaluations are specified. Added a fix and a test case for this: test_log_evaluations_works_properly_with_empty_evaluations_list
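The fix described above amounts to an early return before any tables are written. Here is a minimal dependency-free sketch of that guard; log_table below is a stand-in for MlflowClient.log_table, and the table layout (artifact file names mapping to row lists) is an assumption for illustration:

```python
def log_evaluation_tables(log_table, run_id, tables):
    """Write each evaluation artifact table, unless there is nothing to log.

    tables maps artifact file names (e.g. "_evaluations.json") to row lists.
    When the evaluations table is empty, skip logging entirely so that no
    empty placeholder artifacts are created.
    """
    if not tables.get("_evaluations.json"):
        return []  # nothing to log for an empty evaluation set
    logged = []
    for artifact_file, rows in tables.items():
        log_table(run_id=run_id, data=rows, artifact_file=artifact_file)
        logged.append(artifact_file)
    return logged

# A simple capturing stand-in for log_table, to observe what gets written.
calls = []

def fake_log_table(run_id, data, artifact_file):
    calls.append(artifact_file)

# With an empty evaluation set, nothing is logged.
log_evaluation_tables(fake_log_table, "run-1", {"_evaluations.json": []})
```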
    # End the run to clean up
    mlflow.end_run()
A teardown operation like this should be performed in a fixture. We don't reach here if we hit an error.
Added a fixture!
Thanks @harupy, @BenWilson2 ! I've addressed your comments. Let me know if there are other blockers :)
    try:
        yield
    finally:
        mlflow.end_run()
try-finally can be removed; pytest handles errors in tests. Fine to keep it, though; it's harmless.
Had no idea! Removed try/finally. Thanks for teaching me this :)
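The mechanics behind this suggestion can be sketched without pytest: a yield-fixture is just a generator that pytest resumes after the test body, and pytest catches test failures itself, which is why a try/finally inside the fixture is redundant. In the sketch below, end_run is a stand-in for mlflow.end_run, and the fixture function would carry @pytest.fixture in the real suite:

```python
ended = []

def end_run():
    # Stand-in for mlflow.end_run().
    ended.append(True)

def end_run_fixture():
    # In the test suite this would be decorated with @pytest.fixture.
    # pytest resumes the generator past the yield during teardown even
    # when the test body failed, so no try/finally is needed here.
    yield
    end_run()

# Roughly how pytest drives a yield-fixture:
fix = end_run_fixture()
next(fix)          # setup: run up to the yield
# ... test body executes here; pytest catches any test failure ...
next(fix, None)    # teardown: resume the generator, which calls end_run()
```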
LGTM!
def test_evaluation_module_exposes_relevant_apis_for_logging():
    pass
can we remove this test?
Not quite, we need the test. Looks like the linter doesn't think the imports are necessary and tries to remove them. I'm going to go back to the Assert approach.
Thanks for catching this :)
Looks great!
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
Add log_evaluations() API, which is used to log evaluations (inputs, outputs, targets, etc.) along with assessments, metrics, and tags to MLflow as artifacts.
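To make the shape of the change concrete, here is a toy, dependency-free sketch of the splitting step: evaluation records fan out into the per-artifact tables this PR logs (_evaluations.json, _metrics.json, _assessments.json). The helper name and record layout are assumptions for illustration, not the PR's actual implementation:

```python
def evaluations_to_tables(evaluations):
    """Split evaluation records into per-artifact row tables (toy sketch)."""
    eval_rows, metric_rows, assessment_rows = [], [], []
    for ev in evaluations:
        eval_rows.append(
            {"evaluation_id": ev["id"], "inputs": ev["inputs"], "outputs": ev["outputs"]}
        )
        for key, value in ev.get("metrics", {}).items():
            metric_rows.append({"evaluation_id": ev["id"], "key": key, "value": value})
        for assessment in ev.get("assessments", []):
            assessment_rows.append({"evaluation_id": ev["id"], **assessment})
    return {
        "_evaluations.json": eval_rows,
        "_metrics.json": metric_rows,
        "_assessments.json": assessment_rows,
    }

tables = evaluations_to_tables(
    [
        {
            "id": "ev-1",
            "inputs": {"question": "What is MLflow?"},
            "outputs": {"answer": "An ML lifecycle platform."},
            "metrics": {"latency_ms": 12.5},
            "assessments": [{"name": "correct", "value": True}],
        }
    ]
)
```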