
[Evaluation][9] Add log_evaluations() API #12530

Merged: 71 commits into mlflow:master on Jul 2, 2024

Conversation

@dbczumar dbczumar (Collaborator) commented Jun 30, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/12530/merge

Checkout with GitHub CLI

gh pr checkout 12530

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add the log_evaluations() API, which logs evaluations (inputs, outputs, targets, etc.), along with assessments, metrics, and tags, to MLflow as artifacts.
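
For illustration only (not part of the original PR description), here is a minimal usage sketch. The import path, the Evaluation constructor fields, and the evaluations= parameter name are assumptions inferred from the description above, not the confirmed signature:

import mlflow
from mlflow.evaluation import Evaluation, log_evaluations  # assumed import path

with mlflow.start_run() as run:
    # Field names below (inputs/outputs/targets) are assumptions.
    evaluation = Evaluation(
        inputs={"question": "What is MLflow?"},
        outputs={"answer": "An open source platform for the ML lifecycle."},
        targets={"answer": "An ML lifecycle management platform."},
    )
    # Logs the evaluation, along with any assessments, metrics, and tags,
    # to the run as artifacts.
    log_evaluations(run_id=run.info.run_id, evaluations=[evaluation])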

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

github-actions bot added the rn/none label (List under Small Changes in Changelogs) on Jun 30, 2024
from mlflow.tracking.client import MlflowClient


def get_evaluation(*, run_id: str, evaluation_id: str) -> EvaluationEntity:
Collaborator Author (dbczumar):

This is just a test utility to verify that evaluations are logged properly. In the future, when we support read APIs in the MLflow client, we will move the test utility into the MLflow fluent / client APIs.

from mlflow.tracking.fluent import _get_or_start_run


def log_evaluations(
Collaborator Author (dbczumar):

This is a fluent API, hence the file is named mlflow/evaluation/fluent.py. We just aren't importing it within mlflow/__init__.py or documenting it yet.
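
Until that wiring lands, callers would presumably import the API from the module directly; a small sketch (the eventual top-level export is an assumption about future work, not something done in this PR):

# Current (this PR): import from the evaluation module directly.
from mlflow.evaluation.fluent import log_evaluations

# Possible future (assumption): exposed from the top-level package,
# e.g. mlflow.log_evaluations(...).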

@@ -1,4 +1,12 @@
from mlflow.entities.assessment_source import AssessmentSource, AssessmentSourceType
Collaborator Author (dbczumar):

The Databricks Mosaic AI Agent framework depends on these entities being part of the mlflow.evaluation module. Hence, we import them here. I also added a test case for this: test_evaluation_module_exposes_relevant_apis_for_logging
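
A sketch of the re-export pattern being described, assuming the file shown in the diff hunk above is mlflow/evaluation/__init__.py; the __all__ list is illustrative:

# mlflow/evaluation/__init__.py (sketch): re-export entity classes so that
# downstream code (e.g., the Databricks Mosaic AI Agent framework) can import
# them from mlflow.evaluation directly.
from mlflow.entities.assessment_source import AssessmentSource, AssessmentSourceType

__all__ = ["AssessmentSource", "AssessmentSourceType"]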

Comment on lines 58 to 63
def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
return (
any(file.path == "_evaluations.json" for file in client.list_artifacts(run_id))
and any(file.path == "_metrics.json" for file in client.list_artifacts(run_id))
and any(file.path == "_assessments.json" for file in client.list_artifacts(run_id))
)
Member:

/ai rewrite this using set


@harupy

Here's how you can rewrite the code using set:

def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    artifact_paths_set = {file.path for file in client.list_artifacts(run_id)}
    required_files = {"_evaluations.json", "_metrics.json", "_assessments.json"}
    return required_files.issubset(artifact_paths_set)

The set.issubset() method checks whether every element of required_files is present in artifact_paths_set. This is also more efficient: client.list_artifacts(run_id) is called and scanned once, rather than three times as in the original version.

Usage data:
{
  "prompt_tokens": 133,
  "completion_tokens": 144,
  "total_tokens": 277,
  "estimated_cost_in_usd": 0.01263
}

Member:

/ai cool, make it one-liner


@harupy

Sure, here is the one-liner modification of the function:

def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    return {"_evaluations.json", "_metrics.json", "_assessments.json"}.issubset({file.path for file in client.list_artifacts(run_id)})
Usage data:
{
  "prompt_tokens": 294,
  "completion_tokens": 75,
  "total_tokens": 369,
  "estimated_cost_in_usd": 0.01332
}

Collaborator Author (dbczumar):

Thanks for the suggestion! I've applied it :)

List[EvaluationEntity]: A list of Evaluation entities created from the DataFrames.
"""

def group_by_evaluation_id(df: pd.DataFrame):
Member:

Can this function be moved to the top level?

Collaborator Author (@dbczumar), Jul 1, 2024:

Why? It isn't used anywhere else, and it's very specific to the implementation of the method that contains it.

Member (@harupy), Jul 2, 2024:

It's just my habit: it avoids redefining the function every time the enclosing function is called, avoids unintended variable shadowing, and improves code readability. Not a blocker at all.

Collaborator Author (dbczumar):

Got it. I've moved it to the top level and fixed docstrings :)
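
For illustration, a sketch of what a top-level version of that helper might look like; the DataFrame column name and return type are assumptions, not the PR's actual implementation:

from typing import Iterator, Tuple

import pandas as pd


def _group_by_evaluation_id(df: pd.DataFrame) -> Iterator[Tuple[str, pd.DataFrame]]:
    # Group rows of an evaluations DataFrame by their evaluation ID
    # (column name assumed to be "evaluation_id").
    return iter(df.groupby("evaluation_id"))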

evaluations_df, metrics_df, assessments_df, tags_df = evaluations_to_dataframes(
evaluation_entities
)
client.log_table(run_id=run_id, data=evaluations_df, artifact_file="_evaluations.json")
Member:

Q: What is the behavior when the DF is empty?

Collaborator Author (dbczumar):

The table is still logged, but we probably shouldn't log empty tables if no evaluations are specified. Added a fix and a test case for this: test_log_evaluations_works_properly_with_empty_evaluations_list
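
A minimal sketch of the kind of guard being described, written against the snippet quoted above; the early-return value and exact placement are assumptions:

# Sketch: skip artifact logging entirely when there is nothing to log.
if not evaluation_entities:
    return []  # assumed return value; the actual fix may differ

evaluations_df, metrics_df, assessments_df, tags_df = evaluations_to_dataframes(
    evaluation_entities
)
client.log_table(run_id=run_id, data=evaluations_df, artifact_file="_evaluations.json")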

Comment on lines 196 to 197
# End the run to clean up
mlflow.end_run()
Member:

A teardown operation like this should be performed in a fixture; this line is never reached if the test hits an error.

Collaborator Author (dbczumar):

Added a fixture!

Collaborator Author @dbczumar left a comment:

Thanks @harupy, @BenWilson2 ! I've addressed your comments. Let me know if there are other blockers :)


Comment on lines 14 to 17
try:
yield
finally:
mlflow.end_run()
Member:

The try-finally can be removed; pytest handles errors in tests. Fine to keep it, though; it's not harmful.

Collaborator Author (dbczumar):

Had no idea! Removed try/finally. Thanks for teaching me this :)
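
For reference, a sketch of the simplified fixture being discussed; the fixture name and the autouse setting are assumptions:

import mlflow
import pytest


@pytest.fixture(autouse=True)  # autouse is an assumption
def end_active_run():
    # Run the test, then end any active MLflow run. For yield fixtures,
    # pytest executes the code after `yield` as teardown even when the test
    # body fails, so no explicit try/finally is needed.
    yield
    mlflow.end_run()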

Member @harupy left a comment:

LGTM!

Comment on lines 205 to 207
def test_evaluation_module_exposes_relevant_apis_for_logging():
pass

Member:

can we remove this test?

Collaborator Author (@dbczumar), Jul 2, 2024:

Not quite; we need the test. It looks like the linter thinks the imports are unnecessary and tries to remove them, so I'm going to go back to the assert approach.
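
A sketch of what the assert-based version of that test might look like; the specific names checked are assumptions (only log_evaluations, AssessmentSource, and AssessmentSourceType are mentioned elsewhere in this PR):

import mlflow.evaluation


def test_evaluation_module_exposes_relevant_apis_for_logging():
    # Assert on module attributes instead of bare imports so the linter
    # doesn't flag (and auto-remove) "unused" imports.
    for name in ("log_evaluations", "AssessmentSource", "AssessmentSourceType"):
        assert hasattr(mlflow.evaluation, name)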

Collaborator Author (dbczumar):

Thanks for catching this :)

Member @BenWilson2 left a comment:

Looks great!

@dbczumar dbczumar merged commit a9e4a19 into mlflow:master Jul 2, 2024
38 of 40 checks passed
BenWilson2 pushed a commit that referenced this pull request Jul 3, 2024
Labels: area/tracking (Tracking service, tracking client APIs, autologging), rn/none (List under Small Changes in Changelogs)