
[Evaluation][9] Add log_evaluations() API #12530

Merged: 71 commits into mlflow:master on Jul 2, 2024

Conversation

@dbczumar dbczumar (Collaborator) commented Jun 30, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/12530/merge

Checkout with GitHub CLI

gh pr checkout 12530

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add the log_evaluations() API, which logs evaluations (inputs, outputs, targets, etc.), along with assessments, metrics, and tags, to MLflow as artifacts.
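
For illustration only (not part of the original PR description), here is a minimal usage sketch. The import path, the Evaluation constructor fields, and the evaluations= parameter name are assumptions inferred from the description above, not the confirmed signature:

import mlflow
from mlflow.evaluation import Evaluation, log_evaluations  # assumed import path

with mlflow.start_run() as run:
    # Field names below (inputs/outputs/targets) are assumptions.
    evaluation = Evaluation(
        inputs={"question": "What is MLflow?"},
        outputs={"answer": "An open source platform for the ML lifecycle."},
        targets={"answer": "An ML lifecycle management platform."},
    )
    # Logs the evaluation, along with any assessments, metrics, and tags,
    # to the run as artifacts.
    log_evaluations(run_id=run.info.run_id, evaluations=[evaluation])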

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

github-actions bot added the rn/none label (List under Small Changes in Changelogs) on Jun 30, 2024
from mlflow.tracking.client import MlflowClient


def get_evaluation(*, run_id: str, evaluation_id: str) -> EvaluationEntity:
Collaborator Author (dbczumar):

This is just a test utility to verify that evaluations are logged properly. In the future, when we support read APIs in the MLflow client, we will move the test utility into the MLflow fluent / client APIs.

from mlflow.tracking.fluent import _get_or_start_run


def log_evaluations(
Collaborator Author (dbczumar):

This is a fluent API, hence the file is named mlflow/evaluation/fluent.py. We just aren't importing it within mlflow/__init__.py or documenting it yet.
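
Until that wiring lands, callers would presumably import the API from the module directly; a small sketch (the eventual top-level export is an assumption about future work, not something done in this PR):

# Current (this PR): import from the evaluation module directly.
from mlflow.evaluation.fluent import log_evaluations

# Possible future (assumption): exposed from the top-level package,
# e.g. mlflow.log_evaluations(...).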

@@ -1,4 +1,12 @@
from mlflow.entities.assessment_source import AssessmentSource, AssessmentSourceType
Collaborator Author (dbczumar):

The Databricks Mosaic AI Agent framework depends on these entities being part of the mlflow.evaluation module. Hence, we import them here. I also added a test case for this: test_evaluation_module_exposes_relevant_apis_for_logging
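
A sketch of the re-export pattern being described, assuming the file shown in the diff hunk above is mlflow/evaluation/__init__.py; the __all__ list is illustrative:

# mlflow/evaluation/__init__.py (sketch): re-export entity classes so that
# downstream code (e.g., the Databricks Mosaic AI Agent framework) can import
# them from mlflow.evaluation directly.
from mlflow.entities.assessment_source import AssessmentSource, AssessmentSourceType

__all__ = ["AssessmentSource", "AssessmentSourceType"]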

Comment on lines 58 to 63
def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
return (
any(file.path == "_evaluations.json" for file in client.list_artifacts(run_id))
and any(file.path == "_metrics.json" for file in client.list_artifacts(run_id))
and any(file.path == "_assessments.json" for file in client.list_artifacts(run_id))
)
Member:

/ai rewrite this using set


@harupy

Here's how you can rewrite the code using set:

def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    artifact_paths_set = {file.path for file in client.list_artifacts(run_id)}
    required_files = {"_evaluations.json", "_metrics.json", "_assessments.json"}
    return required_files.issubset(artifact_paths_set)

The set.issubset() method checks whether every element of required_files is present in artifact_paths_set. This is also more efficient: client.list_artifacts(run_id) is called and scanned once, rather than three times as in the original version.

Usage data:
{
  "prompt_tokens": 133,
  "completion_tokens": 144,
  "total_tokens": 277,
  "estimated_cost_in_usd": 0.01263
}

Member:

/ai cool, make it one-liner


@harupy

Sure, here is the one-liner modification of the function:

def _contains_evaluation_artifacts(*, client: MlflowClient, run_id: str) -> bool:
    return {"_evaluations.json", "_metrics.json", "_assessments.json"}.issubset({file.path for file in client.list_artifacts(run_id)})
Usage data:
{
  "prompt_tokens": 294,
  "completion_tokens": 75,
  "total_tokens": 369,
  "estimated_cost_in_usd": 0.01332
}

Collaborator Author (dbczumar):

Thanks for the suggestion! I've applied it :)

List[EvaluationEntity]: A list of Evaluation entities created from the DataFrames.
"""

def group_by_evaluation_id(df: pd.DataFrame):
Member:

Can this function be moved to the top level?

Collaborator Author (@dbczumar), Jul 1, 2024:

Why? It isn't used anywhere else, and it's very specific to the implementation of the method that contains it.

Member (@harupy), Jul 2, 2024:

It's just my habit: it avoids redefining the function every time the enclosing function is called, avoids unintended variable shadowing, and improves code readability. Not a blocker at all.

Collaborator Author (dbczumar):

Got it. I've moved it to the top level and fixed docstrings :)
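
For illustration, a sketch of what a top-level version of that helper might look like; the DataFrame column name and return type are assumptions, not the PR's actual implementation:

from typing import Iterator, Tuple

import pandas as pd


def _group_by_evaluation_id(df: pd.DataFrame) -> Iterator[Tuple[str, pd.DataFrame]]:
    # Group rows of an evaluations DataFrame by their evaluation ID
    # (column name assumed to be "evaluation_id").
    return iter(df.groupby("evaluation_id"))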

evaluations_df, metrics_df, assessments_df, tags_df = evaluations_to_dataframes(
evaluation_entities
)
client.log_table(run_id=run_id, data=evaluations_df, artifact_file="_evaluations.json")
Member:

Q: What is the behavior when the DF is empty?

Collaborator Author (dbczumar):

The table is still logged, but we probably shouldn't log empty tables if no evaluations are specified. Added a fix and a test case for this: test_log_evaluations_works_properly_with_empty_evaluations_list
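
A minimal sketch of the kind of guard being described, written against the snippet quoted above; the early-return value and exact placement are assumptions:

# Sketch: skip artifact logging entirely when there is nothing to log.
if not evaluation_entities:
    return []  # assumed return value; the actual fix may differ

evaluations_df, metrics_df, assessments_df, tags_df = evaluations_to_dataframes(
    evaluation_entities
)
client.log_table(run_id=run_id, data=evaluations_df, artifact_file="_evaluations.json")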

Comment on lines 196 to 197
# End the run to clean up
mlflow.end_run()
Member:

A teardown operation like this should be performed in a fixture; this line is never reached if the test hits an error.

Collaborator Author (dbczumar):

Added a fixture!

Collaborator Author @dbczumar left a comment:

Thanks @harupy, @BenWilson2 ! I've addressed your comments. Let me know if there are other blockers :)


Comment on lines 14 to 17
try:
yield
finally:
mlflow.end_run()
Member:

The try-finally can be removed; pytest handles errors in tests. Fine to keep it, though; it's not harmful.

Collaborator Author (dbczumar):

Had no idea! Removed try/finally. Thanks for teaching me this :)
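
For reference, a sketch of the simplified fixture being discussed; the fixture name and the autouse setting are assumptions:

import mlflow
import pytest


@pytest.fixture(autouse=True)  # autouse is an assumption
def end_active_run():
    # Run the test, then end any active MLflow run. For yield fixtures,
    # pytest executes the code after `yield` as teardown even when the test
    # body fails, so no explicit try/finally is needed.
    yield
    mlflow.end_run()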

Member @harupy left a comment:

LGTM!

Comment on lines 205 to 207
def test_evaluation_module_exposes_relevant_apis_for_logging():
pass

Member:

can we remove this test?

Collaborator Author (@dbczumar), Jul 2, 2024:

Not quite; we need the test. It looks like the linter thinks the imports are unnecessary and tries to remove them, so I'm going to go back to the assert approach.
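
A sketch of what the assert-based version of that test might look like; the specific names checked are assumptions (only log_evaluations, AssessmentSource, and AssessmentSourceType are mentioned elsewhere in this PR):

import mlflow.evaluation


def test_evaluation_module_exposes_relevant_apis_for_logging():
    # Assert on module attributes instead of bare imports so the linter
    # doesn't flag (and auto-remove) "unused" imports.
    for name in ("log_evaluations", "AssessmentSource", "AssessmentSourceType"):
        assert hasattr(mlflow.evaluation, name)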

Collaborator Author (dbczumar):

Thanks for catching this :)

Member @BenWilson2 left a comment:

Looks great!

@dbczumar dbczumar merged commit a9e4a19 into mlflow:master Jul 2, 2024
38 of 40 checks passed
BenWilson2 pushed a commit that referenced this pull request Jul 3, 2024
Labels: area/tracking (Tracking service, tracking client APIs, autologging), rn/none (List under Small Changes in Changelogs)