azureml.core.Run.log_*() logs are not working in child jobs #1922

Open
pezosanta opened this issue Jun 26, 2023 · 0 comments

pezosanta commented Jun 26, 2023

Hi everyone,

I am trying to build an AML pipeline for object detection/instance segmentation, where the last component is used for training and model evaluation.

The pipeline is defined via the YAML format/schema (see below) and is run with az ml job create --file pipeline.yaml:

  • The pipeline itself is defined as:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
  • The pipeline components (Get Data, Train) are defined as:
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
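
Putting these together, a trimmed-down sketch of the pipeline structure looks roughly like this (the component file paths, job names and the data binding are simplified placeholders, not the exact values I use):

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  get_data:
    component: ./components/get_data.yaml
  train:
    component: ./components/train.yaml
    inputs:
      data: ${{parent.jobs.get_data.outputs.data}}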

I want to highlight/visualize a lot of metrics in the Metrics tab of the component, such as time-series metrics (loss, F1 etc.), X/Y graphs, confusion matrices etc. As the MLflow API only supports time-series-like metric logging (logging a single scalar value per metric per iteration/epoch; see the contrast sketch after the list below), I try to use the azureml.core.Run.log_*() interface for the more advanced metrics. The problem is that these logs are only written to the Outputs + logs tab as JSON files and are not visualized as metrics/graphs in the Metrics tab, if they are logged at all. Here are the problematic metric logs:

  • azureml.core.Run.log_table(): This is not logged at all, neither into the Outputs + logs tab nor into the Metrics tab.
  • azureml.core.Run.log_accuracy_table(): This is logged only into the Outputs + logs tab, as a JSON file:
    {"schema_type": "accuracy_table", "schema_version": "1.0.1", "data": {"probability_tables": [[[82, 118, 0, 0], [75, 31, 87, 7], [66, 9, 109, 16], [46, 2, 116, 36], [0, 0, 118, 82]], [[60, 140, 0, 0], [56, 20, 120, 4], [47, 4, 136, 13], [28, 0, 140, 32], [0, 0, 140, 60]], [[58, 142, 0, 0], [53, 29, 113, 5], [40, 10, 132, 18], [24, 1, 141, 34], [0, 0, 142, 58]]], "percentile_tables": [[[82, 118, 0, 0], [82, 67, 51, 0], [75, 26, 92, 7], [48, 3, 115, 34], [3, 0, 118, 79]], [[60, 140, 0, 0], [60, 89, 51, 0], [60, 41, 99, 0], [46, 5, 135, 14], [3, 0, 140, 57]], [[58, 142, 0, 0], [56, 93, 49, 2], [54, 47, 95, 4], [41, 10, 132, 17], [3, 0, 142, 55]]], "probability_thresholds": [0.0, 0.25, 0.5, 0.75, 1.0], "percentile_thresholds": [0.0, 0.01, 0.24, 0.98, 1.0], "class_labels": ["class1", "class2", "class3"]}}
  • azureml.core.Run.log_confusion_matrix(): This is logged only into the Outputs + logs tab, as a JSON file:
    {"schema_type": "confusion_matrix", "schema_version": "1.0.0", "data": {"class_labels": ["class1", "class2", "class3", "class4"], "matrix": [[4, 0, 1, 9], [0, 0, 0, 1], [6, 0, 5, 0], [0, 0, 0, 1]]}}

The code used for these logs is as follows:

from azureml.core import Run
...

# Get the context of the current (child) run; raises instead of
# falling back to an offline run when executed outside AML
run = Run.get_context(allow_offline=False)

# X/Y graph
run.log_table("Y over X", {"x": [1, 2, 3], "y": [0.6, 0.7, 0.89]})

# Confusion matrix, following the confusion_matrix JSON schema
run.log_confusion_matrix(
    name="Confusion matrix",
    value={
        "schema_type": "confusion_matrix",
        "schema_version": "1.0.0",
        "data": {
            "class_labels": ["class1", "class2", "class3", "class4"],
            "matrix": [
                [4, 0, 1, 9],
                [0, 0, 0, 1],
                [6, 0, 5, 0],
                [0, 0, 0, 1],
            ],
        },
    },
)

# Accuracy table, following the accuracy_table JSON schema
run.log_accuracy_table(
    name="Accuracy Table",
    value={
        "schema_type": "accuracy_table",
        "schema_version": "1.0.1",
        "data": {
            "probability_tables": [
                [
                    [82, 118, 0, 0],
                    [75, 31, 87, 7],
                    [66, 9, 109, 16],
                    [46, 2, 116, 36],
                    [0, 0, 118, 82],
                ],
                [
                    [60, 140, 0, 0],
                    [56, 20, 120, 4],
                    [47, 4, 136, 13],
                    [28, 0, 140, 32],
                    [0, 0, 140, 60],
                ],
                [
                    [58, 142, 0, 0],
                    [53, 29, 113, 5],
                    [40, 10, 132, 18],
                    [24, 1, 141, 34],
                    [0, 0, 142, 58],
                ],
            ],
            "percentile_tables": [
                [
                    [82, 118, 0, 0],
                    [82, 67, 51, 0],
                    [75, 26, 92, 7],
                    [48, 3, 115, 34],
                    [3, 0, 118, 79],
                ],
                [
                    [60, 140, 0, 0],
                    [60, 89, 51, 0],
                    [60, 41, 99, 0],
                    [46, 5, 135, 14],
                    [3, 0, 140, 57],
                ],
                [
                    [58, 142, 0, 0],
                    [56, 93, 49, 2],
                    [54, 47, 95, 4],
                    [41, 10, 132, 17],
                    [3, 0, 142, 55],
                ],
            ],
            "probability_thresholds": [0.0, 0.25, 0.5, 0.75, 1.0],
            "percentile_thresholds": [0.0, 0.01, 0.24, 0.98, 1.0],
            "class_labels": ["class1", "class2", "class3"],
        },
    },
    description="Some description.",
)
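
One workaround I have considered but not verified: azureml.core.Run exposes the parent run via the Run.parent property, so the metrics could be logged on the parent pipeline run instead of the child run. A sketch:

from azureml.core import Run

run = Run.get_context(allow_offline=False)
parent_run = run.parent  # None when the run has no parent
if parent_run is not None:
    # Log on the parent pipeline run instead of the child component run
    parent_run.log_table("Y over X", {"x": [1, 2, 3], "y": [0.6, 0.7, 0.89]})
else:
    run.log_table("Y over X", {"x": [1, 2, 3], "y": [0.6, 0.7, 0.89]})

This would not put the metrics on the component itself, though, which is where I want them.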

Here are some screenshots of the Azure ML dashboard.

  • The first screenshot shows that run.log_accuracy_table() and run.log_confusion_matrix() are logged as JSON file artifacts, but run.log_table() is not:
    [screenshot: AML_outputs_tab]
  • The second screenshot shows that none of the run.log_*() metrics are visualized in the Metrics tab:
    [screenshot: AML_metrics_tab]

IMPORTANT

If I run a simple Python script as a standalone job (so no pipeline definition at all), the run.log_accuracy_table(), run.log_confusion_matrix() and run.log_table() metrics are all logged and visualized properly:

[screenshot: aml-simple-job]
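
For reference, that standalone job was defined roughly like this (the script name, environment and compute below are placeholders, not my exact values):

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
command: python log_metrics.py
code: ./src
environment: azureml:my-environment:1
compute: azureml:my-cluster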

Is this behaviour just a bug related to child jobs?

@pezosanta changed the title from "azureml.core.Run.log logs are not visualized as metrics in pipeline components" to "azureml.core.Run.log*() logs are not working in child jobs" on Jul 4, 2023
@pezosanta changed the title from "azureml.core.Run.log*() logs are not working in child jobs" to "azureml.core.Run.log_*() logs are not working in child jobs" on Jul 4, 2023