
Metrics of experiments with different tech implementation #34

Closed
elleobrien opened this issue Mar 26, 2020 · 9 comments
Labels
discussion (Waiting for team decision), ui/ux (User interface/experience)

Comments

@elleobrien
Contributor

elleobrien commented Mar 26, 2020

This is a discussion point, not really an issue. I'm thinking about how metrics are displayed:

[Screenshot: Screen Shot 2020-03-26 at 3.28.14 PM]

I definitely want to know that I'm comparing two experiments in which hyperparameters of my model (here, the maximum depth of a random forest classifier, max_depth) changed. But whereas it makes sense to have a "diff" presented for the accuracy metric, I'm not so sure it matters to have a diff presented for the hyperparameters. It's not a number we're trying to optimize (unlike accuracy diffs), and visually it makes the display more cluttered.

I might suggest having a separate table for comparing hyperparameters that doesn't present diffs, just a side-by-side comparison. And then a table for comparing the output metrics, where I do care about the diff. Would this be challenging to implement? Maybe, for each distinct metric file, its own table? And then somewhere in project preferences a user could specify if we want diffs.
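
(Something like the split DVC's own CLI makes is roughly the separation I mean - a minimal sketch, assuming the dvc metrics diff and dvc params diff commands are available; the exact report layout is a separate question:)

    # output metrics: values plus a delta worth optimizing
    dvc metrics diff HEAD~1 HEAD
    # hyperparameters: side-by-side old/new values, no delta needed
    dvc params diff HEAD~1 HEAD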

Another way of thinking about this is that if I had a spreadsheet of experiments I was trying to compare, I would lay it out this way:

experiment id | parameterA | parameterB | parameterC | accuracy
1bac226       | 24         | 5          | 140        | 0.899
f90k153       | 24         | 2          | 140        | 0.9111

And then perhaps highlight the row containing the best experiment (assuming we can specify somewhere whether higher or lower is better for the metric). If you want the diff explicitly calculated, maybe put it in its own field below the table.
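
(A rough sketch of how I could assemble such a spreadsheet today from the committed metric files - this assumes jq is installed, that every commit has metrics/train.json and metrics/eval.json checked into git, and that the keys are literally max_depth and accuracy; treat all of those as placeholders:)

    # one row per commit: abbreviated hash, one parameter, one metric
    printf "commit\tmax_depth\taccuracy\n"
    for rev in $(git rev-list --abbrev-commit HEAD); do
        depth=$(git show "$rev:metrics/train.json" | jq -r '.max_depth')
        acc=$(git show "$rev:metrics/eval.json" | jq -r '.accuracy')
        printf "%s\t%s\t%s\n" "$rev" "$depth" "$acc"
    done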

@dmpetrov
Member

@andronovhopf really great feedback!

Most of it is already under development in core DVC:

The spreadsheet of experiments is another great idea. We should think about that.

Re the spreadsheet... what would be your criteria for including an experiment in the table? How many of these would you expect to see there?

@shcheklein
Member

Diffs with the option to avoid deltas - exactly as you asked :) - are in progress in iterative/dvc#3528.

@andronovhopf how and where do you specify the max_depth parameter? Is train.json actually a file with hyperparams in your case? Could you share both json files please? :)

@elleobrien
Contributor Author

elleobrien commented Mar 27, 2020

@shcheklein yes train.json is a file containing hyperparameters, and that's where max_depth is specified. I just invited you and @dmpetrov to the repository; the metric files are here.
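
(Roughly, the hyperparameter file looks something like this - a sketch only; the keys and values below are illustrative rather than copied from the repo:)

    $ cat metrics/train.json
    {
        "max_depth": 5,
        "n_estimators": 140
    }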

@dmpetrov, re: spreadsheet. Two ways of selecting experiments to display in a table come to mind:

  • If I'm doing a PR, compare the PR to master - so only two experiments (see the sketch below).
  • A view comparing all commits on a branch - so as many experiments as there are commits (assuming CI was run after each commit).
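
(For the two-experiment PR case, I'm picturing something like DVC's diff against master - a sketch, assuming the metric files are committed on both revisions:)

    # compare the PR workspace against master
    dvc metrics diff master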

Any other ideas?

@dmpetrov
Member

@andronovhopf did you run it like dvc run -M metrics/train.json -M metrics/eval.json ... and write all the params and metrics separately?

I like both ways. If we do that:

  • the current experiment and the baseline are must-haves.
  • it is convenient to see everything from the current branch up to master. However, some limit is needed because of CI report size limitations - something like 10 or 30.

@dmpetrov
Member

> @andronovhopf did you run it like dvc run -M metrics/train.json -M metrics/eval.json ... and write all the params and metrics separately?

Oh, I see that in the repo https://github.com/andronovhopf/cml_scratch

@elleobrien
Contributor Author

@dmpetrov the pipeline has two stages (train.dvc and eval.dvc) and each stage writes a metric file. And yep!
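
(Roughly wired like this - a sketch of the two stages, with the script and data file names as placeholders rather than the actual names in the repo:)

    # train stage: fit the model and record its hyperparameters/metrics
    dvc run -f train.dvc -d train.py -d data/train.csv \
            -o model.pkl -M metrics/train.json \
            python train.py
    # eval stage: score the model and record accuracy
    dvc run -f eval.dvc -d eval.py -d model.pkl -d data/test.csv \
            -M metrics/eval.json \
            python eval.py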

@elleobrien
Contributor Author

elleobrien commented Mar 27, 2020

Another observation: my project has two branches; on master I am running a random forest classifier and on DNN a deep neural network. When I look at the report for the last commit on DNN, it looks like this:

[Screenshot: Screen Shot 2020-03-26 at 6.29.47 PM]

Now, because the hyperparameters I'm collecting are not the same as on master (epochs & neurons vs. max_depth), comparing metrics from train.json doesn't make a lot of sense.

Also, I know we are planning to do this eventually - but here's a case where being able to compare two commits on the same branch, rather than the heads of two branches, would be great (as an additional option, not a replacement), since I want to test a few different numbers of neurons/epochs in the neural network.

@DavidGOrtega
Contributor

@andronovhopf nice observation. We have had that discussion too: every branch might be a different implementation of the same problem to be solved, like a DNN vs. a random forest here.

You can set up a different baseline, and a baseline can be a specific commit SHA. You can set your baseline to, for example, HEAD~1 to compare your experiment with the previous one. That's also why the top-five list came into place: to have fast access to experiments on the same branch.

In my personal experience, to solve your problem in the DNN branch, change the baseline to master/dnn (supposing it's called that way) and work with branches of that branch when adjusting new parameters.
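
(On the DVC side the two-commit comparison is already expressible - a rough sketch, assuming the metric files are committed at both revisions:)

    # compare the workspace against the previous commit on the same branch
    dvc metrics diff HEAD~1
    # or compare any two explicit revisions
    dvc metrics diff dnn~3 dnn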

DavidGOrtega changed the title from "Thinking about metric display" to "Metrics of epxeriments with different tech implementation" on May 12, 2020
DavidGOrtega added the enhancement (New feature or request) label on May 12, 2020
0x2b3bfa0 changed the title from "Metrics of epxeriments with different tech implementation" to "Metrics of experiments with different tech implementation" on Jul 2, 2021
0x2b3bfa0 added the discussion (Waiting for team decision) and ui/ux (User interface/experience) labels and removed the enhancement (New feature or request) label on Jul 2, 2021
@DavidGOrtega
Contributor

Closed - this is not relevant anymore; it belongs to the CML-DVC incarnation of CML.
