Write new Experiment or ExperimentAnalysis code module #371

mdekstrand · 2024-04-10T20:45:44Z

Right now RecListAnalysis is good but limited: it only computes per-user metrics.

It would help standardization of evaluation procedures if we had a more coherent "analyze" (and maybe "run") tool for experiments. The first version, of course, would just be for analysis.

- Specify experiment axes instead of inferring them?
- Support global metrics
- Specify list lengths as analysis parameter
- Support metrics with additional data (novelty, etc.)
- Clean up metric interface design
- Support analysis (sig tests, CIs, distributions, etc.)
- Support results in DuckDB?
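
To make a few of these items concrete, here is a rough sketch of what such an analysis object could look like. It covers only explicit axes, a list-length parameter, and global metrics alongside per-list ones. Everything in it is hypothetical and for discussion only: `ExperimentAnalysis`, its fields, the `analyze` signature, the run-record layout (`user`, `items`), and the two toy metrics are assumptions, not existing LensKit API. It uses only the standard library to keep the idea visible.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


def precision(recs, truth):
    """Per-list metric: fraction of recommended items that are relevant."""
    return sum(1 for i in recs if i in truth) / len(recs) if recs else 0.0


def distinct_items(lists):
    """Global metric: number of distinct items across all lists in a group."""
    return len({item for recs in lists for item in recs})


@dataclass
class ExperimentAnalysis:
    """Hypothetical analysis object; not part of LensKit."""

    axes: tuple                          # experiment axes, declared rather than inferred
    list_length: Optional[int] = None    # truncate every list to this length (None = full list)
    list_metrics: dict = field(default_factory=dict)    # name -> fn(recs, truth)
    global_metrics: dict = field(default_factory=dict)  # name -> fn(all lists in group)

    def analyze(self, runs, truth):
        # Group the recommendation lists by the declared axes.
        groups = defaultdict(list)
        for run in runs:
            key = tuple(run[a] for a in self.axes)
            recs = run["items"][: self.list_length]
            groups[key].append((run["user"], recs))

        summary = {}
        for key, lists in groups.items():
            row = {}
            # Per-list metrics are averaged over the lists in the group.
            for name, fn in self.list_metrics.items():
                row[name] = mean(fn(recs, truth.get(user, set())) for user, recs in lists)
            # Global metrics see every list in the group at once.
            for name, fn in self.global_metrics.items():
                row[name] = fn([recs for _, recs in lists])
            summary[key] = row
        return summary


# Toy data: two algorithms, two users each.
runs = [
    {"algorithm": "A", "user": "u1", "items": ["i1", "i2", "i3"]},
    {"algorithm": "A", "user": "u2", "items": ["i2", "i4", "i5"]},
    {"algorithm": "B", "user": "u1", "items": ["i3", "i1", "i2"]},
    {"algorithm": "B", "user": "u2", "items": ["i3", "i1", "i4"]},
]
truth = {"u1": {"i1"}, "u2": {"i4"}}

analysis = ExperimentAnalysis(
    axes=("algorithm",),
    list_length=2,
    list_metrics={"precision": precision},
    global_metrics={"distinct_items": distinct_items},
)
summary = analysis.analyze(runs, truth)
```

The point of the sketch is the separation of concerns: axes and list length are analysis parameters, and per-list and global metrics are registered through different interfaces because they receive different inputs. Statistical analysis, metrics needing additional data, and DuckDB-backed results are left out here.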

This ticket is probably its own epic.


mdekstrand added the evaluation label Apr 10, 2024

mdekstrand mentioned this issue Jul 24, 2024

Clean up Pandas warnings in eval code #447

Open
