RFC add metadata_columns to RAIInsights #1207

romanlutz · 2022-02-09T03:16:25Z

Description

This PR includes changes to:

- add a metadata_columns argument to RAIInsights (including input validation)
- pass only the feature columns (dropping metadata_columns) to the model's predict and predict_proba methods
- adjust the managers:
  - causal needs no change since it doesn't use the model
  - error analysis needs to be aware of metadata_columns so it can use them in the error analysis calculations, while still never passing them to the model
  - explanations and counterfactuals just need to ignore metadata_columns, so we don't pass them to those managers at all
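A minimal sketch of the input validation and feature/metadata split listed above, assuming plain string column names. The helper `split_feature_and_metadata_columns` is hypothetical (not part of the responsibleai API), and the specific checks are assumptions about what the validation might cover.

```python
from typing import List, Optional, Sequence, Tuple


def split_feature_and_metadata_columns(
        all_columns: Sequence[str],
        target_column: str,
        metadata_columns: Optional[List[str]] = None
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: validate metadata_columns and return
    (feature_columns, metadata_columns)."""
    metadata_columns = list(metadata_columns or [])
    # Every metadata column must exist in the dataset.
    missing = [c for c in metadata_columns if c not in all_columns]
    if missing:
        raise ValueError(f"metadata_columns not found in data: {missing}")
    # The target is handled separately and cannot double as metadata.
    if target_column in metadata_columns:
        raise ValueError("target_column cannot be a metadata column")
    if len(set(metadata_columns)) != len(metadata_columns):
        raise ValueError("metadata_columns contains duplicates")
    # Features are everything except the target and the metadata columns;
    # only these would be passed to predict/predict_proba.
    excluded = set(metadata_columns) | {target_column}
    feature_columns = [c for c in all_columns if c not in excluded]
    return feature_columns, metadata_columns
```

For example, with columns `["age", "income", "gender", "y"]`, target `"y"` and metadata `["gender"]`, the model would only ever see `["age", "income"]`.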

Note that this is a DRAFT PR, so I have deliberately not included the following yet, since I want to work through review comments first:

- tests
- notebook adjustments and additions

Areas changed

npm packages changed:

Python packages changed:

- raiwidgets
- responsibleai
- erroranalysis
- rai_core_flask

Tests

- No new tests required.
- New tests for the added feature are part of this PR.
- I validated the changes manually.

Screenshots (if appropriate):

Documentation:

- My change requires a change to the documentation.
- I have updated the documentation accordingly.

codecov-commenter · 2022-02-09T03:17:35Z

Codecov Report

Merging #1207 (3372f57) into main (18d38dd) will decrease coverage by 34.21%.
The diff coverage is 80.00%.

```diff
@@             Coverage Diff             @@
##             main    #1207       +/-   ##
===========================================
- Coverage   67.12%   32.90%   -34.22%
===========================================
  Files          91       46       -45
  Lines        4383     2440     -1943
===========================================
- Hits         2942      803     -2139
- Misses       1441     1637      +196
```

| Flag | Coverage Δ |
|---|---|
| unittests | `32.90% <80.00%> (-34.22%)` ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...ranalysis/erroranalysis/analyzer/error_analyzer.py | `74.14% <75.00%> (-18.96%)` ⬇️ |
| ...is/erroranalysis/_internal/surrogate_error_tree.py | `43.72% <100.00%> (-49.40%)` ⬇️ |
| erroranalysis/erroranalysis/_internal/utils.py | `20.00% <0.00%> (-73.34%)` ⬇️ |
| ...ranalysis/erroranalysis/_internal/matrix_filter.py | `48.33% <0.00%> (-47.51%)` ⬇️ |
| erroranalysis/erroranalysis/report/error_report.py | `47.94% <0.00%> (-41.10%)` ⬇️ |
| erroranalysis/erroranalysis/_internal/metrics.py | `51.72% <0.00%> (-34.49%)` ⬇️ |
| ...ranalysis/erroranalysis/_internal/cohort_filter.py | `72.58% <0.00%> (-16.94%)` ⬇️ |
| responsibleai/responsibleai/_interfaces.py | |

... and 44 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 18d38dd...3372f57.

erroranalysis/erroranalysis/analyzer/error_analyzer.py

responsibleai/responsibleai/managers/error_analysis_manager.py

responsibleai/responsibleai/rai_insights/rai_insights.py

riedgar-ms

I like the basic idea, but should we just do one 'drop' in the top level RAIInsights object and then reference those?
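One way to read this suggestion, as a sketch assuming `train` and `test` are pandas DataFrames: drop the non-feature columns once at the top level and hand the resulting frames to every manager. `FeatureFrames` is a hypothetical name, not an existing responsibleai class.

```python
from typing import List

import pandas as pd


class FeatureFrames:
    """Hypothetical container: drop the target and metadata columns once,
    then let every manager reuse the feature-only frames."""

    def __init__(self, train: pd.DataFrame, test: pd.DataFrame,
                 target_column: str, metadata_columns: List[str]):
        non_features = [target_column] + list(metadata_columns)
        # A single drop per dataset, done at the top level. drop() returns
        # a new frame, so the caller's train/test are left untouched.
        self.train_features = train.drop(columns=non_features)
        self.test_features = test.drop(columns=non_features)
        # Metadata stays available for analyses that need it.
        self.test_metadata = test[list(metadata_columns)]
```

Managers would then call something like `model.predict(frames.test_features)` instead of each performing its own drop.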


riedgar-ms · 2022-02-09T16:45:06Z

erroranalysis/erroranalysis/analyzer/error_analyzer.py

+    :param metadata_columns: The set of columns that are not passed
+        to the model or explainers. These columns can be used for
+        other analyses.
+    :type metadata_columns: list[str]


Should this be optional[list[str]] if None is valid?

Although we haven't done that on the others.

I'm actually taking the unrelated doc adjustments out of this PR into a separate one. Perhaps I can do that there!

#1214!

Also, the optional[list[str]] annotations are only used for type annotations on the args in code, not in the docstring, right? I may be wrong...

Added the annotations, but there will probably be a few merge conflicts to resolve once #1214 is merged.
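To illustrate the distinction raised in this thread, here is a hypothetical function (not the actual `ErrorAnalyzer` signature) carrying both forms: the `Optional[List[str]]` annotation lives in code and can be introspected with `typing.get_type_hints`, while the `:type:` field is plain docstring text that documentation tools such as Sphinx render. Python itself does not connect the two, so keeping them consistent is up to the author.

```python
from typing import List, Optional, get_type_hints


def analyze(data, metadata_columns: Optional[List[str]] = None):
    """Hypothetical function illustrating both annotation styles.

    :param metadata_columns: The set of columns that are not passed
        to the model or explainers. These columns can be used for
        other analyses.
    :type metadata_columns: Optional[list[str]]
    """
    # None is the documented default; normalize it to an empty list.
    return [] if metadata_columns is None else list(metadata_columns)
```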

riedgar-ms · 2022-02-09T16:46:00Z

responsibleai/responsibleai/_interfaces.py

@@ -20,6 +20,7 @@ class Dataset:
    class_names: List[str]
    categorical_features: List[str]
    target_column: str
+    metadata_columns: List[List]


This is perhaps a place where it should be Optional[List[str]]?

This inspired me to create #1214. Once that's merged, I'll add the same annotations here, too.

Actually, it's List[str] (updated just now). It doesn't need to be Optional here, since it can be set to [].
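A sketch of a non-optional `List[str]` field that defaults to an empty list, assuming `Dataset` were declared as a `dataclasses.dataclass` (the diff does not show how the class is declared) and keeping only the fields visible in the diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    # Sketch only: a subset of the fields shown in the diff.
    class_names: List[str]
    categorical_features: List[str]
    target_column: str
    # Not Optional: "no metadata" is represented by an empty list.
    # default_factory gives each instance its own list instead of
    # sharing one mutable default.
    metadata_columns: List[str] = field(default_factory=list)
```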

responsibleai/responsibleai/managers/counterfactual_manager.py

responsibleai/responsibleai/managers/error_analysis_manager.py

…olbox into romanlutz/exclude_columns

gaugup · 2022-02-10T11:25:17Z

responsibleai/responsibleai/rai_insights/rai_insights.py

@@ -50,10 +51,12 @@ class RAIInsights(object):
    """

    def __init__(self, model, train, test, target_column,
-                 task_type, categorical_features=None, classes=None,
+                 task_type, categorical_features=None,
+                 metadata_columns=None, classes=None,


Can you add this parameter at the end, after maximum_rows_for_test, for backward-compatibility reasons?
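The back-compat concern can be illustrated with simplified stand-ins (hypothetical functions truncated to the arguments visible in the diff, not the real constructor): inserting a parameter in the middle silently changes how existing positional calls bind, while appending it at the end does not.

```python
# Simplified stand-ins for the RAIInsights constructor, truncated to the
# arguments visible in the diff; the real signature has more parameters.
def init_before(model, train, test, target_column, task_type,
                categorical_features=None, classes=None):
    return {"classes": classes, "metadata_columns": None}


def init_inserted_middle(model, train, test, target_column, task_type,
                         categorical_features=None, metadata_columns=None,
                         classes=None):
    return {"classes": classes, "metadata_columns": metadata_columns}


def init_appended_end(model, train, test, target_column, task_type,
                      categorical_features=None, classes=None,
                      metadata_columns=None):
    return {"classes": classes, "metadata_columns": metadata_columns}


# An existing caller that passes classes positionally.
args = (None, None, None, "y", "classification", ["gender"], [0, 1])
print(init_before(*args))           # classes is [0, 1]
print(init_inserted_middle(*args))  # [0, 1] silently lands in metadata_columns
print(init_appended_end(*args))     # classes is [0, 1], behavior unchanged
```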

Logically, I felt like it makes most sense right after categorical_features, since it's also a list of column names. Realizing that these are positional arguments actually makes me want to make all but model, train, and test keyword-only:

```
func(self, model, train, test, *, target_column, task_type, categorical_features=None, ...)
```

In that case, the position of an argument after * is irrelevant, since every argument after it has to be passed by keyword:

```
obj.func(model, train, test, target_column="y", categorical_features=["gender"], metadata_columns=["gender", "age"], task_type="classification", ...)
```

To me, it's pretty clear that there's no benefit at all to having them be positional. We've done something like this in Fairlearn a while ago: https://github.com/fairlearn/fairlearn/blob/862263f4352e23da088ae7a638b88b56580e5230/fairlearn/metrics/_metric_frame.py#L269
Since it's not a new class but one that already exists, the situation is somewhat more complicated: any change might break code that currently runs fine. If I remember correctly, we ended up adding a warning that the arguments would become keyword-only starting in the next release, and then made the change one version later. For this particular PR, though, it seems I can't get around moving it to the end.

@xuke444 @riedgar-ms @gaugup @imatiach-msft
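A minimal sketch of that two-step transition, not Fairlearn's actual implementation: the hypothetical decorator `warn_on_positional` emits a `FutureWarning` whenever arguments beyond the first few are passed positionally, and `RAIInsightsSketch` is a stand-in with a truncated signature.

```python
import functools
import inspect
import warnings


def warn_on_positional(n_positional):
    """Hypothetical decorator: warn when more than `n_positional`
    arguments are passed positionally, ahead of making them keyword-only."""
    def decorator(func):
        names = list(inspect.signature(func).parameters)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > n_positional:
                extra = names[n_positional:len(args)]
                warnings.warn(
                    f"Passing {extra} positionally is deprecated; "
                    "they will become keyword-only in a future release.",
                    FutureWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class RAIInsightsSketch:
    # n_positional counts `self`, so self, model, train and test
    # remain positional without a warning.
    @warn_on_positional(4)
    def __init__(self, model, train, test, target_column, task_type,
                 categorical_features=None):
        self.target_column = target_column
        self.task_type = task_type
```

After the warning has shipped for a release, the decorator can be dropped and the signature changed to `def __init__(self, model, train, test, *, target_column, task_type, ...)`.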

I would be in favour of requiring keywords for optional arguments. Once there are more than a couple, it makes much more sense that way.

We only very recently released RAIInsights, so I think it's fine to change it as long as we have a really, really, really good reason to. I'm not quite sure this one meets the bar.

responsibleai/responsibleai/rai_insights/rai_insights.py

romanlutz · 2022-02-10T13:45:43Z

should this be analysis instead of analyses?

@gaugup No, it's deliberately chosen to be analyses (plural) since we're already doing multiple (disaggregated analysis, error analysis) using these columns.
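To make one of those analyses concrete, here is a hypothetical sketch of the disaggregated side: per-group accuracy computed from a metadata column that was never passed to the model. The function name and the choice of accuracy as the metric are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(y_true: Sequence, y_pred: Sequence,
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-group accuracy. `y_pred` comes from a model that only saw the
    feature columns; `groups` comes from a metadata column."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}
```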

…olbox into romanlutz/exclude_columns

imatiach-msft · 2022-02-11T04:57:48Z

@romanlutz, not sure why these suddenly got included, but we shouldn't be adding any datasets to git/source:

Datasets make git slow, especially cloning, and git doesn't render or work with them well.

romanlutz · 2022-06-24T02:36:49Z

I'm closing this as we've decided to move in a somewhat different direction led by @gaugup. The RFC has therefore served its purpose well 🙂

romanlutz added 2 commits February 4, 2022 18:52

add metadata_columns to responsibleai package

c1fca65

adjustments to make e2e flow work

954dcd5

romanlutz added the enhancement, python, and ResponsibleAIDashboard labels Feb 9, 2022

romanlutz requested review from imatiach-msft, riedgar-ms, xuke444 and gaugup February 9, 2022 03:16

gaugup reviewed Feb 9, 2022

View reviewed changes

erroranalysis/erroranalysis/analyzer/error_analyzer.py (outdated)

responsibleai/responsibleai/managers/error_analysis_manager.py

responsibleai/responsibleai/rai_insights/rai_insights.py (outdated)

riedgar-ms reviewed Feb 9, 2022

View reviewed changes

romanlutz added 3 commits February 9, 2022 21:24

use views instead of drop

4c70aed

Merge branch 'main' of https://github.com/microsoft/responsible-ai-toolbox into romanlutz/exclude_columns

f4f4227

flake8

313f309

gaugup reviewed Feb 10, 2022

View reviewed changes

romanlutz added 7 commits February 10, 2022 16:57

Merge branch 'main' of https://github.com/microsoft/responsible-ai-toolbox into romanlutz/exclude_columns

fee6d3a

extend docstring description of metadata_columns

fe2612b

put metadata_columns last in the arg list, add type annotation

df8f69d

fix type of metadata_columns on _interfaces.py

a45430a

Merge branch 'main' of https://github.com/microsoft/responsible-ai-toolbox into romanlutz/exclude_columns

53752e7

flake8, isort

6b58f84

Merge branch 'main' of https://github.com/microsoft/responsible-ai-toolbox into romanlutz/exclude_columns

3372f57

romanlutz closed this Jun 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC add metadata_columns to RAIInsights #1207

RFC add metadata_columns to RAIInsights #1207

romanlutz commented Feb 9, 2022

codecov-commenter commented Feb 9, 2022 •

edited

Loading

riedgar-ms left a comment

riedgar-ms Feb 9, 2022

riedgar-ms Feb 9, 2022

romanlutz Feb 9, 2022

romanlutz Feb 10, 2022

romanlutz Feb 10, 2022

riedgar-ms Feb 9, 2022

romanlutz Feb 10, 2022

romanlutz Feb 10, 2022

gaugup Feb 10, 2022

romanlutz Feb 10, 2022

riedgar-ms Feb 10, 2022

imatiach-msft Feb 10, 2022

romanlutz commented Feb 10, 2022

imatiach-msft commented Feb 11, 2022

romanlutz commented Jun 24, 2022

RFC add metadata_columns to RAIInsights #1207

RFC add metadata_columns to RAIInsights #1207

Conversation

romanlutz commented Feb 9, 2022

Description

Areas changed

Tests

Screenshots (if appropriate):

Documentation:

codecov-commenter commented Feb 9, 2022 • edited Loading

Codecov Report

riedgar-ms left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

romanlutz commented Feb 10, 2022

imatiach-msft commented Feb 11, 2022

romanlutz commented Jun 24, 2022

codecov-commenter commented Feb 9, 2022 •

edited

Loading