RFC add metadata_columns to RAIInsights #1207

Closed
wants to merge 12 commits into from

romanlutz
Contributor

Description

This PR includes changes to

  • add a metadata_columns argument to RAIInsights (including input validation)
  • pass only the features (dropping metadata_columns) to the predict and predict_proba methods
  • adjust the managers:
    • causal needs no change since it doesn't use the model
    • error analysis needs to be aware of metadata_columns so it can use them in its calculations, while never passing them to the model
    • explanations and counterfactuals just need to ignore metadata_columns, so we don't pass those columns to these managers
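
To make the first two bullets concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes the data is a pandas DataFrame; ThresholdModel, validate_metadata_columns, and predict_on_features are hypothetical names invented for illustration, and the specific validation rules are an assumption.

```python
import pandas as pd


class ThresholdModel:
    """Toy stand-in for a fitted model; it only accepts the 'income' feature."""

    def predict(self, X):
        # Fail loudly if a metadata column leaks into the model input.
        if list(X.columns) != ["income"]:
            raise ValueError(f"unexpected columns: {list(X.columns)}")
        return (X["income"] > 50).astype(int).to_numpy()


def validate_metadata_columns(data, target_column, metadata_columns):
    """Hypothetical validation: names must exist and must not include the target."""
    metadata_columns = list(metadata_columns or [])
    missing = [c for c in metadata_columns if c not in data.columns]
    if missing:
        raise ValueError(f"metadata_columns not found in dataset: {missing}")
    if target_column in metadata_columns:
        raise ValueError("target_column cannot be a metadata column")
    return metadata_columns


def predict_on_features(model, data, target_column, metadata_columns=None):
    """Drop the target and metadata columns, then call model.predict."""
    metadata_columns = validate_metadata_columns(
        data, target_column, metadata_columns)
    # drop returns a new frame, so the metadata stays available for analysis.
    features = data.drop(columns=[target_column] + metadata_columns)
    return model.predict(features)


test = pd.DataFrame({
    "income": [30, 80, 55],
    "age": [25, 47, 33],      # metadata: kept for analysis, hidden from the model
    "approved": [0, 1, 1],    # target
})
predictions = predict_on_features(ThresholdModel(), test, "approved", ["age"])
```

Because drop is not applied in place, the original frame still contains the age column afterwards, which is what a manager such as error analysis would need.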

Note that this is a DRAFT PR, so I have deliberately left out the following for now while I work through potential comments:

  • tests
  • notebook adjustments and additions

Areas changed

npm packages changed:

  • responsibleai/causality
  • responsibleai/core-ui
  • responsibleai/counterfactuals
  • responsibleai/dataset-explorer
  • responsibleai/fairness
  • responsibleai/interpret
  • responsibleai/localization
  • responsibleai/mlchartlib
  • responsibleai/model-assessment

Python packages changed:

  • raiwidgets
  • responsibleai
  • erroranalysis
  • rai_core_flask

Tests

  • No new tests required.
  • New tests for the added feature are part of this PR.
  • I validated the changes manually.

Screenshots (if appropriate):

Documentation:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@romanlutz added the enhancement, python, and ResponsibleAIDashboard labels Feb 9, 2022
@codecov-commenter

codecov-commenter commented Feb 9, 2022

Codecov Report

Merging #1207 (3372f57) into main (18d38dd) will decrease coverage by 34.21%.
The diff coverage is 80.00%.

@@             Coverage Diff             @@
##             main    #1207       +/-   ##
===========================================
- Coverage   67.12%   32.90%   -34.22%     
===========================================
  Files          91       46       -45     
  Lines        4383     2440     -1943     
===========================================
- Hits         2942      803     -2139     
- Misses       1441     1637      +196     
Flag Coverage Δ
unittests 32.90% <80.00%> (-34.22%) ⬇️

Impacted Files Coverage Δ
...ranalysis/erroranalysis/analyzer/error_analyzer.py 74.14% <75.00%> (-18.96%) ⬇️
...is/erroranalysis/_internal/surrogate_error_tree.py 43.72% <100.00%> (-49.40%) ⬇️
erroranalysis/erroranalysis/_internal/utils.py 20.00% <0.00%> (-73.34%) ⬇️
...ranalysis/erroranalysis/_internal/matrix_filter.py 48.33% <0.00%> (-47.51%) ⬇️
erroranalysis/erroranalysis/report/error_report.py 47.94% <0.00%> (-41.10%) ⬇️
erroranalysis/erroranalysis/_internal/metrics.py 51.72% <0.00%> (-34.49%) ⬇️
...ranalysis/erroranalysis/_internal/cohort_filter.py 72.58% <0.00%> (-16.94%) ⬇️
responsibleai/responsibleai/_interfaces.py
... and 44 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 18d38dd...3372f57. Read the comment docs.

Contributor

@riedgar-ms left a comment

I like the basic idea, but should we just do one 'drop' in the top level RAIInsights object and then reference those?

erroranalysis/erroranalysis/analyzer/error_analyzer.py (outdated)
:param metadata_columns: The set of columns that are not passed
to the model or explainers. These columns can be used for
other analyses.
:type metadata_columns: list[str]
Contributor

Should this be Optional[List[str]], if None is valid?

Contributor

Although we haven't done that on the others

Contributor Author

I'm actually taking the unrelated doc adjustments out of this PR into a separate one. Perhaps I can do that there!

Contributor Author

#1214!

Also, Optional[List[str]] is only used for type annotations on the arguments in code, not in the docstring, right? I may be wrong...
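
As a small illustration of that distinction (a sketch with a hypothetical function named analyze, not code from this PR): the typing annotation lives in the signature and can be inspected at runtime, whereas the Sphinx-style :type: field in the docstring is free-form text. The wording "list[str] or None" in the docstring is one common convention, assumed here.

```python
from typing import List, Optional, get_type_hints


def analyze(metadata_columns: Optional[List[str]] = None) -> List[str]:
    """Return the metadata columns to use.

    :param metadata_columns: The set of columns that are not passed
        to the model or explainers.
    :type metadata_columns: list[str] or None
    """
    # None and [] are treated the same way: no metadata columns.
    return list(metadata_columns or [])


# Only the signature annotations are visible to typing tools at runtime;
# the docstring field is plain text.
hints = get_type_hints(analyze)
```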

Contributor Author

Added the annotations, but there will probably be a few merge conflicts to resolve once #1214 is merged.

@@ -20,6 +20,7 @@ class Dataset:
class_names: List[str]
categorical_features: List[str]
target_column: str
metadata_columns: List[List]
Contributor

This is perhaps a place where it should be Optional[List[str]]?

Contributor Author

This inspired me to create #1214. Once that's merged I'll add the same annotations here, too.

Contributor Author

Actually it's List[str] (updated just now). It doesn't need to be optional for this. It can be set to [].
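
A minimal sketch of the "non-optional list that defaults to empty" idea, assuming the interface class is a dataclass (the excerpt above does not confirm that). DatasetSketch is a hypothetical, trimmed-down stand-in, not the real Dataset class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetSketch:
    """Hypothetical, trimmed-down stand-in for the Dataset interface class."""
    class_names: List[str]
    categorical_features: List[str]
    target_column: str
    # default_factory gives every instance its own empty list; a bare `= []`
    # default is rejected by dataclasses because it would be shared.
    metadata_columns: List[str] = field(default_factory=list)


a = DatasetSketch(["no", "yes"], ["gender"], "approved")
b = DatasetSketch(["no", "yes"], ["gender"], "approved",
                  metadata_columns=["age"])
```

With this shape, consumers never have to check for None; an empty list simply means there are no metadata columns.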

@@ -50,10 +51,12 @@ class RAIInsights(object):
"""

     def __init__(self, model, train, test, target_column,
-                 task_type, categorical_features=None, classes=None,
+                 task_type, categorical_features=None,
+                 metadata_columns=None, classes=None,
Contributor

Can you add this parameter at the end, after maximum_rows_for_test, for backward-compatibility reasons?

Contributor Author

Logically, I felt like it makes most sense right after categorical_features, since it's also a list of column names. Realizing that these are positional arguments actually makes me want to make all but model, train, and test keyword-only:

func(self, model, train, test, *, target_column, task_type, categorical_features=None, ...)

In that case, the position after * is irrelevant, since you have to pass these arguments by keyword:

obj.func(model, train, test, target_column="y", categorical_features=["gender"], metadata_columns=["gender", "age"], task_type="classification", ...)

To me, it's pretty clear that there's no benefit at all to having them be positional. We've done something like this in Fairlearn a while ago: https://github.com/fairlearn/fairlearn/blob/862263f4352e23da088ae7a638b88b56580e5230/fairlearn/metrics/_metric_frame.py#L269
Since this is an existing class rather than a new one, the situation is somewhat more complicated: any change might break code that currently runs fine. If I remember correctly, we ended up adding a warning that arguments would become keyword-only in the next released version, and then made the change one version later. For this particular PR, it seems I won't get around moving it to the end.
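
The warn-first, change-later approach could be sketched roughly as follows. This is only an illustration under assumptions: deprecate_positional and InsightsSketch are hypothetical names, and this is not necessarily how Fairlearn implemented it. Extra positional arguments still work during the transition release but emit a FutureWarning.

```python
import functools
import warnings


def deprecate_positional(max_positional):
    """Warn when more than `max_positional` arguments (not counting self)
    are passed positionally."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if len(args) > max_positional:
                warnings.warn(
                    f"Passing more than {max_positional} positional "
                    f"arguments to {func.__qualname__} is deprecated; "
                    "use keywords instead.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class InsightsSketch:
    """Hypothetical stand-in; not the real RAIInsights class."""

    @deprecate_positional(max_positional=3)
    def __init__(self, model, train, test, target_column, task_type,
                 categorical_features=None, metadata_columns=None):
        self.target_column = target_column
        self.task_type = task_type
        self.categorical_features = categorical_features or []
        self.metadata_columns = metadata_columns or []
```

In the following release, the decorator would be removed and a bare * placed after test in the signature, at which point the order of the remaining parameters no longer matters to callers.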

@xuke444 @riedgar-ms @gaugup @imatiach-msft

Contributor

I would be in favour of requiring keywords for optional arguments. Once there are more than a couple, it makes much more sense that way.

Contributor

We only very recently released RAIInsights, so I think it's fine to change as long as we have a really, really, really good reason to. I'm not quite sure this one meets the bar.

responsibleai/responsibleai/rai_insights/rai_insights.py (outdated)
@romanlutz
Contributor Author

should this be analysis instead of analyses?

@gaugup No, it's deliberately chosen to be analyses (plural) since we're already doing multiple (disaggregated analysis, error analysis) using these columns.

@imatiach-msft
Contributor

@romanlutz not sure why these suddenly got included, but we shouldn't be adding any datasets to git/source:
[image]
Datasets make git slow, especially cloning, and git doesn't render or work with them well.

@romanlutz
Contributor Author

I'm closing this as we've decided to move in a somewhat different direction led by @gaugup. The RFC has therefore served its purpose well 🙂

@romanlutz closed this Jun 24, 2022
5 participants