-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC add metadata_columns to RAIInsights #1207
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1207 +/- ##
===========================================
- Coverage 67.12% 32.90% -34.22%
===========================================
Files 91 46 -45
Lines 4383 2440 -1943
===========================================
- Hits 2942 803 -2139
- Misses 1441 1637 +196
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the basic idea, but should we just do one 'drop' in the top level RAIInsights
object and then reference those?
:param metadata_columns: The set of columns that are not passed | ||
to the model or explainers. These columns can be used for | ||
other analyses. | ||
:type metadata_columns: list[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional[list[str]] if None
is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we haven't done that on the others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually taking the unrelated doc adjustments out of this PR into a separate one. Perhaps I can do that there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1214 !
Also, the optional[list[str]] annotations are only used for type annotations on the args in code, not in the docstring, right? I may be wrong...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the annotations, but there will probably be a few merge conflicts to resolve once #1214 is merged.
@@ -20,6 +20,7 @@ class Dataset: | |||
class_names: List[str] | |||
categorical_features: List[str] | |||
target_column: str | |||
metadata_columns: List[List] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps a place where is should be Optional[List[str]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This inspired me to create #1214 . Once that's merged I'll add the same annotations here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's List[str]
(updated just now). It doesn't need to be optional for this. It can be set as []
…olbox into romanlutz/exclude_columns
@@ -50,10 +51,12 @@ class RAIInsights(object): | |||
""" | |||
|
|||
def __init__(self, model, train, test, target_column, | |||
task_type, categorical_features=None, classes=None, | |||
task_type, categorical_features=None, | |||
metadata_columns=None, classes=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this parameter in the end after maximum_rows_for_test for back compat reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically, I felt like it makes most sense right after categorical_features
, since it's also a list of column names. Realizing that these are positional arguments actually makes me want to make all but model, train, and test keyword-only:
func(self, model, train, test, *, target_column, task_type, categorical_features=None, ...)
In that case, the positioning after *
is irrelevant since you have to call it with keyword
obj.func(model, train, test, target_column="y", categorical_features=["gender"], metadata_columns=["gender", "age"], task_type="classification", ...)
To me, it's pretty clear that there's no benefit at all to having them be positional. We've done something like this in Fairlearn a while ago: https://github.com/fairlearn/fairlearn/blob/862263f4352e23da088ae7a638b88b56580e5230/fairlearn/metrics/_metric_frame.py#L269
Since it's not a new class, but one that already exists, the situation is somewhat more complicated. Making any change might break code that would currently run fine. If I remember correctly, we ended up adding a warning that arguments will be keyword-only starting in the next released version and then made the change one version later. For this particular PR I won't get around moving it to the end it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favour of requiring keywords for optional arguments. Once there are more than a couple, it makes much more sense that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only just very recently released this RAIInsights, so I think it's fine to change as long as we have a really really really good reason to. I'm not quite sure if this one meets the bar.
@gaugup No, it's deliberately chosen to be analyses (plural) since we're already doing multiple (disaggregated analysis, error analysis) using these columns. |
@romanlutz not sure why these suddenly got included but we shouldn't be adding any datasets to git/source: |
I'm closing this as we've decided to move in a somewhat different direction led by @gaugup. The RFC has therefore served its purpose well 🙂 |
Description
This PR includes changes to
metadata_columns
argument onRAIInsights
(including input validation)metadata_columns
usingdrop
) topredict
andpredict_proba
methodsmetadata_columns
to use them for the error analysis calculations, but at the same time avoid passingmetadata_columns
to the modelmetadata_columns
so we don't pass them to that managerNote that this is a DRAFT PR so I have deliberately not included the following yet since I want to work through potential comments:
Areas changed
npm packages changed:
Python packages changed:
Tests
Screenshots (if appropriate):
Documentation: