Add data validations to SDK defined cohorts #1227

Merged
merged 3 commits into from
Feb 16, 2022

gaugup commented Feb 15, 2022

Description

The following validations need to be performed on cohort filters against
the test data:

High level validations

  1. Validate if the filter column is present in the test data.
  2. Validate if the filter column is present in the special column
    list.
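
As a rough illustration (not the code merged in this PR), the two high-level checks could be read as "the filter column must be either a test data column or a special column". The helper name `validate_filter_column` and the contents of `SPECIAL_COLUMNS` are assumptions, taken from the filter names used in the sections below.

```python
from typing import Sequence

# Assumed special (non-feature) filter columns, taken from the section
# headings of this description; the real list may differ.
SPECIAL_COLUMNS = ("Index", "Classification Outcome", "Error",
                   "Predicted Y", "True Y")


def validate_filter_column(column: str, test_columns: Sequence[str]) -> None:
    """Raise ValueError unless column is a test data or special column.

    Hypothetical helper: test_columns stands in for the column names of
    the test data.
    """
    if column in test_columns or column in SPECIAL_COLUMNS:
        return
    raise ValueError(
        f"Unknown filter column {column!r}: not found in the test data "
        "or in the special column list")
```

With this sketch, `validate_filter_column("Index", ["age"])` passes, while a column that is in neither place raises `ValueError`.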

"Index" Filter validations

  1. The Index filter only takes integer arguments.
  2. The Index filter doesn't take CohortFilterMethods.EXCLUDES
    filter method.
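
A minimal sketch of the two "Index" rules, assuming a stand-in enum `FilterMethod` in place of `CohortFilterMethods` (its string values and the `LESS` member are illustrative assumptions, and `validate_index_filter` is a hypothetical name):

```python
from enum import Enum
from typing import Sequence


class FilterMethod(Enum):
    """Hypothetical stand-in for CohortFilterMethods (values are assumed)."""
    INCLUDES = "includes"
    EXCLUDES = "excludes"
    LESS = "less"  # assumed example of a comparison-style method


def validate_index_filter(method: FilterMethod,
                          args: Sequence[object]) -> None:
    """Raise ValueError if an "Index" filter breaks either rule above."""
    if method is FilterMethod.EXCLUDES:
        raise ValueError('The "Index" filter does not accept EXCLUDES')
    for arg in args:
        # bool is a subclass of int in Python; rejecting it is a choice
        # made in this sketch, not something the description states.
        if isinstance(arg, bool) or not isinstance(arg, int):
            raise ValueError(
                f'The "Index" filter only takes integers, got {arg!r}')
```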

"Classification Outcome" Filter validations

  1. Validate that "Classification Outcome" filter is not configured for
    multiclass classification or regression.
  2. The "Classification Outcome" filter only contains values from set
    ClassificationOutcomes.
  3. The "Classification Outcome" filter only takes
    CohortFilterMethods.INCLUDES filter method.
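
The three "Classification Outcome" rules might be sketched as follows. The outcome strings, the task type string and the method string are all assumptions standing in for `ClassificationOutcomes`, the task type and `CohortFilterMethods.INCLUDES`. The sketch also assumes binary classification is the only task type left once multiclass classification and regression are ruled out.

```python
from typing import Sequence

# Assumed stand-ins; the real ClassificationOutcomes values, method names
# and task type strings may differ.
CLASSIFICATION_OUTCOMES = frozenset({
    "True positive", "True negative", "False positive", "False negative"})
INCLUDES = "includes"
BINARY_CLASSIFICATION = "binary_classification"


def validate_classification_outcome_filter(
        task_type: str, method: str, args: Sequence[str]) -> None:
    """Raise ValueError if a "Classification Outcome" filter is invalid."""
    if task_type != BINARY_CLASSIFICATION:
        raise ValueError(
            '"Classification Outcome" is only valid for binary '
            f"classification, got task type {task_type!r}")
    if method != INCLUDES:
        raise ValueError(
            '"Classification Outcome" only accepts the INCLUDES method')
    unknown = sorted(set(args) - CLASSIFICATION_OUTCOMES)
    if unknown:
        raise ValueError(f"Unknown classification outcomes: {unknown}")
```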

"Error" Filter validations

  1. Validate that "Error" filter is not configured for
    multiclass classification or binary classification.
  2. Only integers or floating-point numbers can be configured as arguments.
  3. The CohortFilterMethods.INCLUDES and CohortFilterMethods.EXCLUDES
    filter methods cannot be configured for this filter.
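
A sketch of the "Error" rules under similar assumptions: the task type and method strings are illustrative, `validate_error_filter` is a hypothetical name, and regression is assumed to be the only task type left once both classification types are ruled out.

```python
from typing import Sequence

# Assumed stand-ins for the task type and for the string values of
# CohortFilterMethods.INCLUDES / CohortFilterMethods.EXCLUDES.
REGRESSION = "regression"
DISALLOWED_METHODS = frozenset({"includes", "excludes"})


def validate_error_filter(task_type: str, method: str,
                          args: Sequence[object]) -> None:
    """Raise ValueError if an "Error" filter breaks any rule above."""
    if task_type != REGRESSION:
        raise ValueError(
            f'"Error" is only valid for regression, got {task_type!r}')
    if method in DISALLOWED_METHODS:
        raise ValueError(f'"Error" does not accept the {method!r} method')
    for arg in args:
        # bool is excluded explicitly because it is a subclass of int;
        # that is a choice made in this sketch.
        if isinstance(arg, bool) or not isinstance(arg, (int, float)):
            raise ValueError(
                f'"Error" only takes int or float arguments, got {arg!r}')
```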

"Predicted Y/True Y" Filter validations

  1. The set of classes configured in case of classification is a
    superset of the classes available in the test data.
  2. The CohortFilterMethods.INCLUDES is only allowed to be
    configured for "Predicted Y" filter in case of classification.
  3. The CohortFilterMethods.INCLUDES and CohortFilterMethods.EXCLUDES
    filter methods cannot be configured for this filter for regression.

"Dataset" Filter validations

  1. TODO: For continuous features, the values that can be configured
    should be within the range of minimum and maximum values available
    within the continuous feature column in the test data.
  2. For categorical features only CohortFilterMethods.INCLUDES can be
    configured.
  3. For categorical features the values allowed are a subset of
    the values available in the categorical column in the test data.
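
The "Dataset" rules could look roughly like this. Both function names and the `INCLUDES` string are assumptions. The continuous range check corresponds to the TODO item, so it is shown only as one possible shape for that future check.

```python
from typing import Sequence

INCLUDES = "includes"  # assumed value of CohortFilterMethods.INCLUDES


def validate_categorical_filter(method: str, args: Sequence[object],
                                column_values: Sequence[object]) -> None:
    """Categorical rules: INCLUDES only, arguments drawn from the column."""
    if method != INCLUDES:
        raise ValueError("Categorical features only accept INCLUDES")
    missing = sorted(set(args) - set(column_values), key=repr)
    if missing:
        raise ValueError(f"Values not present in the test data: {missing}")


def validate_continuous_filter(args: Sequence[float],
                               column_values: Sequence[float]) -> None:
    """Range check from the TODO item; a possible shape, not merged code."""
    low, high = min(column_values), max(column_values)
    for arg in args:
        if not low <= arg <= high:
            raise ValueError(
                f"{arg!r} is outside the observed range [{low}, {high}]")
```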

Areas changed

npm packages changed:

  • responsibleai/causality
  • responsibleai/core-ui
  • responsibleai/counterfactuals
  • responsibleai/dataset-explorer
  • responsibleai/fairness
  • responsibleai/interpret
  • responsibleai/localization
  • responsibleai/mlchartlib
  • responsibleai/model-assessment

Python packages changed:

  • raiwidgets
  • responsibleai
  • erroranalysis
  • rai_core_flask

Tests

  • No new tests required.
  • New tests for the added feature are part of this PR.
  • I validated the changes manually.

Screenshots (if appropriate):

Documentation:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

codecov-commenter commented Feb 15, 2022

Codecov Report

Merging #1227 (7df90a2) into main (11ee5ca) will decrease coverage by 1.20%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #1227      +/-   ##
==========================================
- Coverage   67.19%   65.99%   -1.21%     
==========================================
  Files          91       91              
  Lines        4393     4473      +80     
==========================================
  Hits         2952     2952              
- Misses       1441     1521      +80     

Flag       Coverage Δ
unittests  65.99% <0.00%> (-1.21%) ⬇️

Impacted Files                    Coverage Δ
raiwidgets/raiwidgets/_cohort.py  0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 11ee5ca...7df90a2.

gaugup enabled auto-merge (squash) February 16, 2022 19:22
gaugup merged commit 1756613 into main Feb 16, 2022
gaugup deleted the gaugup/AddDataValidationsCohorts branch February 16, 2022 20:38
gaugup added a commit that referenced this pull request Feb 27, 2022
* Add data validations to SDK defined cohorts

Signed-off-by: Gaurav Gupta <[email protected]>

* Fix code review comments

Signed-off-by: Gaurav Gupta <[email protected]>