Enhancement for the tabular validator. #291

ArlindKadra · 2021-10-01T14:07:51Z

Refactoring the tabular validator parts.

autoPyTorch/data/base_feature_validator.py

autoPyTorch/data/tabular_feature_validator.py

nabenabe0928

Hi, thanks for the PR.
I checked the implementation and suggested some changes.

autoPyTorch/pipeline/components/setup/network_embedding/base_network_embedding.py

autoPyTorch/data/base_feature_validator.py

autoPyTorch/data/tabular_feature_validator.py

nabenabe0928

I left just quick comments!

autoPyTorch/data/tabular_feature_validator.py

nabenabe0928

@ArlindKadra Thanks for the changes. I added some comments as suggestions to make the code easier to understand :)

autoPyTorch/data/tabular_feature_validator.py

ArlindKadra · 2021-10-06T18:21:34Z

@ravinkohli A fix is needed to drop completely empty categorical columns; otherwise the code will fail at the following line:

Auto-PyTorch/autoPyTorch/data/tabular_feature_validator.py

Line 505 in a1ed883

first_value = X[column].dropna().values[0]
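
A minimal sketch of the failure mode, assuming a pandas DataFrame with one categorical column that contains only missing values. The helper `drop_all_nan_columns` and the column names are hypothetical and only illustrate the idea; they are not the validator's actual code.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X without the columns whose values are all missing."""
    return X.dropna(axis="columns", how="all")


X = pd.DataFrame({
    "colour": pd.Series(["red", None, "blue"], dtype="object"),
    "empty": pd.Series([np.nan, np.nan, np.nan], dtype="object"),
})

# On the all-NaN column, dropna() leaves an empty array, so position 0 does not exist.
try:
    X["empty"].dropna().values[0]
except IndexError:
    lookup_failed = True
else:
    lookup_failed = False

# After dropping the empty column, the same lookup is safe for every remaining column.
X_clean = drop_all_nan_columns(X)
first_values = {column: X_clean[column].dropna().values[0] for column in X_clean.columns}
```

Dropping the columns before any per-column inspection means the first-value lookup never sees an empty series.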

nabenabe0928

Hi, I found some bugs, so I am requesting changes.
There are two points that I raised:

We cannot judge whether a whole column can be cast to a number by looking only at the first value.
A simple change to get_unused_category_symbol may suffer from overflow issues.
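
The first point can be illustrated with a small sketch, assuming pandas is available. The helper name `is_numeric_column` is hypothetical; it checks every non-missing value rather than only the first one.

```python
import pandas as pd


def is_numeric_column(column: pd.Series) -> bool:
    """Return True only if every non-missing value can be parsed as a number."""
    non_null = column.dropna()
    if non_null.empty:
        # Nothing to inspect; treat an all-missing column as non-numeric here.
        return False
    # Values that cannot be parsed become NaN instead of raising.
    converted = pd.to_numeric(non_null, errors="coerce")
    return bool(converted.notna().all())


# The first value looks numeric, but a later value is not.
mixed = pd.Series(["1", "2", "abc", None], dtype=object)
clean = pd.Series(["1", "2.5", None], dtype=object)

mixed_is_numeric = is_numeric_column(mixed)
clean_is_numeric = is_numeric_column(clean)
```

A check based only on `mixed.dropna().values[0]` would classify `mixed` as numeric, which is the situation the comment warns about.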

autoPyTorch/data/tabular_feature_validator.py

ArlindKadra · 2021-10-08T08:38:26Z

The code for deleting null columns will fail, since on the first call to fit you will have a list of nan_columns, however, on the call to transform for train, you will not have those columns there anymore and you will make the list empty. Then, while transforming the test dataset, you will get an error because of the DataFrames with different columns.

Actually what is happening now is that we store the column order after we drop the all nan columns. So the column order is already only set after we have dropped the columns in fit. Therefore I don't think it will fail.

This is not an assumption, I noticed it failing and I debugged the trace.

ArlindKadra · 2021-10-08T08:41:26Z

To be clear, you call fit with a train set, you drop the columns, you store the dropped column names.
You call transform with the train set, you have set columns but you already dropped the columns so there is no subset, you go to the else, you initialize the list of null column names to none, and you drop no columns.
You call transform with the test set, you now have an empty null column list and you have two different sized DataFrames. Is that more clear?

ravinkohli · 2021-10-08T09:03:53Z

Yes, now I get it. I'll fix it. Thanks for pointing that out.
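
The fit/transform sequence discussed in this thread can be reproduced with a toy example. This is a sketch under stated assumptions: `SketchValidator` and its `reset_on_transform` flag are hypothetical stand-ins, not the autoPyTorch classes, and the buggy branch only approximates the behaviour described in the comments.

```python
import numpy as np
import pandas as pd


class SketchValidator:
    """Toy stand-in for the feature validator; not the autoPyTorch class."""

    def __init__(self, reset_on_transform: bool) -> None:
        self.reset_on_transform = reset_on_transform  # True reproduces the bug
        self.all_nan_columns: list = []

    def fit(self, X: pd.DataFrame) -> "SketchValidator":
        # Remember which columns were entirely missing in the training data.
        self.all_nan_columns = [c for c in X.columns if X[c].isna().all()]
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        present = [c for c in self.all_nan_columns if c in X.columns]
        if self.reset_on_transform and len(present) < len(self.all_nan_columns):
            # Buggy branch: the columns are already gone, so the fitted list is wiped.
            self.all_nan_columns = []
        return X.drop(columns=present)


train = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, np.nan]})
test = pd.DataFrame({"a": [3.0], "b": [np.nan]})
train_dropped = train.drop(columns=["b"])  # the train set after fit removed "b"

buggy = SketchValidator(reset_on_transform=True).fit(train)
buggy.transform(train_dropped)      # wipes the stored column names
buggy_test = buggy.transform(test)  # "b" is no longer dropped

fixed = SketchValidator(reset_on_transform=False).fit(train)
fixed.transform(train_dropped)      # fitted state is left untouched
fixed_test = fixed.transform(test)  # "b" is dropped as in training
```

Keeping the list computed in `fit` read-only during `transform` is what makes the train and test frames end up with the same columns.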

…ical evaluator

nabenabe0928

Hi, thanks for the commits. I was a bit surprised because the files changed a lot from the last time I saw!
I put some comments and modifications, so please address them:)

test/test_data/test_feature_validator.py

nabenabe0928 · 2021-10-08T13:59:10Z

autoPyTorch/data/base_target_validator.py

@@ -12,8 +12,8 @@
 from autoPyTorch.utils.logging_ import PicklableClientLogger


-SUPPORTED_TARGET_TYPES = typing.Union[
-    typing.List,
+SUPPORTED_TARGET_TYPES = Union[


AutoPep8 rule

Suggested change

SUPPORTED_TARGET_TYPES = Union[

SupportedTargetTypes = Union[

Let's keep this as part of a separate PR later.

autoPyTorch/data/tabular_feature_validator.py

nabenabe0928 · 2021-10-08T14:06:06Z

autoPyTorch/data/tabular_feature_validator.py

            categorical_columns, numerical_columns, feat_type = self._get_columns_info(X)

            self.enc_columns = categorical_columns
-            if len(categorical_columns) >= 0:
-                X = self.impute_nan_in_categories(X)


Where are we imputing now?

We are now using an sklearn imputer for the categorical columns as well.
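
A minimal sketch of imputing a categorical column with scikit-learn's `SimpleImputer`. The thread does not say which strategy or fill value the validator uses, so `strategy="constant"` and `fill_value="missing"` are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One categorical column stored as objects, with a missing entry marked by np.nan.
X = np.array([["red"], [np.nan], ["blue"]], dtype=object)

# Assumed settings: replace each missing entry with a fixed placeholder string.
imputer = SimpleImputer(strategy="constant", fill_value="missing")
X_imputed = imputer.fit_transform(X)

imputed_values = X_imputed[:, 0].tolist()
```

Because the imputer is fitted once and then reused, the same replacement is applied to training and test data.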

autoPyTorch/data/tabular_feature_validator.py

Co-authored-by: nabenabe0928

* preprocess inside data validator * add time debug statements * Add fixes for categorical data * add fit_ensemble * add arlind fix for swa and se * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * shuffle = False * Shake Shake updates (#287) * To test locally * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * To test locally * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * To test locally * shuffle = False * To test locally * updates to search space * updates to search space * update branch with search space * undo search space update * fix bug in shake shake flag * limit to shake-even * restrict to even even * Add even even and others for shake-drop also * fix bug in passing alpha beta method * restrict to only even even * fix silly bug: * remove imputer and ordinal encoder for categorical transformer in feature validator * Address comments from shuhei * fix issues with ensemble fitting post hoc * Address comments on the PR * Fix flake and mypy errors * Address comments from PR #286 * fix bug in embedding * Update autoPyTorch/api/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py Co-authored-by: nabenabe0928 <[email protected]> * Address comments from shuhei * adress comments from shuhei * fix flake and mypy * Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py Co-authored-by: 
nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * increase threads_per_worker * fix bug in rowcutmix * Enhancement for the tabular validator. (#291) * Initial try at an enhancement for the tabular validator * Adding a few type annotations * Fixing bugs in implementation * Adding wrongly deleted code part during rebase * Fix bug in _get_args * Fix bug in _get_args * Addressing Shuhei's comments * Address Shuhei's comments * Refactoring code * Refactoring code * Typos fix and additional comments * Replace nan in categoricals with simple imputer * Remove unused function * add comment * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Adding unit test for only nall columns in the tabular feature categorical evaluator * fix bug in remove all nan columns * Bug fix for making tests run by arlind * fix flake errors in feature validator * made typing code uniform * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * address comments from shuhei * address comments from shuhei (2) Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * resolve code issues with new versions * 
Address comments from shuhei * make run_traditional_ml function * implement suggestion from shuhei and fix bug in rowcutmixtrainer * fix return type docstring * add better documentation and fix bug in shake_drop_get_bl * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * add test for comparator and other improvements based on PR comments * fix bug in test * [fix] Fix the condition in the raising error of all_nan_columns * [refactor] Unite name conventions of numpy array and pandas dataframe * [doc] Add the description about the tabular feature transformation * [doc] Add the description of the tabular feature transformation * address comments from arlind * address comments from arlind * change to as_tensor and address comments from arlind * correct description for functions in data module Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Arlind Kadra <[email protected]> Co-authored-by: nabenabe0928 <[email protected]>

* Update implementation * Coding style fixes * Implementation update * Style fix * Turn weighted loss into a constant again, implementation update * Cocktail branch inconsistencies (#275) * To nemo * Revert change in T_curr as results conclusively prove it should be 0 * Revert cutmix change after data from run * Final conclusion after results * FIX bug in shake alpha beta * Updated if is_training condition for shake drop * Remove temp fix in row cutmic * Cocktail fixes time debug (#286) * preprocess inside data validator * add time debug statements * Add fixes for categorical data * add fit_ensemble * add arlind fix for swa and se * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * shuffle = False * Shake Shake updates (#287) * To test locally * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * To test locally * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * To test locally * shuffle = False * To test locally * updates to search space * updates to search space * update branch with search space * undo search space update * fix bug in shake shake flag * limit to shake-even * restrict to even even * Add even even and others for shake-drop also * fix bug in passing alpha beta method * restrict to only even even * fix silly bug: * remove imputer and ordinal encoder for categorical transformer in feature validator * Address comments from shuhei * fix issues with ensemble fitting post hoc * Address comments on the PR * Fix flake and mypy errors * Address comments from PR #286 * fix bug in embedding * Update autoPyTorch/api/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * 
Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py Co-authored-by: nabenabe0928 <[email protected]> * Address comments from shuhei * adress comments from shuhei * fix flake and mypy * Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * increase threads_per_worker * fix bug in rowcutmix * Enhancement for the tabular validator. 
(#291) * Initial try at an enhancement for the tabular validator * Adding a few type annotations * Fixing bugs in implementation * Adding wrongly deleted code part during rebase * Fix bug in _get_args * Fix bug in _get_args * Addressing Shuhei's comments * Address Shuhei's comments * Refactoring code * Refactoring code * Typos fix and additional comments * Replace nan in categoricals with simple imputer * Remove unused function * add comment * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Adding unit test for only nall columns in the tabular feature categorical evaluator * fix bug in remove all nan columns * Bug fix for making tests run by arlind * fix flake errors in feature validator * made typing code uniform * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * address comments from shuhei * address comments from shuhei (2) Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * resolve code issues with new versions * Address comments from shuhei * make run_traditional_ml function * implement suggestion from shuhei and fix bug in rowcutmixtrainer * fix return type docstring * add better documentation and fix bug in shake_drop_get_bl * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * add test for comparator and other improvements based on PR comments * fix bug in test * [fix] Fix the condition in the raising error of all_nan_columns * [refactor] Unite name conventions of numpy array and pandas dataframe * [doc] Add the description about the tabular feature transformation * [doc] Add the description of the tabular feature transformation * address comments from 
arlind * address comments from arlind * change to as_tensor and address comments from arlind * correct description for functions in data module Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Arlind Kadra <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Addressing Shuhei's comments * flake8 problems fix * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Add indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Add line indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Validate if there is a column transformer since for sparse matrices we will not have one. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/utils/implementations.py Delete uncommented line. Co-authored-by: Ravin Kohli <[email protected]> * Allow the number of threads to be given by the user * Removing unnecessary argument and refactoring the attribute. * Addressing Ravin's comments * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Updating the function documentation according to the agreed style. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Providing information on the wrong method provided for shake-shake regularization. Co-authored-by: nabenabe0928 <[email protected]> * add todo for backend and accept changes from shuhei * Addressing Shuhei's and Ravin's comments * Addressing Shuhei's and Ravin's comments, bug fix * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving code readibility. 
Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving consistency. Co-authored-by: nabenabe0928 <[email protected]> * bug fix Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Ravin Kohli <[email protected]>

ArlindKadra requested a review from ravinkohli October 1, 2021 14:07

ArlindKadra changed the title ~~Initial try~~ Enhancement for the tabular validator. Oct 1, 2021

ArlindKadra requested a review from nabenabe0928 October 1, 2021 14:08

ArlindKadra added 3 commits October 1, 2021 17:13

Initial try at an enhancement for the tabular validator

359b4c9

Adding a few type annotations

65e8ffb

Fixing bugs in implementation

217c38d

ArlindKadra force-pushed the tabular_validator_enhancement branch from 99b2efe to 217c38d October 1, 2021 15:17

ravinkohli reviewed Oct 1, 2021

autoPyTorch/data/base_feature_validator.py (resolved)

ravinkohli reviewed Oct 1, 2021

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

ravinkohli reviewed Oct 1, 2021

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

ravinkohli reviewed Oct 1, 2021

autoPyTorch/data/tabular_feature_validator.py (resolved)

ravinkohli reviewed Oct 1, 2021

autoPyTorch/data/tabular_feature_validator.py (resolved)

ArlindKadra and others added 3 commits October 1, 2021 18:14

Adding wrongly deleted code part during rebase

f7dd8fe

Fix bug in _get_args

92bd535

Fix bug in _get_args

5f672b5

nabenabe0928 requested changes Oct 3, 2021

Addressing Shuhei's comments

223c09e

nabenabe0928 reviewed Oct 3, 2021

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

ArlindKadra added 3 commits October 4, 2021 00:12

Address Shuhei's comments

a1ed883

Refactoring code

f585310

Refactoring code

f298c46

ArlindKadra requested a review from nabenabe0928 October 6, 2021 17:33

nabenabe0928 reviewed Oct 6, 2021

Typos fix and additional comments

03bef16

ArlindKadra requested a review from nabenabe0928 October 6, 2021 18:24

nabenabe0928 requested changes Oct 7, 2021

Replace nan in categoricals with simple imputer

a7d01f1
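
For readers following the "Replace nan in categoricals with simple imputer" commit: below is a minimal sketch of the general idea, assuming scikit-learn's `SimpleImputer` with a constant fill value. It is not the code from this PR; the column names and the `"missing"` placeholder are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical categorical data with missing entries (not from the PR).
X_train = pd.DataFrame(
    {"color": ["red", np.nan, "blue"], "size": ["S", "M", np.nan]},
    dtype=object,
)
X_test = pd.DataFrame({"color": [np.nan], "size": ["L"]}, dtype=object)

# Replace every NaN with one explicit placeholder category, so a
# downstream encoder sees "missing" as an ordinary category.
imputer = SimpleImputer(strategy="constant", fill_value="missing")
imputer.fit(X_train)

train_filled = pd.DataFrame(imputer.transform(X_train), columns=X_train.columns)
test_filled = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns)
```

Fitting on the training data and reusing the same imputer for the test data keeps both splits on the same placeholder.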

ArlindKadra commented Oct 7, 2021

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

autoPyTorch/data/tabular_feature_validator.py (outdated, resolved)

Remove unused function

38fe9e8

ArlindKadra and others added 5 commits October 8, 2021 11:25

Adding unit test for only nall columns in the tabular feature categorical evaluator

b63ff3c

fix bug in remove all nan columns

d5bbdbe
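
The failure discussed in this thread (an all-NaN column list that changes between `fit` and later `transform` calls) can be illustrated with a small sketch. This is not the validator code from the PR: `AllNanColumnDropper` and its attribute names are hypothetical. The assumption is that the all-NaN columns are recorded once at fit time and reused unchanged afterwards.

```python
import numpy as np
import pandas as pd


class AllNanColumnDropper:
    """Hypothetical helper for illustration; not part of autoPyTorch."""

    def fit(self, X: pd.DataFrame) -> "AllNanColumnDropper":
        # Record once which columns hold no observed value at fit time.
        self.all_nan_columns_ = [c for c in X.columns if X[c].isna().all()]
        # Store the surviving column order for later calls.
        self.kept_columns_ = [c for c in X.columns if c not in self.all_nan_columns_]
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Reuse the stored list; never recompute it from the incoming data,
        # otherwise train and test could end up with different columns.
        X = X.drop(columns=self.all_nan_columns_, errors="ignore")
        return X[self.kept_columns_]


train = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, np.nan], "c": ["x", None]})
test = pd.DataFrame({"a": [3.0], "b": [5.0], "c": ["y"]})

dropper = AllNanColumnDropper().fit(train)
train_t = dropper.transform(train)
test_t = dropper.transform(test)
# Transforming already-reduced data is a no-op rather than an error.
train_again = dropper.transform(train_t)
```

Because the dropped columns are fixed at fit time, the test split loses column `b` even though it contains a value there, and both splits keep the same column order.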

Bug fix for making tests run by arlind

bfe4899

fix flake errors in feature validator

369edad

made typing code uniform

a4fb0cb

nabenabe0928 requested changes Oct 8, 2021

ravinkohli and others added 3 commits October 8, 2021 17:02

Apply suggestions from code review

44229a6

Co-authored-by: nabenabe0928 <[email protected]>

address comments from shuhei

ba3c1e7

address comments from shuhei (2)

10a8441

ravinkohli self-requested a review October 8, 2021 15:30

ravinkohli approved these changes Oct 8, 2021

ArlindKadra merged commit 6d9f99f into cocktail_fixes_time_debug Oct 8, 2021

ravinkohli deleted the tabular_validator_enhancement branch October 8, 2021 15:31

ArlindKadra commented Oct 1, 2021

nabenabe0928 left a comment

nabenabe0928 left a comment

nabenabe0928 left a comment (edited)

ArlindKadra commented Oct 6, 2021

nabenabe0928 left a comment (edited)

ArlindKadra commented Oct 8, 2021

ArlindKadra commented Oct 8, 2021

ravinkohli commented Oct 8, 2021

nabenabe0928 left a comment

nabenabe0928 Oct 8, 2021

ravinkohli Oct 8, 2021

nabenabe0928 Oct 8, 2021

ravinkohli Oct 8, 2021

	SUPPORTED_TARGET_TYPES = Union[
	SupportedTargetTypes = Union[
