Bug fixes (#249)

* Update implementation * Coding style fixes * Implementation update * Style fix * Turn weighted loss into a constant again, implementation update * Cocktail branch inconsistencies (#275) * To nemo * Revert change in T_curr as results conclusively prove it should be 0 * Revert cutmix change after data from run * Final conclusion after results * FIX bug in shake alpha beta * Updated if is_training condition for shake drop * Remove temp fix in row cutmic * Cocktail fixes time debug (#286) * preprocess inside data validator * add time debug statements * Add fixes for categorical data * add fit_ensemble * add arlind fix for swa and se * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * shuffle = False * Shake Shake updates (#287) * To test locally * fix bug in trainer choice fit * fix ensemble bug * Correct bug in cleanup * To test locally * Cleanup for removing time debug statements * ablation for adversarial * shuffle false in dataloader * drop last false in dataloader * fix bug for validation set, and cutout and cutmix * To test locally * shuffle = False * To test locally * updates to search space * updates to search space * update branch with search space * undo search space update * fix bug in shake shake flag * limit to shake-even * restrict to even even * Add even even and others for shake-drop also * fix bug in passing alpha beta method * restrict to only even even * fix silly bug: * remove imputer and ordinal encoder for categorical transformer in feature validator * Address comments from shuhei * fix issues with ensemble fitting post hoc * Address comments on the PR * Fix flake and mypy errors * Address comments from PR #286 * fix bug in embedding * Update autoPyTorch/api/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py Co-authored-by: nabenabe0928 <[email protected]> * Address comments from shuhei * adress comments from shuhei * fix flake and mypy * Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/tabular_classification.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * increase threads_per_worker * fix bug in rowcutmix * Enhancement for the tabular validator. (#291) * Initial try at an enhancement for the tabular validator * Adding a few type annotations * Fixing bugs in implementation * Adding wrongly deleted code part during rebase * Fix bug in _get_args * Fix bug in _get_args * Addressing Shuhei's comments * Address Shuhei's comments * Refactoring code * Refactoring code * Typos fix and additional comments * Replace nan in categoricals with simple imputer * Remove unused function * add comment * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <[email protected]> * Adding unit test for only nall columns in the tabular feature categorical evaluator * fix bug in remove all nan columns * Bug fix for making tests run by arlind * fix flake errors in feature validator * made typing code uniform * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * address comments from shuhei * address comments from shuhei (2) Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * resolve code issues with new versions * Address comments from shuhei * make run_traditional_ml function * implement suggestion from shuhei and fix bug in rowcutmixtrainer * fix return type docstring * add better documentation and fix bug in shake_drop_get_bl * Apply suggestions from code review Co-authored-by: nabenabe0928 <[email protected]> * add test for comparator and other improvements based on PR comments * fix bug in test * [fix] Fix the condition in the raising error of all_nan_columns * [refactor] Unite name conventions of numpy array and pandas dataframe * [doc] Add the description about the tabular feature transformation * [doc] Add the description of the tabular feature transformation * address comments from arlind * address comments from arlind * change to as_tensor and address comments from arlind * correct description for functions in data module Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Arlind Kadra <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> * Addressing Shuhei's comments * flake8 problems fix * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Add indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Add line indentation. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/data/tabular_feature_validator.py Validate if there is a column transformer since for sparse matrices we will not have one. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/utils/implementations.py Delete uncommented line. Co-authored-by: Ravin Kohli <[email protected]> * Allow the number of threads to be given by the user * Removing unnecessary argument and refactoring the attribute. * Addressing Ravin's comments * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Updating the function documentation according to the agreed style. Co-authored-by: Ravin Kohli <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Providing information on the wrong method provided for shake-shake regularization. Co-authored-by: nabenabe0928 <[email protected]> * add todo for backend and accept changes from shuhei * Addressing Shuhei's and Ravin's comments * Addressing Shuhei's and Ravin's comments, bug fix * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving code readibility. Co-authored-by: nabenabe0928 <[email protected]> * Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving consistency. Co-authored-by: nabenabe0928 <[email protected]> * bug fix Co-authored-by: Ravin Kohli <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: nabenabe0928 <[email protected]> Co-authored-by: Ravin Kohli <[email protected]>
automl · Feb 28, 2022 · e5f3ac4 · e5f3ac4
1 parent a9ed2ae
commit e5f3ac4
Show file tree

Hide file tree

Showing 35 changed files with 1,787 additions and 400 deletions.
diff --git a/autoPyTorch/api/base_task.py b/autoPyTorch/api/base_task.py
diff --git a/autoPyTorch/api/tabular_classification.py b/autoPyTorch/api/tabular_classification.py
@@ -421,6 +421,8 @@ def search(
             dataset_name=dataset_name,
             dataset_compression=self._dataset_compression)
 
+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,

diff --git a/autoPyTorch/api/tabular_regression.py b/autoPyTorch/api/tabular_regression.py
@@ -78,7 +78,6 @@ class TabularRegressionTask(BaseTask):
             Search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
     """
-
     def __init__(
         self,
         seed: int = 1,
@@ -423,6 +422,8 @@ def search(
             dataset_name=dataset_name,
             dataset_compression=self._dataset_compression)
 
+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,

diff --git a/autoPyTorch/data/base_feature_validator.py b/autoPyTorch/data/base_feature_validator.py
@@ -1,5 +1,5 @@
 import logging
-from typing import List, Optional, Union
+from typing import List, Optional, Set, Tuple, Union
 
 import numpy as np
 
@@ -23,24 +23,21 @@ class BaseFeatureValidator(BaseEstimator):
             List of the column types found by this estimator during fit.
         data_type (str):
             Class name of the data type provided during fit.
-        column_transformer (Optional[BaseEstimator])
+        encoder (Optional[BaseEstimator])
             Host a encoder object if the data requires transformation (for example,
-            if provided a categorical column in a pandas DataFrame)
-        transformed_columns (List[str])
-            List of columns that were encoded.
+            if provided a categorical column in a pandas DataFrame).
     """
     def __init__(
         self,
         logger: Optional[Union[PicklableClientLogger, logging.Logger]] = None,
-    ):
+    ) -> None:
         # Register types to detect unsupported data format changes
         self.feat_type: Optional[List[str]] = None
         self.data_type: Optional[type] = None
         self.dtypes: List[str] = []
         self.column_order: List[str] = []
 
         self.column_transformer: Optional[BaseEstimator] = None
-        self.transformed_columns: List[str] = []
 
         self.logger: Union[
             PicklableClientLogger, logging.Logger
@@ -52,6 +49,8 @@ def __init__(
         self.categorical_columns: List[int] = []
         self.numerical_columns: List[int] = []
 
+        self.all_nan_columns: Optional[Set[Union[int, str]]] = None
+
         self._is_fitted = False
 
     def fit(
@@ -74,7 +73,7 @@ def fit(
 
         # If a list was provided, it will be converted to pandas
         if isinstance(X_train, list):
-            X_train, X_test = self.list_to_dataframe(X_train, X_test)
+            X_train, X_test = self.list_to_pandas(X_train, X_test)
 
         self._check_data(X_train)
 
@@ -108,6 +107,7 @@ def _fit(
             self:
                 The fitted base estimator
         """
+
         raise NotImplementedError()
 
     def _check_data(
@@ -117,11 +117,12 @@ def _check_data(
         """
         Feature dimensionality and data type checks
 
-        Arguments:
+        Args:
             X (SUPPORTED_FEAT_TYPES):
                 A set of features that are going to be validated (type and dimensionality
                 checks) and a encoder fitted in the case the data needs encoding
         """
+
         raise NotImplementedError()
 
     def transform(
@@ -138,4 +139,30 @@ def transform(
             np.ndarray:
                 The transformed array
         """
+
+        raise NotImplementedError()
+
+    def list_to_pandas(
+        self,
+        X_train: SUPPORTED_FEAT_TYPES,
+        X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
+        """
+        Converts a list to a pandas DataFrame. In this process, column types are inferred.
+
+        If test data is provided, we proactively match it to train data
+
+        Args:
+            X_train (SUPPORTED_FEAT_TYPES):
+                A set of features that are going to be validated (type and dimensionality
+                checks) and a encoder fitted in the case the data needs encoding
+            X_test (Optional[SUPPORTED_FEAT_TYPES]):
+                A hold out set of data used for checking
+        Returns:
+            pd.DataFrame:
+                transformed train data from list to pandas DataFrame
+            pd.DataFrame:
+                transformed test data from list to pandas DataFrame
+        """
+
         raise NotImplementedError()
diff --git a/autoPyTorch/data/base_target_validator.py b/autoPyTorch/data/base_target_validator.py
@@ -35,7 +35,7 @@ def __init__(self,
                                         logging.Logger
                                         ]
                                   ] = None,
-                 ):
+                 ) -> None:
         self.is_classification = is_classification
 
         self.data_type: Optional[type] = None
@@ -85,6 +85,7 @@ def fit(
                                      np.shape(y_test)
                                  ))
             if isinstance(y_train, pd.DataFrame):
+                y_train = cast(pd.DataFrame, y_train)
                 y_test = cast(pd.DataFrame, y_test)
                 if y_train.columns.tolist() != y_test.columns.tolist():
                     raise ValueError(
@@ -130,7 +131,7 @@ def _fit(
 
     def transform(
         self,
-        y: Union[SupportedTargetTypes],
+        y: SupportedTargetTypes,
     ) -> np.ndarray:
         """
         Args: