Reproducible benchmark experiment scripts and results for Ludwig
Utilities for experiments are in the utils directory.
- best_hyperopt_statistics.py reads the specified hyperopt_statistics.json file and extracts the combined loss and the specified metric for the best model found by the hyperparameter search.
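As a rough illustration of the kind of lookup this utility performs, the sketch below picks the best trial out of a parsed statistics file and collects its combined loss and one metric. It assumes a layout with a top-level `hyperopt_results` list whose entries carry `metric_score`, `parameters`, and `eval_stats` keyed by output feature name. These field names, the function names, and the `goal` argument are illustrative assumptions, not taken from best_hyperopt_statistics.py itself.

```python
import json


def load_stats(path):
    """Parse a hyperopt statistics JSON file into a dict."""
    with open(path) as f:
        return json.load(f)


def best_trial(stats, goal="minimize"):
    """Return the trial with the best metric_score.

    Assumes stats["hyperopt_results"] is a list of trials, each with a
    numeric "metric_score" (assumed layout; verify against your own file).
    """
    trials = [t for t in stats["hyperopt_results"] if t.get("metric_score") is not None]
    if not trials:
        raise ValueError("no scored trials found")
    pick = min if goal == "minimize" else max
    return pick(trials, key=lambda t: t["metric_score"])


def summarize(stats, output_feature, metric, goal="minimize"):
    """Collect the combined loss and one metric for the best trial."""
    best = best_trial(stats, goal)
    eval_stats = best.get("eval_stats", {})
    return {
        "combined_loss": eval_stats.get("combined", {}).get("loss"),
        metric: eval_stats.get(output_feature, {}).get(metric),
        "parameters": best.get("parameters"),
    }
```

A call such as `summarize(load_stats("hyperopt_statistics.json"), "label", "accuracy", goal="maximize")` would then return the summary for the best trial, where `label` is a placeholder output feature name.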
Scripts and results for automl experiments are in the automl directory.
- The heuristics subdirectory contains one subdirectory per dataset used to run the extensive hyperparameter searches from which the AutoML heuristics are derived.
- The validation subdirectory contains one subdirectory per dataset used to validate the derived heuristics.
Each dataset subdirectory contains the following scripts and configuration files, as appropriate:
- Training
  - Simple training validation of the concat model type:
    - Script with configuration: train_concat_sanity_laptop.py, config_concat_sanity_laptop.yaml
  - Simple training validation of the tabnet model type:
    - Script with configuration: train_tabnet_sanity_laptop.py, config_tabnet_sanity_laptop.yaml
  - Simple training validation of the transformer model type:
    - Script with configuration: train_transf_sanity_laptop.py, config_transf_sanity_laptop.yaml
  - Training validation of the best tabnet model configuration found in the heuristics search runs:
    - Script with configuration: train_tabnet_reference_laptop.py, config_tabnet_reference_laptop.yaml
  - Training validation of the best tabnet model configuration found in the heuristics search runs, using the updated automatic feature type selection (if impacted):
    - Script with configuration: train_tabnet_reference_auto.py, config_tabnet_reference_auto.yaml
- AutoML
  - Automatically generate a configuration for hyperparameter search via the create_auto_config API:
    - Script: get_auto_train_config.py
    - Output for the original feature type selection code: auto_config.json.orig
    - Output for the updated feature type selection code: auto_config.json.update
    - Output for the updated feature type selection code plus the AutoML code with heuristics: auto_config.json.automl
  - Automatically generate and run a configuration for hyperparameter search via the auto_train API with a 1-hour time limit:
    - Script: run_auto_train_1hr.py
    - Output: hyperopt_statistics.json.1hr
  - Automatically generate and run a configuration for hyperparameter search via the auto_train API with a 2-hour time limit:
    - Script: run_auto_train_2hr.py
    - Output: hyperopt_statistics.json.2hr
  - Automatically generate and run a configuration for hyperparameter search via the auto_train API with a 4-hour time limit:
    - Script: run_auto_train_4hr.py
    - Output: hyperopt_statistics.json.4hr
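
For orientation, a time-limited run such as the ones listed above might look roughly like the sketch below. It assumes that `ludwig.automl.auto_train` accepts `dataset`, `target`, `time_limit_s`, `tune_for_memory`, and `output_directory` as keyword arguments, which should be checked against the installed Ludwig version. The helper names, dataset path, target column, and output directory are hypothetical and are not taken from the scripts in this repository.

```python
def time_limit_seconds(hours):
    """Convert a time limit in hours to the seconds value passed as time_limit_s."""
    return int(hours * 3600)


def run_auto_train(dataset_path, target, hours, output_directory="results"):
    """Sketch of a time-limited auto_train run; argument values are hypothetical."""
    # Imported lazily so that time_limit_seconds works without Ludwig installed.
    import pandas as pd
    from ludwig.automl import auto_train

    df = pd.read_csv(dataset_path)
    # Keyword names are assumptions; verify them against your Ludwig version.
    return auto_train(
        dataset=df,
        target=target,
        time_limit_s=time_limit_seconds(hours),
        tune_for_memory=False,
        output_directory=output_directory,
    )
```

Under these assumptions, `run_auto_train("data.csv", "label", hours=1)` would correspond to the 1-hour variant, with `hours=2` and `hours=4` covering the other two. The file and column names are placeholders.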