-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hotfix] Fix the issue mentioned in #352 #354
[hotfix] Fix the issue mentioned in #352 #354
Conversation
…oml#334) * [feat] Support statistics print by adding results manager object * [refactor] Make SearchResults extract run_history at __init__ Since the search results should not be kept in eternally, I made this class to take run_history in __init__ so that we can implicitly call extraction inside. From this change, the call of extraction from outside is not recommended. However, you can still call it from outside and to prevent mixup of the environment, self.clear() will be called. * [fix] Separate those changes into PR#336 * [fix] Fix so that test_loss includes all the metrics * [enhance] Strengthen the test for sprint and SearchResults * [fix] Fix an issue in documentation * [enhance] Increase the coverage * [refactor] Separate the test for results_manager to organize the structure * [test] Add the test for get_incumbent_Result * [test] Remove the previous test_get_incumbent and see the coverage * [fix] [test] Fix reversion of metric and strengthen the test cases * [fix] Fix flake8 issues and increase coverage * [fix] Address Ravin's comments * [enhance] Increase the coverage * [fix] Fix a flake8 issu
* Create release workflow and CITATION.cff and update README, setup.py * fix bug in pypy token * fix documentation formatting * TODO for docker image * accept suggestions from shuhei * add further options for disable_file_output documentation * remove from release.yml
* [doc] Add workflow of the AutoPytorch * [doc] Address Ravin's comment
* set verbose=False in catboost * fix flake
* fix formatting in docs * Update examples/40_advanced/example_resampling_strategy.py
* [feat] Add an object that realizes the perf over time viz * [fix] Modify TODOs and add comments to avoid complications * [refactor] [feat] Format visualizer API and integrate this feature into BaseTask * [refactor] Separate a shared raise error process as a function * [refactor] Gather params in Dataclass to look smarter * [refactor] Merge extraction from history to the result manager Since this feature was added in a previous PR, we now rely on this feature to extract the history. To handle the order by the start time issue, I added the sort by endtime feature. * [feat] Merge the viz in the latest version * [fix] Fix nan --> worst val so that we can always handle by number * [fix] Fix mypy issues * [test] Add test for get_start_time * [test] Add test for order by end time * [test] Add tests for ensemble results * [test] Add tests for merging ensemble results and run history * [test] Add the tests in the case of ensemble_results is None * [fix] Alternate datetime to timestamp in tests to pass universally Since the mapping of timestamp to datetime variates on machine, the tests failed in the previous version. In this version, we changed the datetime in the tests to the fixed timestamp so that the tests will pass universally. * [fix] Fix status_msg --> status_type because it does not need to be str * [fix] Change the name for the homogeniety * [fix] Fix based on the file name change * [test] Add tests for set_plot_args * [test] Add tests for plot_perf_over_time in BaseTask * [refactor] Replace redundant lines by pytest parametrization * [test] Add tests for _get_perf_and_time * [fix] Remove viz attribute based on Ravin's comment * [fix] Fix doc-string based on Ravin's comments * [refactor] Hide color label settings extraction in dataclass Since this process makes the method in BaseTask redundant and this was pointed out by Ravin, I made this process a method of dataclass so that we can easily fetch this information. Note that since the color and label information always depend on the optimization results, we always need to pass metric results to ensure we only get related keys. * [test] Add tests for color label dicts extraction * [test] Add tests for checking if plt.show is called or not * [refactor] Address Ravin's comments and add TODO for the refactoring * [refactor] Change KeyError in EnsembleResults to empty Since it is not convenient to not be able to instantiate EnsembleResults in the case when we do not have any histories, I changed the functionality so that we can still instantiate even when the results are empty. In this case, we have empty arrays and it also matches the developers intuition. * [refactor] Prohibit external updates to make objects more robust * [fix] Remove a member variable _opt_scores since it is confusing Since opt_scores are taken from cost in run_history and metric_dict takes from additional_info, it was confusing for me where I should refer to what. By removing this, we can always refer to additional_info when fetching information and metrics are always available as a raw value. Although I changed a lot, the functionality did not change and it is easier to add any other functionalities now. * [example] Add an example how to plot performance over time * [fix] Fix unexpected train loss when using cross validation * [fix] Remove __main__ from example based on the Ravin's comment * [fix] Move results_xxx to utils from API * [enhance] Change example for the plot over time to save fig Since the plt.show() does not work on some environments, I changed the example so that everyone can run at least this example.
* cleanup of simple_imputer * Fixed doc and typo * Fixed docs * Made changes, added test * Fixed init statement * Fixed docs * Flake'd
Codecov Report
@@ Coverage Diff @@
## development #354 +/- ##
===============================================
+ Coverage 82.62% 82.64% +0.01%
===============================================
Files 154 154
Lines 9055 9076 +21
Branches 1592 1598 +6
===============================================
+ Hits 7482 7501 +19
- Misses 1106 1107 +1
- Partials 467 468 +1
Continue to review full report at Codecov.
|
Since we use BaseDataset without labels when we would like to instantiate it for the prediction, we remove this raise error.
00e2b96
to
e3aeb55
Compare
This change is the core of this PR, but for now since the original author does not response to the issue anymore, we close this PR. Note that the issue happened because the sklearn task inference for multiclass is incorrect, i.e. if the target is only int, it will mess up the target with multiclass even if it is really continuou. |
Types of changes
Checklist:
Description
This is the hotfix for the issue#352.
Since the inference of the task type in sklearn does not properly handle the continuous, I added a feature to filter continuous (i.e. regression task) from multiclass.
Motivation and Context
see issue#352..
How has this been tested?
It is not tested by proper pytest, but I checked by the following locally.
Note that
data.csv
is available in issue#352.Note that the original code yields the following error by the same code: