feat: discretize table #327

Merged
merged 41 commits into from
Jul 7, 2023

Conversation

robmeth
Contributor

@robmeth robmeth commented May 26, 2023

Closes #143.

Summary of Changes

  • Added a class Discretizer to safeds.data.tabular.transformation that wraps scikit-learn's KBinsDiscretizer
  • Made the class a subclass of TableTransformer
  • For now, __init__ only has a parameter number_of_bins, which controls how many bins are created
  • If number_of_bins is less than 2, a ValueError is raised
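The behaviour summarized above can be illustrated with a minimal, dependency-free sketch. It is not the code merged in this PR: the class name `SimpleDiscretizer`, the dict-of-lists stand-in for `Table`, and the `fit`/`transform` signatures are hypothetical. Instead of wrapping KBinsDiscretizer, it reimplements equal-width binning with ordinal output, which is what KBinsDiscretizer does with `strategy="uniform"` and `encode="ordinal"`. KBinsDiscretizer's defaults are `strategy="quantile"` and `encode="onehot"`, and the summary does not say which options the PR passes.

```python
from __future__ import annotations

from bisect import bisect_right


class SimpleDiscretizer:
    """Hypothetical, dependency-free stand-in for the PR's Discretizer.

    Uses equal-width bins and ordinal bin indices, i.e. the behaviour of
    scikit-learn's KBinsDiscretizer(strategy="uniform", encode="ordinal").
    A "table" here is just a dict mapping column names to lists of numbers.
    """

    def __init__(self, number_of_bins: int = 5) -> None:
        # Same validation rule as described in the PR summary.
        if number_of_bins < 2:
            raise ValueError("number_of_bins must be at least 2")
        self._number_of_bins = number_of_bins
        self._edges: dict[str, list[float]] | None = None

    def fit(
        self, table: dict[str, list[float]], column_names: list[str] | None = None
    ) -> SimpleDiscretizer:
        """Learn bin edges per column and return a new fitted instance (self is unchanged)."""
        names = list(table) if column_names is None else column_names
        edges: dict[str, list[float]] = {}
        for name in names:
            low, high = min(table[name]), max(table[name])
            if low == high:
                # Constant column: a single bin, so every value maps to 0.
                edges[name] = [low, high]
                continue
            width = (high - low) / self._number_of_bins
            edges[name] = [low + i * width for i in range(self._number_of_bins)] + [high]
        fitted = SimpleDiscretizer(self._number_of_bins)
        fitted._edges = edges
        return fitted

    def transform(self, table: dict[str, list[float]]) -> dict[str, list[float]]:
        """Replace each fitted column by bin indices; other columns are copied as-is."""
        if self._edges is None:
            raise RuntimeError("transformer has not been fitted")
        result = {name: list(values) for name, values in table.items()}
        for name, edges in self._edges.items():
            inner = edges[1:-1]  # interior edges separate the bins
            # bisect_right puts a value equal to an edge into the upper bin;
            # values outside the fitted range fall into the first or last bin.
            result[name] = [bisect_right(inner, value) for value in table[name]]
        return result


if __name__ == "__main__":
    data = {"age": [float(x) for x in range(10)], "id": [1.0] * 10}
    fitted = SimpleDiscretizer(number_of_bins=2).fit(data, ["age"])
    print(fitted.transform(data)["age"])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With two bins over the range 0 to 9, the single interior edge is 4.5, so 0 to 4 land in bin 0 and 5 to 9 in bin 1. Returning a new fitted object from `fit` is a choice made in this sketch so that fitting leaves the original transformer untouched.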

@robmeth robmeth linked an issue May 26, 2023 that may be closed by this pull request
@lars-reimann
Member

lars-reimann commented May 26, 2023

🦙 MegaLinter status: ✅ SUCCESS

| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ PYTHON | black | 3 | 0 | 0 | 0.75s |
| ✅ PYTHON | mypy | 3 | | 0 | 1.94s |
| ✅ PYTHON | ruff | 3 | 0 | 0 | 0.09s |
| ✅ REPOSITORY | git_diff | yes | | no | 0.08s |

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

Contributor

@sibre28 sibre28 left a comment


Also add the Discretizer to the src/safeds/data/tabular/containers/__init__.py file.

@robmeth
Contributor Author

robmeth commented Jun 9, 2023

Pandas' __eq__ method fails when comparing the DataFrames after the transformation.

sibre28
sibre28 previously approved these changes Jun 10, 2023
@Marsmaennchen221 Marsmaennchen221 changed the title feat: 143 discretize table feat: discretize table Jun 18, 2023
# Conflicts:
#	src/safeds/data/tabular/transformation/__init__.py
@codecov

codecov bot commented Jun 23, 2023

Codecov Report

Merging #327 (85827e4) into main (388ab2d) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #327   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           47        48    +1     
  Lines         2369      2428   +59     
=========================================
+ Hits          2369      2428   +59     
| Impacted Files | Coverage Δ |
|---|---|
| `src/safeds/data/tabular/transformation/__init__.py` | 100.00% <100.00%> (ø) |
| `...safeds/data/tabular/transformation/_discretizer.py` | 100.00% <100.00%> (ø) |

@robmeth robmeth marked this pull request as ready for review June 23, 2023 07:43
@robmeth robmeth requested a review from a team as a code owner June 23, 2023 07:43
sibre28
sibre28 previously approved these changes Jun 23, 2023
Contributor

@sibre28 sibre28 left a comment


LGTM

@robmeth robmeth marked this pull request as draft June 23, 2023 14:22
@robmeth robmeth marked this pull request as ready for review June 30, 2023 11:04
@robmeth robmeth merged commit 5e3da8d into main Jul 7, 2023
@robmeth robmeth deleted the 143-discretize-table branch July 7, 2023 08:22
lars-reimann pushed a commit that referenced this pull request Jul 13, 2023
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if a column that doesn't exist is accessed ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
@lars-reimann
Member

🎉 This PR is included in version 0.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jul 13, 2023
Labels
released Included in a release
Development

Successfully merging this pull request may close these issues.

Discretize Table
5 participants