Simplify `TaggedTable` #647

lars-reimann · 2024-04-24T10:17:09Z

Is your feature request related to a problem?

We've already spent a lot of time overriding Table methods in TaggedTable just to keep the tagging. There are also more issues waiting, e.g. #644. Moreover, we also want to connect other types of input and output data, e.g. images to a table, so TaggedTable is just a special case of a more abstract Dataset concept.

Desired solution

* Don't inherit Table in TaggedTable.
* Get rid of all the overridden Table methods.
* Table.tagColumn can stay. It's now the final step before passing the data to an ML model.
* Regressor.fit and Classifier.fit could still return a TaggedTable or maybe just a Table?
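To illustrate the maintenance problem described above, here is a minimal sketch. The `Table` and `TaggedTable` classes below are simplified, hypothetical stand-ins (not the actual library code), and `remove_columns` is just one example method. The point is that every `Table` method returning a new table has to be overridden in the subclass, or the tagging is silently lost.

```python
from __future__ import annotations


class Table:
    """Hypothetical stand-in: maps column names to lists of values."""

    def __init__(self, data: dict[str, list]) -> None:
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def column_names(self) -> list[str]:
        return list(self._data)

    def remove_columns(self, names: list[str]) -> Table:
        # Always builds a plain Table, so any subclass information is dropped.
        return Table({n: v for n, v in self._data.items() if n not in names})


class TaggedTable(Table):
    """Hypothetical subclass that additionally remembers the target column."""

    def __init__(self, data: dict[str, list], target_name: str) -> None:
        super().__init__(data)
        self.target_name = target_name

    # Without this override, remove_columns would return an untagged Table.
    def remove_columns(self, names: list[str]) -> TaggedTable:
        if self.target_name in names:
            raise ValueError("cannot remove the target column")
        return TaggedTable(super().remove_columns(names)._data, self.target_name)


tagged = TaggedTable({"a": [1, 2], "b": [3, 4], "y": [0, 1]}, target_name="y")

# With the override, the tag survives the manipulation.
smaller = tagged.remove_columns(["a"])

# Calling the base implementation shows what happens without the override.
untagged = Table.remove_columns(tagged, ["a"])
```

Here `smaller` is still a `TaggedTable`, while `untagged` is a plain `Table` with no `target_name`. Repeating such an override for every table operation is the effort this issue wants to avoid.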

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response


lars-reimann · 2024-04-25T08:45:52Z

There's also a strong correlation between TaggedTable and the InputConversionTable. Both expect names of features and the name of the target.

Gerhardsa0 · 2024-04-26T10:09:35Z

> There's also a strong correlation between TaggedTable and the InputConversionTable. Both expect names of features and the name of the target.

I agree, also with TimeSeries and test:inout_conversion_time_series. Maybe we should let the NN detect the type and then cast, because a lot of information is stored there.
I will make a prototype in my PR #615.

Closes #647

### Summary of Changes

* `TaggedTable` is now called `TabularDataset`.
* It is moved from `safeds.data.tabular.containers` to `safeds.data.labeled.containers`. That's where all dataset classes for supervised learning will go, like the upcoming `ImageDataset`.
* `TabularDataset` no longer inherits from `Table`.
* `TabularDataset` now has a very small interface. It's only meant to be used as input for supervised ML models. Table manipulation is now solely done using the `Table` class.
* `tag_columns` on `Table` is now called `to_tabular_dataset`. This makes it consistent with other conversion methods and emphasizes that this is a terminal operation and should only be used once one is done manipulating the table.
* `TabularDataset` now has a public `to_table` method to get a `Table` again.

---------

Co-authored-by: megalinter-bot <[email protected]>
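The composition-based design summarized above could look roughly like the following sketch. The classes are simplified, hypothetical stand-ins that only borrow the names `Table`, `TabularDataset`, `to_tabular_dataset` and `to_table` from the summary. The `features` and `target` accessors and the single `target_name` parameter are assumptions made for illustration, and the real signatures may differ.

```python
from __future__ import annotations


class Table:
    """Hypothetical stand-in for the table class; all manipulation lives here."""

    def __init__(self, data: dict[str, list]) -> None:
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def column_names(self) -> list[str]:
        return list(self._data)

    def get_column(self, name: str) -> list:
        return list(self._data[name])

    def remove_columns(self, names: list[str]) -> Table:
        return Table({n: v for n, v in self._data.items() if n not in names})

    def to_tabular_dataset(self, target_name: str) -> TabularDataset:
        # Terminal step: call this once all table manipulation is done.
        if target_name not in self._data:
            raise KeyError(f"unknown column: {target_name}")
        return TabularDataset(self, target_name)


class TabularDataset:
    """Wraps a Table instead of inheriting from it (assumed minimal interface)."""

    def __init__(self, table: Table, target_name: str) -> None:
        self._table = table
        self._target_name = target_name

    @property
    def features(self) -> Table:
        return self._table.remove_columns([self._target_name])

    @property
    def target(self) -> list:
        return self._table.get_column(self._target_name)

    def to_table(self) -> Table:
        # Way back to a Table when further manipulation is needed.
        return self._table


# Manipulate first, convert last.
table = Table({"a": [1, 2], "b": [3, 4], "y": [0, 1]}).remove_columns(["a"])
dataset = table.to_tabular_dataset("y")
```

Because `TabularDataset` exposes no table manipulation methods in this sketch, nothing has to be overridden to keep the target information. Further manipulation goes through `dataset.to_table()` followed by another conversion.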

## [0.22.0](v0.21.0...v0.22.0) (2024-05-01)

### Features

* `is_fitted` is now always a property ([#662](#662)) ([b1db881](b1db881)), closes [#586](#586)
* add `Column.missing_value_count` ([#682](#682)) ([f084916](f084916)), closes [#642](#642)
* Add `InputConversion` & `OutputConversion` for nn interface ([#625](#625)) ([fd723f7](fd723f7)), closes [#621](#621)
* Add hash,eq and sizeof in ForwardLayer ([#634](#634)) ([72f7fde](72f7fde)), closes [#633](#633)
* allow using tables that already contain target for prediction ([#687](#687)) ([e9f1cfb](e9f1cfb)), closes [#636](#636)
* callback `Row.sort_columns` takes four parameters instead of two tuples ([#683](#683)) ([9c3e3de](9c3e3de)), closes [#584](#584)
* rename `group_rows_by` in `Table` to `group_rows` ([#661](#661)) ([c1644b7](c1644b7)), closes [#611](#611)
* rename `number_of_column` in `Row` to `number_of_columns` ([#660](#660)) ([0a08296](0a08296)), closes [#646](#646)
* rework `TaggedTable` ([#680](#680)) ([db2b613](db2b613)), closes [#647](#647)
* show missing value count/ratio in summarized statistics ([#684](#684)) ([74b8a35](74b8a35)), closes [#619](#619)
* specify `extras` instead of `features` in `to_tabular_dataset` ([#685](#685)) ([841657f](841657f)), closes [#623](#623)

### Bug Fixes

* actually use `kernel` of support vector machines for training ([#681](#681)) ([09c5082](09c5082)), closes [#602](#602)

### Performance Improvements

* Faster plot_histograms and more reliable plots ([#659](#659)) ([b5f0a12](b5f0a12))

lars-reimann · 2024-05-01T19:44:22Z

🎉 This issue has been resolved in version 0.22.0 🎉

The release is available on:

v0.22.0
GitHub release

Your semantic-release bot 📦🚀

lars-reimann added the enhancement 💡 New feature or request label Apr 24, 2024

lars-reimann added this to Library Apr 24, 2024

github-project-automation bot moved this to Backlog in Library Apr 24, 2024

This was referenced Apr 24, 2024

Override split_rows in TaggedTable to return two TaggedTables #644

Closed

Remove Table.time_columns #648

Closed

lars-reimann self-assigned this May 1, 2024

lars-reimann mentioned this issue May 1, 2024

feat: rework TaggedTable #680

Merged

lars-reimann linked a pull request May 1, 2024 that will close this issue

feat: rework TaggedTable #680

Merged

lars-reimann closed this as completed in #680 May 1, 2024

github-project-automation bot moved this from Backlog to ✔️ Done in Library May 1, 2024

lars-reimann added the released Included in a release label May 1, 2024

lars-reimann mentioned this issue May 5, 2024

For TableTransformers, add a method to return a mapping of newly-created columns to old columns and vice-versa #387

Closed
