Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add matlab dataset into kedro-datasets #435

Closed
wants to merge 105 commits into from
Closed

feat: add matlab dataset into kedro-datasets #435

wants to merge 105 commits into from

Conversation

samuel-lee-sj
Copy link
Contributor

@samuel-lee-sj samuel-lee-sj commented Nov 17, 2023

Description

This is implementation of a .mat dataset from MatLab for kedro

Development notes

Added two files, matlab_dataset.py, test_matlab_dataset.py. Tested on pytest.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@samuel-lee-sj samuel-lee-sj changed the title Request to add matlab dataset into kedro-datasets feat: add matlab dataset into kedro-datasets Nov 17, 2023
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your pull request @samuel-lee-sj ! We'll need a couple of iterations for this to be ready, but this is a great start 💪🏽

You can also do make lint plugin=kedro-datasets (from the kedro-plugins directory) to automatically lint the code

kedro-datasets/kedro_datasets/matlab/matlab_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/kedro_datasets/matlab/matlab_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/kedro_datasets/matlab/matlab_dataset.py Outdated Show resolved Hide resolved
@samuel-lee-sj
Copy link
Contributor Author

When I try
'make lint plugin=kedro-datasets'

I get an error saying 'No hook with id 'ruff-' in stage 'manual'.
I looked in the .pre-commit-config.yaml and found it. So I am unsure why the error surfaced.

Below is the full error message:

(kedro) PS G:\Working\kedro-plugins\kedro-plugins> make lint plugins=kedro-datasets
pre-commit run -a --hook-stage manual ruff- && pre-commit run trailing-whitespace --all-files && pre-commit run end-of-file-fixer --all-files && pre-commit run check-yaml --all-files && pre-commit run check-added-large-files --all-files && pre-commit run check-case-conflict --all-files && pre-commit run check-merge-conflict --all-files && pre-commit run debug-statements --all-files && pre-commit run black- --all-files --hook-stage manual && pre-commit run secret_scan --all-files --hook-stage manual && pre-commit run bandit --all-files --hook-stage manual
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/adamchainz/blacken-docs.
[INFO] Initializing environment for https://github.com/adamchainz/blacken-docs:black==22.12.0.
No hook with id ruff- in stage manual
make: *** [Makefile:17: lint] Error 1

@astrojuanlu
Copy link
Member

@samuel-lee-sj it's because of a typo: you wrote plugins= instead of plugin=, so the $(plugin) variable was empty and ruff- was attempted unsuccessfully:

kedro-plugins/Makefile

Lines 16 to 17 in f59e930

lint:
pre-commit run -a --hook-stage manual ruff-$(plugin) && pre-commit run trailing-whitespace --all-files && pre-commit run end-of-file-fixer --all-files && pre-commit run check-yaml --all-files && pre-commit run check-added-large-files --all-files && pre-commit run check-case-conflict --all-files && pre-commit run check-merge-conflict --all-files && pre-commit run debug-statements --all-files && pre-commit run black-$(plugin) --all-files --hook-stage manual && pre-commit run secret_scan --all-files --hook-stage manual && pre-commit run bandit --all-files --hook-stage manual

@AhdraMeraliQB AhdraMeraliQB added the Community Issue/PR opened by the open-source community label Dec 7, 2023
@merelcht
Copy link
Member

merelcht commented Jan 3, 2024

Hi @samuel-lee-sj , would you like to continue this PR or do you need some help finishing it?

@samuel-lee-sj
Copy link
Contributor Author

samuel-lee-sj commented Jan 4, 2024

Hello @merelcht. I was under the impression that @ankatiyar was going to takeover. Let me know what you think!

@ankatiyar
Copy link
Contributor

@samuel-lee-sj Could you fix the sign-off on the commits by following the instructions here and I can help with the rest! 😄

@samuel-lee-sj
Copy link
Contributor Author

Hello @ankatiyar. I think I did it. Let me know if there are any more problems.

@ankatiyar
Copy link
Contributor

@samuel-lee-sj, seems like the DCO test is still not passing. Would you mind trying rebasing again?

PtrBld and others added 17 commits January 12, 2024 10:49
…` from main repository to kedro-datasets (#253)

Signed-off-by: Peter Bludau <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Simon Brugman <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
fix RTD

Signed-off-by: Nok <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* Pin pip version temporarily

Signed-off-by: Ankita Katiyar <[email protected]>

* Hive support failures

Signed-off-by: Ankita Katiyar <[email protected]>

* Also pin pip on lint

Signed-off-by: Ankita Katiyar <[email protected]>

* Temporary ignore databricks spark tests

Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* perf(datasets): delay `Engine` creation until need

Signed-off-by: Deepyaman Datta <[email protected]>

* chore: don't check coverage in TYPE_CHECKING block

Signed-off-by: Deepyaman Datta <[email protected]>

* perf(datasets): don't connect in `__init__` method

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): fix tests to touch `create_engine`

Signed-off-by: Deepyaman Datta <[email protected]>

* perf(datasets): don't connect in `__init__` method

Signed-off-by: Deepyaman Datta <[email protected]>

* style(datasets): exec Ruff on sql_dataset.py 🐶

Signed-off-by: Deepyaman Datta <[email protected]>

* Undo changes to `engines` values type (for Sphinx)

Signed-off-by: Deepyaman Datta <[email protected]>

* Patch Sphinx build by removing `Engine` references

* perf(datasets): don't connect in `__init__` method

Signed-off-by: Deepyaman Datta <[email protected]>

* chore(datasets): don't require coverage for import

* chore(datasets): del unused `TYPE_CHECKING` import

* docs(datasets): document lazy connection in README

* perf(datasets): remove create in `SQLQueryDataset`

Signed-off-by: Deepyaman Datta <[email protected]>

* refactor(datasets): do not return the created conn

Signed-off-by: Deepyaman Datta <[email protected]>

---------

Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* Remove references to Python 3.7

Signed-off-by: lrcouto <[email protected]>

* Revert kedro-dataset changes

Signed-off-by: lrcouto <[email protected]>

* Revert kedro-dataset changes

Signed-off-by: lrcouto <[email protected]>

* Add information to release docs

Signed-off-by: lrcouto <[email protected]>

---------

Signed-off-by: lrcouto <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* feat(datasets) add PolarsDataset to support Polars's Lazy API

Signed-off-by: Matthias Roels <[email protected]>

* Fix(datasets): rename PolarsDataSet to PolarsDataSet

Add PolarsDataSet as an alias for PolarsDataset with
deprecation warning.

Signed-off-by: Matthias Roels <[email protected]>

* Fix(datasets): apply ruff linting rules

Signed-off-by: Matthias Roels <[email protected]>

* Fix(datasets): Correct pattern matching when Raising exceptions

Corrected PolarsDataSet to PolarsDataset in the pattern to match
in test_load_missing_file

Signed-off-by: Matthias Roels <[email protected]>

* fix(datasets): clean up PolarsDataset related code

Remove reference to PolarsDataSet as this is not required for new
dataset implementations.

Signed-off-by: Matthias Roels <[email protected]>

* feat(datasets): Rename Polars Datasets to better describe their intent

Signed-off-by: Matthias Roels <[email protected]>

* feat(datasets): clean up LazyPolarsDataset

Signed-off-by: Matthias Roels <[email protected]>

* fix(datasets): increase test coverage for PolarsDataset classes

Signed-off-by: Matthias Roels <[email protected]>

* docs(datasets): add renamed Polars datasets to docs

Signed-off-by: Matthias Roels <[email protected]>

* docs(datasets): Add new polars datasets to release notes

Signed-off-by: Matthias Roels <[email protected]>

* fix(datasets): load_args not properly passed to LazyPolarsDataset.load

Signed-off-by: Matthias Roels <[email protected]>

* docs(datasets): fix spelling error in release notes

Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: Matthias Roels <[email protected]>

---------

Signed-off-by: Matthias Roels <[email protected]>
Signed-off-by: Matthias Roels <[email protected]>
Signed-off-by: Merel Theisen <[email protected]>
Co-authored-by: Matthias Roels <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* bump version

Signed-off-by: Ankita Katiyar <[email protected]>

* Update release notes

Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Bump version

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Bump version

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Fix missing jQuery

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
#413)

* Fix Lazy Polars dataset to use the new-style base class

Fix gh-412

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Update release notes

Signed-off-by: Ankita Katiyar <[email protected]>

* Revert "Update release notes"

This reverts commit 92ceea6.

---------

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: Sajid Alam <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Co-authored-by: Sajid Alam <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* chore(datasets):  lazily load `partitions` classes

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): run doctests to check examples run

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): keep running tests amidst failures

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): format ManagedTableDataset example

Signed-off-by: Deepyaman Datta <[email protected]>

* chore(datasets): ignore breaking mods for doctests

Signed-off-by: Deepyaman Datta <[email protected]>

* style(airflow): black code in Kedro-Airflow README

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): fix example syntax, and autoformat

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): remove `kedro.extras.datasets` ref

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): remove `>>> ` prefix for YAML code

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): remove `kedro.extras.datasets` ref

Signed-off-by: Deepyaman Datta <[email protected]>

* docs(datasets): replace `data_set`s with `dataset`s

Signed-off-by: Deepyaman Datta <[email protected]>

* chore(datasets): undo changes for running doctests

Signed-off-by: Deepyaman Datta <[email protected]>

* revert(datasets):  undo lazily load `partitions` classes

Refs: 3fdc5a8
Signed-off-by: Deepyaman Datta <[email protected]>

* revert(airflow): undo black code in Kedro-Airflow README

Refs: dc3476e

Signed-off-by: Deepyaman Datta <[email protected]>

---------

Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
samuel-lee-sj and others added 25 commits January 12, 2024 10:52
Signed-off-by: samuelleeshemen <[email protected]>
* Add python version support policy to plugin readmes

Signed-off-by: Merel Theisen <[email protected]>

* Temporarily pin connexion

Signed-off-by: Merel Theisen <[email protected]>

---------

Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* Add shared CSS and meganav

Signed-off-by: Jo Stichbury <[email protected]>

* Add end of file

Signed-off-by: Jo Stichbury <[email protected]>

* Add new heap data source

Signed-off-by: Jo Stichbury <[email protected]>

* adjust heap parameter

Signed-off-by: Jo Stichbury <[email protected]>

* Remove nav_version next to Kedro logo in top left; add Kedro logo

* Revise project name and author name

Signed-off-by: Jo Stichbury <[email protected]>

* Use full kedro icon and type for logo

* Add close btn to mobile nav

Signed-off-by: vladimir-mck <[email protected]>

* Add css for mobile nav logo image

Signed-off-by: vladimir-mck <[email protected]>

* Update close button for mobile nav

Signed-off-by: vladimir-mck <[email protected]>

* Add open button to mobile nav

Signed-off-by: vladimir-mck <[email protected]>

* Delete kedro-datasets/docs/source/kedro-horizontal-color-on-light.svg

Signed-off-by: vladimir-mck <[email protected]>

* Update conf.py

Signed-off-by: vladimir-mck <[email protected]>

* Update layout.html

Add links to subprojects

Signed-off-by: Jo Stichbury <[email protected]>

* Remove svg from docs -- not needed??

Signed-off-by: Jo Stichbury <[email protected]>

* linter error fix

Signed-off-by: Jo Stichbury <[email protected]>

---------

Signed-off-by: Jo Stichbury <[email protected]>
Signed-off-by: vladimir-mck <[email protected]>
Co-authored-by: Tynan DeBold <[email protected]>
Co-authored-by: vladimir-mck <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* Add HuggingFace datasets

Co-authored-by: Danny Farah <[email protected]>
Co-authored-by: Kevin Koga <[email protected]>
Co-authored-by: Mate Scharnitzky <[email protected]>
Co-authored-by: Tomer Shor <[email protected]>
Co-authored-by: Pierre-Yves Mousset <[email protected]>
Co-authored-by: Bela Chupal <[email protected]>
Co-authored-by: Khangjrakpam Arjun <[email protected]>
Co-authored-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Apply suggestions from code review

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

Co-authored-by: Joel <[email protected]>
Co-authored-by: Nok Lam Chan <[email protected]>

* Typo

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Fix docstring

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add docstring for HFTransformerPipelineDataset

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Use intersphinx for cross references in Hugging Face docstrings

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add docstring for HFDataset

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add missing test dependencies

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add tests for huggingface datasets

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Fix HFDataset.save

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add test for HFDataset.list_datasets

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Use new name

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Consolidate imports

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

---------

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
Co-authored-by: Danny Farah <[email protected]>
Co-authored-by: Kevin Koga <[email protected]>
Co-authored-by: Mate Scharnitzky <[email protected]>
Co-authored-by: Tomer Shor <[email protected]>
Co-authored-by: Pierre-Yves Mousset <[email protected]>
Co-authored-by: Bela Chupal <[email protected]>
Co-authored-by: Khangjrakpam Arjun <[email protected]>
Co-authored-by: Joel <[email protected]>
Co-authored-by: Nok Lam Chan <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* test(datasets): fix `dask.ParquetDataset` doctests

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): use `tmp_path` fixture in doctests

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): simplify by not passing the schema

Signed-off-by: Deepyaman Datta <[email protected]>

* test(datasets): ignore conftest for doctests cover

Signed-off-by: Deepyaman Datta <[email protected]>

* Create MANIFEST.in

Signed-off-by: Deepyaman Datta <[email protected]>

---------

Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* test(datasets): add outputs to matplotlib doctests

Signed-off-by: Deepyaman Datta <[email protected]>

* Update Makefile

Signed-off-by: Deepyaman Datta <[email protected]>

* Reformat code example, line length is short enough

* Update kedro-datasets/kedro_datasets/matplotlib/matplotlib_writer.py

Signed-off-by: Deepyaman Datta <[email protected]>

---------

Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* Add add-on data to heap event

Signed-off-by: lrcouto <[email protected]>

* Move addons logic to _get_project_property

Signed-off-by: Ankita Katiyar <[email protected]>

* Add condition for pyproject.toml

Signed-off-by: Ankita Katiyar <[email protected]>

* Fix tests

Signed-off-by: Ankita Katiyar <[email protected]>

* Fix tests

Signed-off-by: Ankita Katiyar <[email protected]>

* add tools to mock

Signed-off-by: lrcouto <[email protected]>

* lint

Signed-off-by: lrcouto <[email protected]>

* Update tools test

Signed-off-by: Ankita Katiyar <[email protected]>

* Add after_context_created tools test

Signed-off-by: lrcouto <[email protected]>

* Update rename to tools

Signed-off-by: Ankita Katiyar <[email protected]>

* Update kedro-telemetry/tests/test_plugin.py

Co-authored-by: Sajid Alam <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: lrcouto <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Sajid Alam <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
* update pandas-gbq dependency declaration

Signed-off-by: Onur Kuru <[email protected]>

* fix fmt

Signed-off-by: Onur Kuru <[email protected]>

---------

Signed-off-by: Onur Kuru <[email protected]>
Co-authored-by: Ahdra Merali <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Update scikit-learn version

Signed-off-by: Nok Lam Chan <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Fix broken links in README

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Co-authored-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: samuelleeshemen <[email protected]>
Signed-off-by: samuel-lee-sj <[email protected]>
@samuel-lee-sj
Copy link
Contributor Author

samuel-lee-sj commented Jan 12, 2024

Hello @ankatiyar. I think I messed up the rebase. Could you give me some help?

The DCO is saying that:
Commit sha: bc76e83, Author: Merel Theisen, Committer: samuelleeshemen; Can not find "Merel Theisen [email protected]", in ["Merel Theisen [email protected]", "Merel Theisen [email protected]", "Merel Theisen [email protected]"].

I don't think I edited this commit. If I did I am sorry.

@samuel-lee-sj samuel-lee-sj closed this by deleting the head repository Jan 15, 2024
@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.