Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets): Add kwargs for huggingface.HFDataset #580

Merged

Conversation

eromerobilbomatica
Copy link
Contributor

Description

Issue: The kedro_datasets.huggingface.HFDataset class have to accept parameters optionals in call of the function #574

Development notes

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@eromerobilbomatica eromerobilbomatica changed the title Chance to #574 issue fix: Chance to #574 issue Feb 28, 2024
@datajoely
Copy link
Contributor

thanks for raising this @eromerobilbomatica !

@noklam noklam changed the title fix: Chance to #574 issue feat(datasets): Add kwargs for huggingface.HFDataset Feb 29, 2024
@noklam
Copy link
Contributor

noklam commented Feb 29, 2024

@eromerobilbomatica I see a lot of newlines get added in files, is it possible to not include these changes in the PR? Ideally this PR should touch only kedor-datasets but not other plugins.

p.s. I also updated the title

@noklam noklam self-requested a review February 29, 2024 12:26
@eromerobilbomatica
Copy link
Contributor Author

That is @noklam, my PR only is connected kedor-datasets. The rest of the chances have been created when I run make plugin=kedro-datasets lint. Thanks for fixing the title.

@astrojuanlu
Copy link
Member

@eromerobilbomatica There is a legitimate problem, see the failed checks:

kedro.io.core.DatasetError: Failed while loading data from data set HFDataset(dataset_name=openai_humaneval, dataset_tags=['task_categories:text2text-generation', 'annotations_creators:expert-generated', 'language_creators:expert-generated', 'multilinguality:monolingual', 'size_categories:n<1K', 'source_datasets:original', 'language:en', 'license:mit', 'code-generation', 'croissant', 'arxiv:2107.03374', 'region:us']).
'HFDataset' object has no attribute '_pipeline_kwargs'

…y accessing self._dataset_kwargs instead of self._pipeline_kwargs.
@eromerobilbomatica
Copy link
Contributor Author

@astrojuanlu just fixed it, sorry! things of copy and paste.

@astrojuanlu

This comment was marked as outdated.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @eromerobilbomatica ! ⭐ Can you add the change to the release notes together with your name as contributor (you can see examples in old release notes).

I add myself as as contributor.

Signed-off-by: Eduardo <[email protected]>
Add note about the fixed bug.

Signed-off-by: Eduardo <[email protected]>
Fixed my comments and its position in the section.

Signed-off-by: Eduardo <[email protected]>
Changed my name to community contributions in the "Upcoming Release" section

Signed-off-by: Eduardo <[email protected]>
@astrojuanlu
Copy link
Member

Woah, it's trying to install Dask 0.8.2, released 2016... https://pypi.org/project/dask/0.8.2/ something very weird going on

@merelcht
Copy link
Member

Woah, it's trying to install Dask 0.8.2, released 2016... https://pypi.org/project/dask/0.8.2/ something very weird going on

Dask just released a new version: https://pypi.org/project/dask/2024.3.0/#history, pinning it seems to at least fix the installation step but there's other stuff wrong as well: #604

@astrojuanlu
Copy link
Member

Never ending story

@astrojuanlu
Copy link
Member

@eromerobilbomatica now the yaml.org site disappeared and our links broke, we're waiting a few hours to see if the problem fixes itself

@astrojuanlu astrojuanlu enabled auto-merge (squash) March 16, 2024 21:00
@astrojuanlu astrojuanlu merged commit 2cbb71e into kedro-org:main Mar 16, 2024
47 checks passed
tgoelles pushed a commit to tgoelles/kedro-plugins that referenced this pull request Jun 6, 2024
* issue kedro-org#574

* Chance to kedro-org#574 issue

Signed-off-by: Eduardo <[email protected]>

* lint tool implementation fixes

Signed-off-by: eduardo <[email protected]>

* lint tool implementation fixes

Signed-off-by: eduardo <[email protected]>

* lint tool implementation fixes for the second time

Signed-off-by: eduardo <[email protected]>

* Fixed attribute name typo in HFDataset's _load() method, now correctly accessing self._dataset_kwargs instead of self._pipeline_kwargs.

* corrections after applying the `make plugin=kedro-datasets lint`

Signed-off-by: eduardo <[email protected]>

* Update RELEASE.md

I add myself as as contributor.

Signed-off-by: Eduardo <[email protected]>

* Update RELEASE.md

Add note about the fixed bug.

Signed-off-by: Eduardo <[email protected]>

* Update RELEASE.md

Fixed my comments and its position in the section.

Signed-off-by: Eduardo <[email protected]>

* Update RELEASE.md

Changed my name to community contributions in the "Upcoming Release" section

Signed-off-by: Eduardo <[email protected]>

---------

Signed-off-by: Eduardo <[email protected]>
Signed-off-by: eduardo <[email protected]>
Signed-off-by: Eduardo <[email protected]>
Signed-off-by: Merel Theisen <[email protected]>
Co-authored-by: eduardo <[email protected]>
Co-authored-by: Nok Lam Chan <[email protected]>
Co-authored-by: Juan Luis Cano Rodríguez <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: tgoelles <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The kedro_datasets.huggingface.HFDataset class have to accept parameters optionals in call of the function
7 participants