Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename uses of data_set to dataset #3115

Merged

Conversation

MinuraPunchihewa
Copy link
Contributor

@MinuraPunchihewa MinuraPunchihewa commented Oct 3, 2023

Description

This PR renames data_sets argument to datasets and the _data_sets attribute to _datasets in the Catalog implementations. It also renames the data_sets() function in Pipeline to datasets().

Development notes

The following changes have been made:

  • Renamed data_sets argument to datasets and the _data_sets attribute to _datasets in the Catalog.
  • Renamed the data_sets() function in Pipeline to datasets().
  • Renamed all other usages of data_sets in the code base to datasets.
  • Renamed uses data_set throughout the project to dataset.

Fixes:

#3041

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@MinuraPunchihewa
Copy link
Contributor Author

Hey @astrojuanlu,
I believe I have made the necessary changes here. Am I also required to rename uses of data_set (singular) to dataset? My initial thoughts are that this will not have a functional impact per se, but I can make the changes if necessary.

@stichbury
Copy link
Contributor

Hey @MinuraPunchihewa , thanks for your contribution!

I'll leave @astrojuanlu to comment on the changes but I can see already that, to be able to merge it, we would need you to signoff your commits. This is a verification on your side that you're complying with our Developer Certificate of Origin.

Please can you fix the commits in question. You can use git commit --amend --no-edit --signoff as needed.

@astrojuanlu
Copy link
Member

Thanks @MinuraPunchihewa ! Apart from what @stichbury said, on the data_set singular, for consistency let's do it. I assume it's mostly variable names, but better to do it now.

@MinuraPunchihewa
Copy link
Contributor Author

Hey @stichbury,
Thank you for letting me know. Do I need to sign off on all the commits or just the last one? I think the --ammend flag will only let me update the last one. Will I have to run a rebase?
Sorry to trouble you with these questions, this is the first time I've come across this.

I don't know if I have missed it, but would it be useful for this to be mentioned in the contributing guidelines?

Sure, @astrojuanlu, I will make the necessary changes.

@stichbury
Copy link
Contributor

Hey @stichbury, Thank you for letting me know. Do I need to sign off on all the commits or just the last one? I think the --ammend flag will only let me update the last one. Will I have to run a rebase? Sorry to trouble you with these questions, this is the first time I've come across this.

And we are sorry to have to ask you to do this! It's a requirement from our open source organization and you're not the first to hit against it. I think a rebase would work best in this case.

I don't know if I have missed it, but would it be useful for this to be mentioned in the contributing guidelines?
That's a good point. It is there (https://github.com/kedro-org/kedro/wiki/Guidelines-for-contributing-developers#developer-certificate-of-origin) but I'm going to make a change to the PR template to highlight it more clearly. Thanks for flagging it.

@MinuraPunchihewa
Copy link
Contributor Author

It's no problem at all, @stichbury. I'm sorry, I should have been aware.

@astrojuanlu
Copy link
Member

Thank you for letting me know. Do I need to sign off on all the commits or just the last one? I think the --ammend flag will only let me update the last one. Will I have to run a rebase?

Indeed, git commit --amend --signoff will only fix the last one, do git rebase --signoff instead

…TestSparkDataSet and TestDataFlowSequentialRunner

Signed-off-by: Minura Punchihewa <[email protected]>
…sPipeline and TestSequentialRunnerRelease

Signed-off-by: Minura Punchihewa <[email protected]>
@MinuraPunchihewa
Copy link
Contributor Author

Hey @stichbury, @astrojuanlu,
I believe I was able to sign off on the previous commits. I had to do,

git rebase --signoff HEAD~18
git push -f

I believe I was also able to rename uses of data_set throughout the code base to dataset. I did not change anything in the RELEASE.md file, however, because I assume the history of the changes made should be preserved.

By the way, guys, this is the most number of files I have ever edited in a single PR!

Please let me know if there is anything else I need to do. I will open up this PR for review now.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @MinuraPunchihewa, thank you for this PR. This is breaking change and therefore should go in the develop branch and not the main.

@merelcht merelcht linked an issue Oct 5, 2023 that may be closed by this pull request
3 tasks
@MinuraPunchihewa MinuraPunchihewa changed the base branch from main to develop October 5, 2023 10:22
@MinuraPunchihewa
Copy link
Contributor Author

Hey @merelcht,
I have changed the base branch. Looks like there are some conflicts though..

@merelcht
Copy link
Member

merelcht commented Oct 5, 2023

Hey @merelcht, I have changed the base branch. Looks like there are some conflicts though..

Oof that's unfortunate.. the conflicts are all with files that have been removed on develop. It might actually be easier to re-create the PR against develop 😬

Signed-off-by: Minura Punchihewa <[email protected]>
@MinuraPunchihewa
Copy link
Contributor Author

Oof that's unfortunate.. the conflicts are all with files that have been removed on develop. It might actually be easier to re-create the PR against develop 😬

Ah.. that is unfortunate. Have all of these files been removed? If so, I can remove them from my branch, which might resolve the conflicts. What do you think? I already pushed one commit and it seems to work.

@merelcht
Copy link
Member

merelcht commented Oct 5, 2023

Ah.. that is unfortunate. Have all of these files been removed? If so, I can remove them from my branch, which might resolve the conflicts. What do you think? I already pushed one commit and it seems to work.

Yes all files in kedro/extras/datasets/ have been removed. You can definitely try and see if that solves it!

@MinuraPunchihewa
Copy link
Contributor Author

Yes all files in kedro/extras/datasets/ have been removed. You can definitely try and see if that solves it!

Thanks, @merelcht. I will give it a shot.

@MinuraPunchihewa
Copy link
Contributor Author

Hey @merelcht, @astrojuanlu,
I believe I have been able to address the merge conflicts.

@merelcht merelcht self-requested a review October 5, 2023 16:18
@merelcht
Copy link
Member

merelcht commented Oct 5, 2023

Hey @merelcht, @astrojuanlu, I believe I have been able to address the merge conflicts.

Fantastic!

The renaming changes all look good to me, but they should definitely be mentioned in the RELEASE.md notes. All the user facing API changes should be explicitly mentioned so users know what to do when they upgrade. On top of the renames you mentioned on this PR, the change from create_default_data_set() to create_default_dataset() in the runner.py should be pointed out.

@MinuraPunchihewa
Copy link
Contributor Author

MinuraPunchihewa commented Oct 5, 2023

@merelcht Am I right to assume that these changes should be mentioned under the Upcoming Release 0.18.14? And under the sub section Breaking changes to the API?

Signed-off-by: Minura Punchihewa <[email protected]>
@MinuraPunchihewa
Copy link
Contributor Author

@merelcht I've added the release notes the best I knew how. Let me know if anything needs to change.

@merelcht
Copy link
Member

merelcht commented Oct 6, 2023

@merelcht I've added the release notes the best I knew how. Let me know if anything needs to change.

Thanks @MinuraPunchihewa ! They actually need to go under the section 0.19.0, this is because it's a breaking change and they will be going into develop. Is it okay if I commit some changes to fix the notes and linting?

@MinuraPunchihewa
Copy link
Contributor Author

Thanks @MinuraPunchihewa ! They actually need to go under the section 0.19.0, this is because it's a breaking change and they will be going into develop. Is it okay if I commit some changes to fix the notes and linting?

Of course, @merelcht! Go right ahead. Do you want me to fix the release notes?

Signed-off-by: Merel Theisen <[email protected]>
@merelcht
Copy link
Member

merelcht commented Oct 6, 2023

Of course, @merelcht! Go right ahead. Do you want me to fix the release notes?

I have done this and the linter is happy now as well! There's just one build failing, but that's caused by something else. We're working on fixing that.

Thank you so much for this contribution @MinuraPunchihewa

@deepyaman deepyaman changed the title Rename Uses of data_sets to dataset Rename uses of data_set to dataset Oct 8, 2023
Copy link
Member

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :shipit:

RELEASE.md Show resolved Hide resolved
@merelcht merelcht enabled auto-merge (squash) October 9, 2023 08:39
Signed-off-by: Merel Theisen <[email protected]>
@merelcht merelcht merged commit 3f2a226 into kedro-org:develop Oct 9, 2023
50 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename data_sets argument and _data_sets attribute of DataCatalog
5 participants