Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add PartitionField #4590

Merged
merged 4 commits into from
Apr 25, 2022
Merged

Python: Add PartitionField #4590

merged 4 commits into from
Apr 25, 2022

Conversation

dramaticlly
Copy link
Contributor

@dramaticlly dramaticlly commented Apr 19, 2022

This is 1st step of reintroduce #3228

adds PartitionField in iceberg/partitioning.py

>>> from iceberg.partitioning import PartitionField
>>> partition_field = PartitionField(3, 1, None, "id")
>>> partition_field
PartitionField(field_id=1, name=id, transform=None, source_id=3)

Purposefully keep the change list small to focus only on PartitionField and once checked in we can look into PartitionSpec next.

CC @samredai @rdblue @jun-he

Copy link
Collaborator

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, I suggested a few changes. Also I'm wondering if this should go into the table sub-module, I was thinking table/partition.py.

python/src/iceberg/partitioning.py Outdated Show resolved Hide resolved
python/src/iceberg/partitioning.py Outdated Show resolved Hide resolved
from iceberg.transforms import Transform


class PartitionField:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue do we need to define a hash method for this class? Maybe something like:

def __hash__(self):
    return hash((self.source_id, self.field_id, self.name, self.transform))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's mainly used in list or object array from what I can tell, but please let me know if I should add it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the policy for adding hash across python classes, but maybe it will be useful as PartitionField is part of Iceberg public api ,(example if a user tries to make dicts keyed by partitionSpec?) Java version has it, for comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was late to see this, but we don't need __hash__ implementations unless the objects are being used for keys in dicts.

@dramaticlly
Copy link
Contributor Author

@samredai added hash and moved partitioning to the table module, if you could take another look :)

@dramaticlly
Copy link
Contributor Author

@rdblue and @samredai updated to address review feedback, appreciate another look :)

@rdblue rdblue merged commit c1b553d into apache:master Apr 25, 2022
@rdblue
Copy link
Contributor

rdblue commented Apr 25, 2022

Looks great. Thanks, @dramaticlly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants