Python: Add PartitionField #4590
db7ee00 to efc8d31
Overall this looks good; I suggested a few changes. Also, I'm wondering if this should go into the table sub-module. I was thinking table/partition.py.
python/src/iceberg/partitioning.py (Outdated)
from iceberg.transforms import Transform


class PartitionField:
@rdblue do we need to define a hash method for this class? Maybe something like:
def __hash__(self):
    return hash((self.source_id, self.field_id, self.name, self.transform))
From what I can tell, it's mainly used in lists or object arrays, but please let me know if I should add it!
I'm not sure what the policy is for adding __hash__ across the Python classes, but it might be useful since PartitionField is part of the Iceberg public API (for example, if a user tries to build dicts keyed by PartitionSpec). The Java version has it, for comparison.
Sorry I was late to see this, but we don't need __hash__ implementations unless the objects are being used for keys in dicts.
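For readers following the thread, here is a minimal sketch of the trade-off being discussed. It is not the code from this PR: `PartitionFieldSketch` and `EqOnlyField` are hypothetical stand-ins, and the transform is a plain string rather than a `Transform` object. It illustrates two points: a class that defines `__eq__` without `__hash__` cannot be used as a dict key in Python 3, and hashing the same tuple of attributes that `__eq__` compares keeps equal objects interchangeable as keys.

```python
class EqOnlyField:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, field_id: int):
        self.field_id = field_id

    def __eq__(self, other):
        if not isinstance(other, EqOnlyField):
            return NotImplemented
        return self.field_id == other.field_id


class PartitionFieldSketch:
    """Hypothetical stand-in for PartitionField (assumed attribute names)."""

    def __init__(self, source_id: int, field_id: int, transform: str, name: str):
        self.source_id = source_id
        self.field_id = field_id
        self.transform = transform  # assumption: a string instead of a Transform
        self.name = name

    def _key(self):
        # Single source of truth for both equality and hashing.
        return (self.source_id, self.field_id, self.name, self.transform)

    def __eq__(self, other):
        if not isinstance(other, PartitionFieldSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


def is_hashable(obj) -> bool:
    """Return True if obj can be used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


first = PartitionFieldSketch(1, 1000, "bucket[16]", "id_bucket")
same = PartitionFieldSketch(1, 1000, "bucket[16]", "id_bucket")
lookup = {first: "spec-0"}
```

With these definitions, `lookup[same]` finds the entry stored under `first`, while `is_hashable(EqOnlyField(1000))` is false because Python sets `__hash__` to `None` on a class that overrides only `__eq__`. If the objects only ever live in lists, as noted earlier in the thread, the `__hash__` method is unnecessary.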
@samredai added hash and moved
Looks great. Thanks, @dramaticlly!
This is the 1st step of reintroducing #3228. It adds PartitionField in iceberg/partitioning.py.
The change list is purposefully kept small to focus only on PartitionField; once it is checked in, we can look into PartitionSpec next.
CC @samredai @rdblue @jun-he