NameMapping flattens the names and causes the a.b field to collide with the child b field of field a
#935
Comments
Thanks for tracking this @syun64, can I pick this one up? :)
Yes, of course! Just a note that's hopefully helpful: while working on covering more cases for #921, I realized this may require a bit more work than I originally thought. We currently rely on a flat name mapping in many places throughout the repository, including when we aggregate stats from the Parquet files (iceberg-python/pyiceberg/io/pyarrow.py, lines 2027 to 2031 in 0f2e19e).
So I think we will also need to build a tree representation of the name-to-ID mapping for a given PyArrow schema (iceberg-python/pyiceberg/io/pyarrow.py, lines 1934 to 1936 in 0f2e19e).
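A tree-shaped name-to-ID mapping of the kind suggested here could look roughly like the sketch below. It uses plain Python instead of PyArrow: `SchemaField`, `IdNode`, `build_id_tree` and `lookup_id` are hypothetical names for illustration, not pyiceberg or PyArrow APIs, and `SchemaField` only stands in for a nested PyArrow field that carries an Iceberg field ID. Because each level is keyed by a single, unsplit field name, a column literally named `a.b` and the child `b` of struct `a` land in different branches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SchemaField:
    """Hypothetical stand-in for a (possibly nested) PyArrow field carrying an Iceberg field ID."""

    name: str
    field_id: int
    children: List["SchemaField"] = field(default_factory=list)


@dataclass
class IdNode:
    """One node of the name-to-ID tree: the field's ID plus its children, keyed by unsplit name."""

    field_id: int
    children: Dict[str, "IdNode"] = field(default_factory=dict)


def build_id_tree(fields: List[SchemaField]) -> Dict[str, IdNode]:
    """Recursively build the tree; names are stored as-is and never joined with '.'."""
    tree: Dict[str, IdNode] = {}
    for f in fields:
        if f.name in tree:
            raise ValueError(f"duplicate field name at the same level: {f.name!r}")
        tree[f.name] = IdNode(f.field_id, build_id_tree(f.children))
    return tree


def lookup_id(tree: Dict[str, IdNode], path: Tuple[str, ...]) -> Optional[int]:
    """Walk the tree one path element at a time; return None if any step is missing."""
    node: Optional[IdNode] = None
    level = tree
    for name in path:
        node = level.get(name)
        if node is None:
            return None
        level = node.children
    return node.field_id if node is not None else None


# A top-level column literally named "a.b" next to a struct "a" with a child "b".
schema = [
    SchemaField("a.b", 1),
    SchemaField("a", 2, [SchemaField("b", 3)]),
]
tree = build_id_tree(schema)
print(lookup_id(tree, ("a.b",)))    # 1
print(lookup_id(tree, ("a", "b")))  # 3
```

Callers would pass column paths as tuples of names rather than dotted strings, which is what keeps the two fields apart.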
Apache Iceberg version: None
Please describe the bug 🐞
According to the Iceberg documentation on Column Projection:
The current implementation of NameMapping flattens each name by joining the parent and child names with a ".". This causes name collisions between fields that should not collide with each other. For example, this flat map causes a field literally named a.b to collide with the child b field of field a.
We should update the _field_by_name() and find() methods of NameMapping to use a tree structure instead of a flat dict, and traverse the tree to retrieve the MappedField for the provided name (iceberg-python/pyiceberg/table/name_mapping.py, lines 73 to 82 in e27cd90).
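The collision and the proposed tree traversal can be reproduced in isolation with the sketch below. `MappedFieldSketch` is a hypothetical, simplified stand-in shaped like a mapped field (a field ID, a list of names, nested child fields); it is not pyiceberg's `MappedField`, and `flat_index` and `find` only mimic the behavior described in this issue rather than pyiceberg's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class MappedFieldSketch:
    """Hypothetical, simplified stand-in for a mapped field (not pyiceberg's class)."""

    field_id: int
    names: List[str]
    fields: List["MappedFieldSketch"] = field(default_factory=list)


def flat_index(fields: Sequence[MappedFieldSketch], prefix: str = "") -> Dict[str, MappedFieldSketch]:
    """Flatten by joining parent and child names with '.', as the issue describes."""
    index: Dict[str, MappedFieldSketch] = {}
    for f in fields:
        for name in f.names:
            full = f"{prefix}{name}"
            index[full] = f
            # Child entries are merged in afterwards, so a child whose joined name
            # equals an existing key silently overwrites that entry.
            index.update(flat_index(f.fields, prefix=f"{full}."))
    return index


def find(fields: Sequence[MappedFieldSketch], *path: str) -> Optional[MappedFieldSketch]:
    """Tree traversal: match one path element per level without joining names."""
    current: Optional[MappedFieldSketch] = None
    level = fields
    for name in path:
        current = next((f for f in level if name in f.names), None)
        if current is None:
            return None
        level = current.fields
    return current


mapping = [
    MappedFieldSketch(1, ["a.b"]),
    MappedFieldSketch(2, ["a"], [MappedFieldSketch(3, ["b"])]),
]
flat = flat_index(mapping)
# Only two keys survive: the entry for field 1 was overwritten by field 3.
print(sorted(flat))                      # ['a', 'a.b']
print(flat["a.b"].field_id)              # 3
print(find(mapping, "a.b").field_id)     # 1
print(find(mapping, "a", "b").field_id)  # 3
```

With the flat dict, field 1 becomes unreachable by name. The traversal keeps it addressable because the path ("a.b",) and the path ("a", "b") are never reduced to the same string.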