You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Seems like there are some internal maps (e.g. Schema.nameToId) that use plain strings as keys. These strings get built through joining nested fields using dots. This makes field nested under dot and a non-nested field dot.field result in the same key, causing the above test to fail.
I started looking at what would be needed to resolve this. Ideally the key isn't a string, but maybe a list of strings that represents different levels of nesting, or something similar. Otherwise, the dots should be escaped and unescaped when dealing with fields that actually have a dot in the name.
The main issue with this is that there are already quite a few methods that implicitly assume the field name doesn't have a dot inside. Some examples: deleteColumn, moveAfter and basically anything on UpdateSchema that doesn't accept a parent field (though even then, what if the parent field has a dot inside?). So more methods (or another interface with said methods) would need to be added in order to support the use case of a dot inside field names, in order to not break existing code.
Edit: I see this is actually a known limitation and s similar solution as suggested above was already discussed, but decided against for the sake of simplicity. IMO this limitation should still be documented in the spec instead of only in error messages.
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
The text was updated successfully, but these errors were encountered:
Apache Iceberg version
1.6.0 (latest release)
Query engine
None
Please describe the bug 🐞
I've reproduced this with the following test added to
TestSchemaUpdate.java
:Seems like there are some internal maps (e.g.Schema.nameToId
) that use plain strings as keys. These strings get built through joining nested fields using dots. This makesfield
nested underdot
and a non-nested fielddot.field
result in the same key, causing the above test to fail.I started looking at what would be needed to resolve this. Ideally the key isn't a string, but maybe a list of strings that represents different levels of nesting, or something similar. Otherwise, the dots should be escaped and unescaped when dealing with fields that actually have a dot in the name.The main issue with this is that there are already quite a few methods that implicitly assume the field name doesn't have a dot inside. Some examples:deleteColumn
,moveAfter
and basically anything onUpdateSchema
that doesn't accept a parent field (though even then, what if the parent field has a dot inside?). So more methods (or another interface with said methods) would need to be added in order to support the use case of a dot inside field names, in order to not break existing code.Edit: I see this is actually a known limitation and s similar solution as suggested above was already discussed, but decided against for the sake of simplicity. IMO this limitation should still be documented in the spec instead of only in error messages.
Willingness to contribute
The text was updated successfully, but these errors were encountered: