Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping #925

Closed
spock-abadai opened this issue Jul 13, 2024 · 2 comments · Fixed by #927

Comments

@spock-abadai
Contributor

Apache Iceberg version

main (development)

Please describe the bug 🐞

On one of my Iceberg tables, loading the table and scanning it fails while the name mapping in the table properties is being parsed. pydantic raises the following ValidationError:

    def parse_mapping_from_json(mapping: str) -> NameMapping:
>       return NameMapping.model_validate_json(mapping)
E       pydantic_core._pydantic_core.ValidationError: 1 validation error for NameMapping
E       9.names
E         Value error, At least one mapped name must be provided for the field [type=value_error, input_value=[], input_type=list]
E           For further information visit https://errors.pydantic.dev/2.8/v/value_error

This seems to be a result of the code in table/name_mapping.py in the method check_at_least_one, which (if I understand correctly) checks that all fields in the name mapping have at least one name. However, if I'm reading the Iceberg spec correctly, it states that:

[image: screenshot of the relevant passage of the Iceberg spec]
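
To make the behaviour described above concrete, here is a minimal sketch of an "at least one name" check and of a relaxed variant that tolerates an empty list. It uses a plain dataclass as a stand-in; `MappedFieldSketch`, `require_name` and `try_build` are hypothetical names for illustration only and are not PyIceberg's actual classes or validator.

```python
from dataclasses import dataclass, field


@dataclass
class MappedFieldSketch:
    """Hypothetical stand-in for one entry of a name mapping."""

    field_id: int
    names: list[str] = field(default_factory=list)
    # Hypothetical switch: True mimics the strict check from the report,
    # False mimics a check that allows zero names.
    require_name: bool = True

    def __post_init__(self) -> None:
        if self.require_name and not self.names:
            raise ValueError("At least one mapped name must be provided for the field")


def try_build(names: list[str], require_name: bool) -> str:
    """Build an entry and report whether the check accepted it."""
    try:
        MappedFieldSketch(field_id=10, names=names, require_name=require_name)
        return "accepted"
    except ValueError as exc:
        return f"rejected: {exc}"


strict_result = try_build([], require_name=True)
relaxed_result = try_build([], require_name=False)
print(strict_result)
print(relaxed_result)
```

With the strict check, an entry whose names list is empty is rejected at construction time, which matches the shape of the error in the traceback. With the relaxed check the same entry is accepted.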

I'm not 100% sure what scenario led to this, but the name mapping we have does contain a field with id 10 whose list of names is empty. This field existed in the schema at one point, but it seems it was later removed. In any case, requiring that the list of names contain at least one value doesn't seem to be in line with the spec (and situations where the list is empty evidently do occur).
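
For anyone who wants to check whether their own table is affected, the following is a rough sketch that walks a name-mapping JSON document and lists the ids of fields whose names list is empty. It assumes the JSON layout of a name mapping (a list of objects with `field-id`, `names` and optional nested `fields`). `find_empty_names` and the sample mapping are hypothetical and are not part of PyIceberg.

```python
import json
from typing import Any, Iterator, Optional


def _walk(fields: list[dict[str, Any]]) -> Iterator[Optional[int]]:
    """Yield the field-id of every entry with no names, depth first."""
    for entry in fields:
        if not entry.get("names"):
            yield entry.get("field-id")
        # Nested struct fields carry their own list of mapped fields.
        yield from _walk(entry.get("fields", []))


def find_empty_names(mapping_json: str) -> list[Optional[int]]:
    """Return the field ids in a name-mapping JSON string that have no names."""
    return list(_walk(json.loads(mapping_json)))


# Made-up mapping for illustration; field 10 mirrors the case in this report.
sample_mapping = json.dumps(
    [
        {"field-id": 1, "names": ["id"]},
        {"field-id": 10, "names": []},
        {
            "field-id": 3,
            "names": ["location"],
            "fields": [{"field-id": 4, "names": []}],
        },
    ]
)

empty_ids = find_empty_names(sample_mapping)
print(empty_ids)
```

The input string would normally be the value of the table's name-mapping property. The sketch only inspects the JSON and does not validate it.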

Note that this Iceberg table was never created, written to, or modified using pyiceberg (only using Spark and Trino). pyiceberg is only used to read it.

@Fokko Fokko added this to the PyIceberg 0.7.0 release milestone Jul 13, 2024
@Fokko
Contributor

Fokko commented Jul 13, 2024

@spock-abadai Thanks for reporting this. I agree that this seems to be incorrect. Are you interested in providing a PR?

@spock-abadai
Contributor Author

> @spock-abadai Thanks for reporting this. I agree that this seems to be incorrect. Are you interested in providing a PR?

Sure, see #927 for review.
