
NotImplementedError: Parquet writer option(s) ['write.parquet.row-group-size-bytes'] not implemented #1013

Closed
djouallah opened this issue Aug 7, 2024 · 5 comments · Fixed by #1016

Comments

@djouallah

Apache Iceberg version

0.7.0 (latest release)

Please describe the bug 🐞

It was working fine, and today I got this error, using Tabular as a catalog:

/usr/local/lib/python3.10/dist-packages/pydantic/main.py:415: UserWarning: Pydantic serializer warnings:
  Expected `TableIdentifier` but got `dict` - serialized value may not be as expected
  return self.__pydantic_serializer__.to_json(
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<timed exec> in <module>

/usr/local/lib/python3.10/dist-packages/pyiceberg/table/__init__.py in append(self, df, snapshot_properties)
   1572         """
   1573         with self.transaction() as tx:
-> 1574             tx.append(df=df, snapshot_properties=snapshot_properties)
   1575 
   1576     def overwrite(

... 3 frames ...
/usr/local/lib/python3.10/dist-packages/pyiceberg/io/pyarrow.py in _get_parquet_writer_kwargs(table_properties)
   2288     ]:
   2289         if unsupported_keys := fnmatch.filter(table_properties, key_pattern):
-> 2290             raise NotImplementedError(f"Parquet writer option(s) {unsupported_keys} not implemented")
   2291 
   2292     compression_codec = table_properties.get(TableProperties.PARQUET_COMPRESSION, TableProperties.PARQUET_COMPRESSION_DEFAULT)

NotImplementedError: Parquet writer option(s) ['write.parquet.row-group-size-bytes'] not implemented
@Fokko
Contributor

Fokko commented Aug 7, 2024

@djouallah Thanks for raising this. For context, there was a bug where it would pass down write.parquet.row-group-size-bytes, but the writer actually only allows setting the number of records in a row group. Let me dig into this.
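To illustrate the distinction (not part of the original comment): the PyArrow Parquet writer sizes row groups by record count, not by bytes, which is why only the record-count setting can be forwarded. A minimal sketch with placeholder data and values:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy table purely for illustration.
table = pa.table({"id": list(range(1_000_000))})

# pyarrow's row_group_size is a maximum *record count* per row group;
# there is no bytes-based knob here, so write.parquet.row-group-size-bytes
# has nothing to map onto.
pq.write_table(table, "/tmp/example.parquet", row_group_size=100_000)
```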

@Fokko
Contributor

Fokko commented Aug 7, 2024

Sorry for the inconvenience here. I've created a fix that we'll backport to the 0.7.1 branch.

@djouallah
Author

Same error with 0.7.1 rc1?

@sungwy
Collaborator

sungwy commented Aug 10, 2024

Hi @djouallah - could you try using the property write.parquet.row-group-limit instead? Unfortunately write.parquet.row-group-size-bytes isn't a supported property in PyIceberg:

```python
for key_pattern in [
    TableProperties.PARQUET_ROW_GROUP_SIZE_BYTES,
    TableProperties.PARQUET_BLOOM_FILTER_MAX_BYTES,
    f"{TableProperties.PARQUET_BLOOM_FILTER_COLUMN_ENABLED_PREFIX}.*",
]:
    if unsupported_keys := fnmatch.filter(table_properties, key_pattern):
        raise NotImplementedError(f"Parquet writer option(s) {unsupported_keys} not implemented")
```

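If the property ended up on the table (for example via a catalog default), a rough sketch of swapping it for the supported record-count limit could look like the following. This assumes the Transaction.set_properties / remove_properties API available in PyIceberg 0.7; the catalog name, table name, and limit value are placeholders.

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog and table names, for illustration only.
catalog = load_catalog("default")
tbl = catalog.load_table("db.events")

with tbl.transaction() as tx:
    # Drop the unsupported bytes-based option ...
    tx.remove_properties("write.parquet.row-group-size-bytes")
    # ... and set the supported record-count limit instead (example value).
    tx.set_properties(**{"write.parquet.row-group-limit": "1048576"})
```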
@djouallah
Author

Ah, I see, thank you. For some reason it was the catalog that added all those properties. All good.
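(Reusing the hypothetical tbl from the sketch above, catalog-supplied settings like these can be confirmed by inspecting the table's properties dict:)

```python
# Illustrative only: shows which properties the catalog attached to the table.
print(tbl.properties)
```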
