ParquetSink: restore ability to provide additional user metadata into the encoded parquet file. #10223

Closed
wiedld opened this issue Apr 24, 2024 · 0 comments · Fixed by #10224
Labels
bug Something isn't working

Describe the bug

We add our own metadata to the parquet file. Currently, we do so using the WriterProperties' kv_metadata and the ArrowWriter. We want to start performing parquet writes with datafusion's ParquetSink; however, a recent change has removed the ability to add our own metadata.

There was a change to unify the writer options across sink types, specifically to give COPY TO and CREATE EXTERNAL TABLE a uniform configuration. Users can now specify the configuration through the SQL-level API (e.g. COPY <src> TO <sink> (<config_options>)). This was a good high-level change; however, a side effect of the implementation was the removal of the ability to add our own metadata.

The current implementation (after the above change) now derives the writer properties from the TableParquetOptions. This conversion always sets sorting_columns and the user-defined kv_metadata to None, as demonstrated in the first commit of the fix.
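To make the failure mode concrete, here is a minimal, std-only sketch of the conversion described above. All types and function names (`KeyValue`, `TableOptionsSketch`, `WriterPropsSketch`, `to_props_dropping`, `to_props_preserving`) are hypothetical stand-ins invented for illustration; they are not DataFusion's or the parquet crate's real definitions, and the actual fix may be structured differently.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a parquet key/value metadata entry.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: Option<String>,
}

// Hypothetical stand-in for the table-level options that carry user metadata.
#[derive(Debug, Default)]
pub struct TableOptionsSketch {
    pub user_metadata: HashMap<String, Option<String>>,
}

// Hypothetical stand-in for the writer properties handed to the encoder.
#[derive(Debug, PartialEq)]
pub struct WriterPropsSketch {
    pub kv_metadata: Option<Vec<KeyValue>>,
}

// Models the reported behavior: user metadata is ignored and
// kv_metadata is always None, whatever the options contain.
pub fn to_props_dropping(_opts: &TableOptionsSketch) -> WriterPropsSketch {
    WriterPropsSketch { kv_metadata: None }
}

// Models the expected behavior: user metadata is carried through.
// Entries are sorted by key so the result is deterministic.
pub fn to_props_preserving(opts: &TableOptionsSketch) -> WriterPropsSketch {
    if opts.user_metadata.is_empty() {
        return WriterPropsSketch { kv_metadata: None };
    }
    let mut entries: Vec<KeyValue> = opts
        .user_metadata
        .iter()
        .map(|(k, v)| KeyValue { key: k.clone(), value: v.clone() })
        .collect();
    entries.sort_by(|a, b| a.key.cmp(&b.key));
    WriterPropsSketch { kv_metadata: Some(entries) }
}
```

With one entry in `user_metadata`, `to_props_dropping` still yields `kv_metadata: None`, which is the bug being reported, while `to_props_preserving` yields that entry.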

To Reproduce

The hardcoded setting of the user metadata to None is demonstrated in this commit.

Expected behavior

The expected behavior is to be able to set our own metadata, ideally with user-inserted metadata exposed as an option in the SQL-level API:

COPY source_table TO 'sink' STORED AS PARQUET OPTIONS ('format.metadata' 'key:value')
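As a rough illustration of how a `'key:value'` option string like the one above could be turned into a metadata entry, here is a std-only sketch. The function name `parse_metadata_option` and its parsing rules (split on the first `:`, trim whitespace, treat a missing or empty value as `None`) are assumptions made for this example, not the syntax or behavior the fix actually settled on.

```rust
// Hypothetical parser for a 'key:value' metadata option string.
// Assumptions: the first ':' separates key from value, surrounding
// whitespace is trimmed, and a missing or empty value becomes None.
pub fn parse_metadata_option(raw: &str) -> Result<(String, Option<String>), String> {
    let (key, value) = match raw.split_once(':') {
        Some((k, v)) => (k.trim(), Some(v.trim())),
        None => (raw.trim(), None),
    };
    if key.is_empty() {
        return Err(format!("metadata key is empty in {:?}", raw));
    }
    let value = value.filter(|v| !v.is_empty()).map(|v| v.to_string());
    Ok((key.to_string(), value))
}
```

Under these assumptions, `"key:value"` parses to the key `key` with value `Some("value")`, and an input with an empty key such as `":value"` is rejected.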

The expected outcome is demonstrated in this commit.

Additional context

No response
