Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overwritten Format Configs by CreateExternalTable Options #9945

Closed
berkaysynnada opened this issue Apr 4, 2024 · 5 comments · Fixed by #10404
Closed

Overwritten Format Configs by CreateExternalTable Options #9945

berkaysynnada opened this issue Apr 4, 2024 · 5 comments · Fixed by #10404
Labels
bug Something isn't working

Comments

@berkaysynnada
Copy link
Contributor

berkaysynnada commented Apr 4, 2024

Describe the bug

let file_format: Arc<dyn FileFormat> = match file_type {
FileType::CSV => {
let mut csv_options = table_options.csv;
csv_options.has_header = cmd.has_header;
csv_options.delimiter = cmd.delimiter as u8;
csv_options.compression = cmd.file_compression_type;
Arc::new(CsvFormat::default().with_options(csv_options))

These lines of ListingTableFactory.create() overwrites the config options by cmd (CreateExternalTable options). Configs coming from OPTIONS('format... are discarded silently.

To Reproduce

statement ok
CREATE EXTERNAL TABLE aggregate_simple (
  c1 FLOAT NOT NULL,
  c2 DOUBLE NOT NULL,
  c3 BOOLEAN NOT NULL
)
STORED AS CSV
LOCATION '../core/tests/data/aggregate_simple.csv'
WITH HEADER ROW

query RRB
SELECT * FROM aggregate_simple LIMIT 3;
----

works as expected, but

statement ok
CREATE EXTERNAL TABLE aggregate_simple (
  c1 FLOAT NOT NULL,
  c2 DOUBLE NOT NULL,
  c3 BOOLEAN NOT NULL
)
STORED AS CSV
LOCATION '../core/tests/data/aggregate_simple.csv'
OPTIONS('format.has_header' 'true')

query RRB
SELECT * FROM aggregate_simple LIMIT 3;
----

does not, since CREATE EXTERNAL TABLE does not have WITH HEADER ROW, which overwrites csv.has_header to false.

Expected behavior

The actual problem is that we can set the same settings by 2 different commands now, and one of them is silently chosen. Their default values are also different (CreateExternalTable's false, CsvOptions' true). We can set both of them as false initially. Then if one of them is true, then we expect a header. If WITH HEADER ROW exists and 'format.has_header' 'false', we can give an error.

Additional context

No response

@berkaysynnada berkaysynnada added the bug Something isn't working label Apr 4, 2024
@berkaysynnada
Copy link
Contributor Author

CreateExternalTable has some CSV specific fields:

/// Whether the CSV file contains a header
pub has_header: bool,
/// Delimiter for CSV
pub delimiter: char,

After the new way of setting the configs #9382, I believe we should eliminate them now. CreateExternalTable should have fields applying all type of formats. Format specific options are given with OPTIONS() part.

@alamb
Copy link
Contributor

alamb commented Apr 6, 2024

Is there any way to retain the WITH HEADER ROW syntax for backwards compatibility? For example, in the parser we could distinguish between WITH HEADER ROW, WITH NO HEADER ROW and not specified.

@berkaysynnada
Copy link
Contributor Author

Is there any way to retain the WITH HEADER ROW syntax for backwards compatibility? For example, in the parser we could distinguish between WITH HEADER ROW, WITH NO HEADER ROW and not specified.

That might be a good idea. Converting has_header: bool to has_header: Option<bool> in both CsvOptions and CreateExternalTable will solve the problem, and sustain the backwards compatibility. We can sense the conflicting options and keep the default behavior.

@alamb
Copy link
Contributor

alamb commented Apr 15, 2024

Given how long WITH HEADER ROW has been around, I think removing it now would be really disruptive on users

@alamb
Copy link
Contributor

alamb commented May 7, 2024

Sorry if my comments here were misinterpreted to mean that DELIMITER and COMPRESSION were not important to maintain. I think all the existing syntax should be maintained until we have a larger conversation about removing them

I am happy to help facilitate such a conversation if that would help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants