Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COPY TO/FROM is overly restrictive on options #1085

Open
Jefffrey opened this issue Jan 4, 2024 · 4 comments
Open

COPY TO/FROM is overly restrictive on options #1085

Jefffrey opened this issue Jan 4, 2024 · 4 comments

Comments

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 4, 2024

Current parsing of COPY TO/FROM options is too restrictive on the how the keys & values can be provided.

https://github.com/sqlparser-rs/sqlparser-rs/blob/ce498864dc705f72e9f85d4dc5f7eba3d17b9ef6/src/parser/mod.rs#L5157-L5180

Given this parsing code, the expected format is:

COPY (SELECT 1 AS a, 2 AS b) TO 'file'
WITH (
  FORMAT 'text',
  FREEZE true,
  DELIMITER 'd',
  NULL 'null',
  HEADER true,
  QUOTE 'q',
  ESCAPE 'e',
  FORCE_QUOTE (a, b),
  FORCE_NOT_NULL (a, b),
  FORCE_NULL (a, b),
  ENCODING 'encoding'
)

The problems:

  • Key must be a keyword and cannot be a quoted string, even though postgres allows this, i.e. WITH ("format" 'text', ...)
  • It strictly limits the key set at parsing time, leading to further problems:
  • Some values are too strict in their format:
    • FORMAT expects an identifier, which disallows escaped strings (format e'csv'), but this is allowed by postgres
    • FREEZE expects keywords true or false but postgres allows these to be passed as string literals (single quoted, double quoted, escaped), e.g. freeze 'true'
      • Same for HEADER

Solutions

We could fix these problems for Postgres specifically to make it more permissive, but I was hoping to also fix it to allow usage by other engine dialects (e.g. duckdb and datafusion). This might involve removing the CopyOption enum entirely:

https://github.com/sqlparser-rs/sqlparser-rs/blob/ce498864dc705f72e9f85d4dc5f7eba3d17b9ef6/src/ast/mod.rs#L4707-L4736

And replacing its usage in Statement::Copy with a more generic version that allows any string key with a new value enum to represent different values (e.g. string value, parenthesized column list value, number value):

https://github.com/sqlparser-rs/sqlparser-rs/blob/ce498864dc705f72e9f85d4dc5f7eba3d17b9ef6/src/ast/mod.rs#L1466-L1478

I'm not sure how the legacy options will fit into this, it could just be left as is and probably tackled as part of another ticket (as the original reason here was for interop with datafusion copy statement)

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Jan 4, 2024

I'm currently working on this, but I wanted to gather opinions on how to properly tackle this, especially in regards to how much backward compatibility (if any) we should strive to preserve. I was initially thinking of removing CopyOption enum entirely as it feels problematic, but it would be a breaking change.

Thoughts @alamb ?

@alamb
Copy link
Contributor

alamb commented Jan 4, 2024

Rather than removing CopyOption entirely, what about making an extension or other variant in CopyOption that is like CopyOption::Other(Name, Value) or soemthing?

Any changes like this will be a breaking change -- I think a key requirement is that sqlparser continues to be able to parse queries it is currently able to (doens't start rejecting queries it used to parse)

I think a very valuable property is that the AST that comes out is as typed as possible -- e.g. Copy options probably shouldn't just be a generic set of key/value pairs

@alamb
Copy link
Contributor

alamb commented Jan 4, 2024

Another potential thought might be to keep the current postgres specific copy options but rename it to PostgresCopy and only parse with the postgres dialect, and then make a more generic node for the Generic dialect (and we can add a DuckDB specific one too if we wanted)

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Jan 4, 2024

I think a key requirement is that sqlparser continues to be able to parse queries it is currently able to (doens't start rejecting queries it used to parse)

This should be fine. In fact, it should be more permissive now. It's more about library API breaking changes.

Rather than removing CopyOption entirely, what about making an extension or other variant in CopyOption that is like CopyOption::Other(Name, Value) or soemthing?

Yeah this is a possibility, as downstream users can then ignore the Postgres specific enum variants (though might be undesirable to have this baggage).

I think a very valuable property is that the AST that comes out is as typed as possible -- e.g. Copy options probably shouldn't just be a generic set of key/value pairs

I agree that more typing is better, but in this case the option keyset seems less part of the SQL syntax and more part of engine semantics. Although, postgres does throw an error when it doesn't recognize an option with the error pointing to the SQL so maybe I'm wrong on that:

postgres=# copy (select 1 as a, 2 as b, 3 as c) to '/tmp/123' with (format e'csv', delimiter a, freeze e'true', null test, test123 test);
ERROR:  option "test123" not recognized
LINE 1: ...t e'csv', delimiter a, freeze e'true', null test, test123 te...
  • Also having the keyset be unrestricted/generic and not typed to enum might make it easier for downstream users to support more/new keys (since they can implement the key/value parsing themselves), but perhaps this doesn't change often enough to warrant being a high priority

Another potential thought might be to keep the current postgres specific copy options but rename it to PostgresCopy and only parse with the postgres dialect, and then make a more generic node for the Generic dialect (and we can add a DuckDB specific one too if we wanted)

This is probably the better approach, so users won't have to consider postgres specific options when they are checking the copy options. I'll probably explore in this direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants