Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Multivalued Copy Options in DFParser #9274

Closed
devinjdangelo opened this issue Feb 19, 2024 · 2 comments
Closed

Add Support for Multivalued Copy Options in DFParser #9274

devinjdangelo opened this issue Feb 19, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@devinjdangelo
Copy link
Contributor

devinjdangelo commented Feb 19, 2024

Is your feature request related to a problem or challenge?

The partition_by COPY option is multivalued, e.g.:

COPY table to file.parquet (partition_by 'a,b,c')

This is handled currently by passing a comma separated string literal to the COPY statement which is parsed later during planning by splitting on the comma. The current parsing is not as robust at handling edge cases (e.g. it won't handle a column name which itself contains a comma).

Other systems (e.g. DuckDB), have a special syntax for partition_by option https://duckdb.org/docs/data/partitioning/partitioned_writes.html:

COPY table to file.parquet (partition_by (a,b,c))

We could support this same syntax with parser updates.

Describe the solution you'd like

Add support for multivalued COPY options in DFParser. E.g.

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CopyToStatement {
    /// From where the data comes from
    pub source: CopyToSource,
    /// The URL to where the data is heading
    pub target: String,
    /// Target specific options
    pub options: Vec<(String, CopyToOptionValue)>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CopyToOptionValue {
    /// A single [Value], e.g. (format parquet)
    Single(Value),
    /// A list of [Value]s, e.g. (partition_by ("a", "b", "c"))
    List(Vec<String>),
}

pub fn parse_option_value(&mut self) -> Result<CopyToOptionValue, ParserError> {
  let next_token = self.parser.peek_token();
  match next_token.token {
      Token::Word(Word { value, .. }) => {
          self.parser.next_token();
          Ok(CopyToOptionValue::Single(Value::UnQuotedString(value)))
      },
      Token::SingleQuotedString(s) => {
          self.parser.next_token();
          Ok(CopyToOptionValue::Single(Value::SingleQuotedString(s)))
      },
      Token::DoubleQuotedString(s) => {
          self.parser.next_token();
          Ok(CopyToOptionValue::Single(Value::DoubleQuotedString(s)))
      },
      Token::EscapedStringLiteral(s) => {
          self.parser.next_token();
          Ok(CopyToOptionValue::Single(Value::EscapedStringLiteral(s)))
      },
      Token::Number(ref n, l) => {
          self.parser.next_token();
          match n.parse() {
              Ok(n) => Ok(CopyToOptionValue::Single(Value::Number(n, l))),
              // The tokenizer should have ensured `n` is an integer
              // so this should not be possible
              Err(e) => parser_err!(format!(
                  "Unexpected error: could not parse '{n}' as number: {e}"
              )),
          }},
      Token::LParen => {
          Ok(CopyToOptionValue::List(self.parse_partitions()?))
      },
      _ => self.parser.expected("string or numeric value", next_token),
  }
}

The CopyTo logical plan will also need to be updated to accept multi valued options. This will require a good amount of work to rewire the code to handle the possibility of multi valued options.

Describe alternatives you've considered

Keep the parser and logical plan as-is. Partitioning by columns containing commas in their name may be a rare enough special case that we can simply not support it.

Additional context

No response

@alamb
Copy link
Contributor

alamb commented Mar 24, 2024

After #9382 and the introduction of PARTITION BY ... syntax, I think this ticket might be completed

@devinjdangelo
Copy link
Contributor Author

After #9382 and the introduction of PARTITION BY ... syntax, I think this ticket might be completed

I agree solved by other means

@alamb alamb closed this as completed Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants