Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support parsing SQL strings to Exprs #8736

Closed
jiacai2050 opened this issue Jan 3, 2024 · 15 comments · Fixed by #10995
Closed

Support parsing SQL strings to Exprs #8736

jiacai2050 opened this issue Jan 3, 2024 · 15 comments · Fixed by #10995
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jiacai2050
Copy link
Contributor

jiacai2050 commented Jan 3, 2024

Part of #9494

Is your feature request related to a problem or challenge?

Currently when we need to construct an Expr, we need to build it by hand, take a > 1 for example:

Expr::BinaryExpr(datafusion::logical_expr::BinaryExpr::new(
    Box::new(col("a")),
    Operator::Gt,
    Box::new(lit(1)),
))

Although this works, it becomes tedious when we need to construct complex expr like a > 1 and b in (1,10) ....

Describe the solution you'd like

Impl FromStr for Expr, so we can get an Expr from String directly.

Describe alternatives you've considered

No response

Additional context

No response

@jiacai2050 jiacai2050 added the enhancement New feature or request label Jan 3, 2024
@alamb
Copy link
Contributor

alamb commented Jan 3, 2024

Also related #7165

I think this also came up a few days ago but I can't find the reference (maybe @devinjdangelo remembers 🤔 )

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 3, 2024

Also relates to #4890 I believe

@devinjdangelo
Copy link
Contributor

devinjdangelo commented Jan 3, 2024

Also related #7165

I think this also came up a few days ago but I can't find the reference (maybe @devinjdangelo remembers 🤔 )

#8661 perhaps? Though that is talking about going in the opposite direction.

@jiacai2050
Copy link
Contributor Author

If maintainers thinks this is a reasonable request, and no one is working on this, I will draft a PR in next few days.

@alamb
Copy link
Contributor

alamb commented Jan 4, 2024

Making exprs from strings sounds like a good idea to me.

BTW you can also use the more fluent style API to make exprs, which while still tedious is still a bit better:

Rather than

Expr::BinaryExpr(datafusion::logical_expr::BinaryExpr::new(
    Box::new(col("a")),
    Operator::Gt,
    Box::new(lit(1)),
))

You can write

col("a").gt(lit(1))

The benfit of this approach over strings is that the compiler can check / ensure the exprs are constructed correctly. However, I think the usecase of usign strings to, for example, serialize Exprs and let users provide arbitrary filters makes a String --> Expr API still useful

@alamb alamb added the help wanted Extra attention is needed label Mar 7, 2024
@alamb
Copy link
Contributor

alamb commented Mar 7, 2024

I think you can do this today via SqlToRel::sql_to_expr

Step 1: Add an example to show how to use SqlToRel::sql_to_expr

So add an example (perhaps expr_to_sql.rs) in
https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples/examples that showed how to call SqlToRel to parser the expression above

Step 2: Add an API to DataFusion that made parsing SQL easier

The idea is to avoid requiring PlannerContext, a DFSchema, etc).

For example:

let expr: Expr = parse_expr("c < 5")?;

Note this doesn't provide a schema.

@alamb alamb changed the title Generate an Expr from string Support parsing SQL strings to Exprs Mar 7, 2024
@Omega359
Copy link
Contributor

Omega359 commented Mar 10, 2024

    let parser_options = ParserOptions {
        enable_ident_normalization: false,
        parse_float_as_decimal: false,
    };
    let table_provider = norm.session_context.table_provider(norm.table_name.clone()).wait().unwrap();
    let table_schema_provider = TableSchemaProvider::new(norm.table_name.clone(), table_provider.as_ref());
    let sql_to_rel = SqlToRel::new_with_options(&table_schema_provider, parser_options);
    let sql_to_expr = sql_to_rel.sql_to_expr(expr.clone(), df.schema(), &mut PlannerContext::new());

    if sql_to_expr.is_err() {
        error!("Unable to transform sql expression '{}' to datafusion Expr: {:?}", &expression, sql_to_expr.err());
        panic!();
    }

    sql_to_expr.unwrap()

This is what I have been using in my POC (still a WIP, ignore the panic :) )

@alamb
Copy link
Contributor

alamb commented Mar 11, 2024

Thanks @Omega359 -- that looks nice. I think the next step is to turn it into a function with some doc strings and an example

@Omega359
Copy link
Contributor

I took a quick look at the impl for sql_to_expr and on down the code paths from there. Removing the requirement for a schema may be problematic since so much of the api requires a schema for handling identifiers, aliases and type information.

@alamb
Copy link
Contributor

alamb commented Mar 11, 2024

I took a quick look at the impl for sql_to_expr and on down the code paths from there. Removing the requirement for a schema may be problematic since so much of the api requires a schema for handling identifiers, aliases and type information.

Maybe we could use an empty schema (e.g. https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#method.empty) if there were no column references

I agree that if there are column references in the expression, the user will need to provide a schema for parsing

@backkem
Copy link
Contributor

backkem commented Mar 13, 2024

This new testcase has some similar code:
https://github.com/apache/arrow-datafusion/blob/3b61004a02978843ff02a06c8f1f3ce3e88e4a6d/datafusion/sql/tests/sql_integration.rs#L4501-L4509

Maybe it can be cleaned up when the new API is added.

@alamb
Copy link
Contributor

alamb commented Jun 18, 2024

This came up again on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1252616019357470731

Here is some code that @Omega359 shared

pub(crate) fn build_expression(norm: &FieldNormalizer, df: &DataFrame, expression: &str) -> Result<Expr> {
    let parser = Parser::new(&AnsiDialect {});
    let tokenized = parser.try_with_sql(expression);
    if tokenized.is_err() {
        bail!("Unable to tokenize expression '{}': {:?}", &expression, tokenized.err())
    }

    let parsed = tokenized?.parse_expr();
    if parsed.is_err() {
        bail!("Unable to parse expression '{}': {:?}", &expression, parsed.err())
    }

    let expr = parsed?;
    debug!("Parsed expression was '{}'", expr);

    let parser_options = ParserOptions {
        enable_ident_normalization: false,
        parse_float_as_decimal: false,
    };

    let table_schema_provider = TableSchemaProvider::new(norm.table_name.clone(), norm.session_context.clone());
    let sql_to_rel = SqlToRel::new_with_options(&table_schema_provider, parser_options);
    let sql_to_expr = sql_to_rel.sql_to_expr(expr.clone(), df.schema(), &mut PlannerContext::new());

    if sql_to_expr.is_err() {
        bail!("Unable to transform sql expression '{}' to datafusion Expr: {:?}", &expression, sql_to_expr.err())
    }

    Ok(sql_to_expr?)
}

I think it would be great to add an API to DataFrame and SessionContext to do this:

impl SessionContext {
   /// parse the provided expression as a string
  fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}
impl DataFrame {
   /// parse the provided expression as a string
  fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}

Perhaps @xinlifoobar has time to help here 🤔

@xinlifoobar
Copy link
Contributor

I am trying to look into this.

@xinlifoobar
Copy link
Contributor

xinlifoobar commented Jun 19, 2024

This came up again on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1252616019357470731

Here is some code that @Omega359 shared

pub(crate) fn build_expression(norm: &FieldNormalizer, df: &DataFrame, expression: &str) -> Result<Expr> {
    let parser = Parser::new(&AnsiDialect {});
    let tokenized = parser.try_with_sql(expression);
    if tokenized.is_err() {
        bail!("Unable to tokenize expression '{}': {:?}", &expression, tokenized.err())
    }

    let parsed = tokenized?.parse_expr();
    if parsed.is_err() {
        bail!("Unable to parse expression '{}': {:?}", &expression, parsed.err())
    }

    let expr = parsed?;
    debug!("Parsed expression was '{}'", expr);

    let parser_options = ParserOptions {
        enable_ident_normalization: false,
        parse_float_as_decimal: false,
    };

    let table_schema_provider = TableSchemaProvider::new(norm.table_name.clone(), norm.session_context.clone());
    let sql_to_rel = SqlToRel::new_with_options(&table_schema_provider, parser_options);
    let sql_to_expr = sql_to_rel.sql_to_expr(expr.clone(), df.schema(), &mut PlannerContext::new());

    if sql_to_expr.is_err() {
        bail!("Unable to transform sql expression '{}' to datafusion Expr: {:?}", &expression, sql_to_expr.err())
    }

    Ok(sql_to_expr?)
}

I think it would be great to add an API to DataFrame and SessionContext to do this:

impl SessionContext {
   /// parse the provided expression as a string
  fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}
impl DataFrame {
   /// parse the provided expression as a string
  fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}

Perhaps @xinlifoobar has time to help here 🤔

I remade this comment as my previous example is not proper here. I am thinking of the usages of DataFrame::parse_sql, e.g.,

let ctx = SessionContext::new();
// Read the data from a csv file
let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;

let expr1 = df.parse_sql("a > 10");
let expr2 = df.parse_sql("a < 10");

This is reasonable where df_schema is store local inside the dataframe.

@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

@xinlifoobar has a very nice PR to add this API: #10995

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants