Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery REGEXP_EXTRACT adds a non-supported fifth argument to the call #1776

Closed
2 tasks done
xiaoling opened this issue Jun 14, 2023 · 4 comments
Closed
2 tasks done
Assignees

Comments

@xiaoling
Copy link

xiaoling commented Jun 14, 2023

Before you file an issue

  • Make sure you specify the "read" dialect eg. parse_one(sql, read="spark")
  • Check if the issue still exists on main

Fully reproducible code snippet
Please include a fully reproducible code snippet or the input sql, dialect, and expected output.

Ran from main as of reporting

>>> import sqlglot
>>> sqlglot.parse_one("""SELECT
...   REGEXP_EXTRACT(`foo`, 'bar: (.+?)', 1, 1) AS `bar`
... FROM `DATA`""", read="bigquery").sql(dialect="bigquery")
'SELECT REGEXP_EXTRACT("foo", \'bar: (.+?)\', 1, 1, 1) AS "bar" FROM "DATA"'

The output adds a non-supported fifth argument to the function call of REGEXP_EXTRACT
We expect 'SELECT REGEXP_EXTRACT("foo", \'bar: (.+?)\', 1, 1) AS "bar" FROM "DATA"' as the output

Official Documentation
Please include links to official SQL documentation related to your issue.
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract
REGEXP_EXTRACT only supports at most 4 args.

Thanks in advance!

@tobymao tobymao self-assigned this Jun 14, 2023
@xiaoling
Copy link
Author

thank you @tobymao ❤️

@xiaoling
Copy link
Author

@tobymao

It looks like the same issue remains at the latest release which has 0a1362b

In [4]: sqlglot.__version__
Out[4]: '16.2.1'

In [5]: import sqlglot
   ...: sqlglot.parse_one("""SELECT
   ...:   REGEXP_EXTRACT(`foo`, 'bar: (.+?)', 1, 1) AS `bar`
   ...: FROM `DATA`""", read="bigquery").sql()
Out[5]: 'SELECT REGEXP_EXTRACT("foo", \'bar: (.+?)\', 1, 1, 1) AS "bar" FROM "DATA"'

The output shouldn't contain the 5th argument

@georgesittas
Copy link
Collaborator

georgesittas commented Jun 16, 2023

@xiaoling you need to provide the dialect argument for the sql method:

>>> import sqlglot
>>> sqlglot.parse_one("""SELECT
...   REGEXP_EXTRACT(`foo`, 'bar: (.+?)', 1, 1) AS `bar`
...   FROM `DATA`""", read="bigquery").sql(dialect="bigquery")
"SELECT REGEXP_EXTRACT(`foo`, 'bar: (.+?)', 1, 1) AS `bar` FROM `DATA`"

@xiaoling
Copy link
Author

Thank you @georgesittas was thinking the parse might carry the dialect. this works now

adrianisk pushed a commit to adrianisk/sqlglot that referenced this issue Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants