Errors when embedding Presto inside of Python strings #278

Open
tomspeak opened this issue Oct 11, 2024 · 4 comments

Comments

@tomspeak

Almost all of the SQL I write lives inside Python data pipelines, using Spark- and Presto-flavoured syntax.

So my queries are written like this:

SqlQuery(
  select="""
    WITH users_in_threads AS (
        SELECT
            id
        FROM {THREAD_TABLE}
        CROSS JOIN UNNEST(userid_array) AS t (userid)
    )
    SELECT
        userid,
        MAP_AGG(id, name) AS names,
        COUNT() AS cnt
    FROM {USERS_TABLE}
    GROUP BY
        1
  """
)

When using tree-sitter-sql via Neovim, I extended it with an injection so that it also captures the content inside these strings.

; extends

(string 
  (string_content) @injection.content
    (#vim-match? @injection.content "^\w*SELECT|FROM|INNER|JOIN|UNION|WHERE|CREATE|DROP|INSERT|UPDATE|ALTER|WITH.*$")
    (#set! injection.language "sql"))

This "works": using :InspectTree, I can see that it correctly captures the SELECT/WITH statement, but once it reaches the Python string interpolation in FROM {THREAD_TABLE}, the parser errors.

This is quite a crazy use case, but I wondered whether anything can be done to make the parser looser in this context, so that it at least does not error out.

@DerekStride
Owner

I do a similar thing, but with Ruby, e.g.:

sql = <<~SQL
  SELECT *
  FROM table
  WHERE id = %{id}
SQL

Unfortunately, there isn't a good way to make the parser recover gracefully, since the interpolation placeholder isn't valid SQL syntax. Tree-sitter should recover most of the time and not make things look completely broken, but I have the occasional query where that's not the case.
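
To illustrate why the raw string is not SQL: a placeholder such as {THREAD_TABLE} only becomes an ordinary identifier once something replaces it. The Python sketch below is a rough illustration, assuming placeholders are always brace-wrapped Python identifiers. The helper name mask_placeholders is hypothetical and is not part of tree-sitter-sql, Neovim, or SqlQuery. It does not change what the editor's injection sees; it only shows what a preprocessing step for other tooling (a linter, say) might look like.

```python
import re

# Hypothetical helper: not part of tree-sitter-sql, Neovim, or SqlQuery.
# Assumes every placeholder is a brace-wrapped Python identifier,
# e.g. {THREAD_TABLE}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def mask_placeholders(sql: str) -> str:
    """Replace each {NAME} with _NAME_.

    The braces become underscores, so the text keeps its length and
    every placeholder turns into a plain SQL identifier.
    """
    return PLACEHOLDER.sub(lambda m: f"_{m.group(1)}_", sql)


query = "SELECT id FROM {THREAD_TABLE} WHERE id = 1"
masked = mask_placeholders(query)
print(masked)
```

Because each brace is swapped for a single underscore, character offsets in the masked text still line up with the original string, which may matter if positions are reported back to the user.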

@tomspeak
Author

Makes sense!

Due to the compilation steps of tree-sitter-sql, I take it there's no easy way for me to extend the definitions locally? As this is the only way I ever interact with SQL, I am OK with introducing the concept of {PYTHON_VARIABLE} to the SQL syntax in my local version.

@DerekStride
Owner

I think you'd have to fork the repo to extend the parser in that way. It's likely not too hard: you could just add the curly brace characters ({ and }) to the identifier node, and that might be enough.

_identifier: _ => /[a-zA-Z_][0-9a-zA-Z_]*/,
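
A rough way to see what that change would do, sketched with Python's re module rather than in grammar.js. ORIGINAL is the pattern quoted above; WIDENED is only an assumption about what "adding the curly braces" might look like, not a rule taken from the repository. Tree-sitter's lexer is not Python's regex engine, so this checks the character classes only and says nothing about conflicts with other tokens in the grammar.

```python
import re

# The identifier pattern quoted above, copied from the grammar rule.
ORIGINAL = re.compile(r"[a-zA-Z_][0-9a-zA-Z_]*")

# Assumed widening: braces allowed anywhere in the identifier.
# This is a guess at the change, not an actual rule from a fork.
WIDENED = re.compile(r"[a-zA-Z_{][0-9a-zA-Z_{}]*")


def accepts(pattern: re.Pattern, token: str) -> bool:
    """Return True if the whole token matches the pattern."""
    return pattern.fullmatch(token) is not None


# Compare both patterns on a plain name, a placeholder, and a stray brace.
for token in ["users", "{THREAD_TABLE}", "a{b"]:
    print(token, accepts(ORIGINAL, token), accepts(WIDENED, token))
```

Note that the widened class is deliberately loose: it also accepts unbalanced input such as a{b, which is probably acceptable for highlighting in a local fork but would not be for validation.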

@matthias-Q
Collaborator

matthias-Q commented Oct 15, 2024

I was wondering whether it would be possible to have recursive injections. nvim-treesitter already knows that the embedded code is SQL, so why not apply the injection queries there as well?
