Support TQL in CTE position #1915

Open
waynexia opened this issue Jul 10, 2023 · 1 comment
Labels
C-feature Category Features

Comments

@waynexia
Member

waynexia commented Jul 10, 2023

What problem does the new feature solve?

Support SQL like

WITH prom_result AS (
    TQL EVAL (0, 100, '10s') sum(rate(http_requests_total[5m])) BY (job)
)
SELECT * FROM prom_result
ORDER BY sum_rate_http_requests_total_5m
LIMIT 10

What does the feature do?

Currently TQL and SQL are still largely disconnected. Making them interoperate through a self-contained construct such as a CTE is a great starting point.

Implementation challenges

Our parser special-cases the keyword TQL at the start of a statement (it is dispatched in the function parse_statement()). In this task, however, the whole statement is a "normal" SQL SELECT, so the first challenge is to support parsing TQL inside a SQL SELECT.

The two parts cannot be assembled into one synthetic AST, as there is no place for TQL in the SQL AST. We have to process them separately: plan the TQL part first, and then the SQL. TQL itself is an independent query and can be planned as before.

To combine the logical plan from TQL with the SQL AST, we can leverage the CTE capability, as in the query example above. During planning, all registered CTEs are held in the datafusion struct PlannerContext, with type HashMap<String, Arc<LogicalPlan>>. We can register our TQL logical plan into the context as a CTE, and it should then be possible to reference it when planning the SQL AST. To prevent sqlparser from processing the TQL part, we should replace it with something like tql_1; this string will then be used as the hashmap's key to register and retrieve the TQL plan.
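
The placeholder substitution described above could look roughly like the following. This is only a sketch using the standard library: the names `Extracted` and `extract_tql` are hypothetical, the scan is purely textual rather than integrated with sqlparser, and it does not handle parentheses inside quoted strings. Whether a bare placeholder such as tql_1 is acceptable to sqlparser as a CTE body, or needs wrapping into a query form, is left open here.

```rust
use std::collections::HashMap;

/// Result of pulling TQL bodies out of a SQL string (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Extracted {
    /// SQL with each parenthesized TQL body replaced by a key such as `tql_1`.
    pub sql: String,
    /// Placeholder key -> original TQL text, to be planned separately.
    pub tql_by_key: HashMap<String, String>,
}

/// True if `s` begins with the keyword TQL followed by whitespace.
fn starts_with_tql(s: &str) -> bool {
    let mut chars = s.chars();
    let head: String = chars.by_ref().take(3).collect();
    head.eq_ignore_ascii_case("TQL") && chars.next().map_or(false, |c| c.is_whitespace())
}

/// Replace every `( TQL ... )` body with a generated placeholder key.
/// Assumption: parentheses never appear inside string literals.
pub fn extract_tql(sql: &str) -> Extracted {
    let mut out = String::new();
    let mut tql_by_key = HashMap::new();
    let mut rest = sql;

    while let Some(open) = rest.find('(') {
        // Copy everything up to and including the opening parenthesis.
        out.push_str(&rest[..=open]);
        let after = &rest[open + 1..];

        if starts_with_tql(after.trim_start()) {
            // Walk forward to the parenthesis that closes this body.
            let mut depth = 1usize;
            let mut end = None;
            for (i, c) in after.char_indices() {
                match c {
                    '(' => depth += 1,
                    ')' => {
                        depth -= 1;
                        if depth == 0 {
                            end = Some(i);
                            break;
                        }
                    }
                    _ => {}
                }
            }
            if let Some(end) = end {
                let key = format!("tql_{}", tql_by_key.len() + 1);
                tql_by_key.insert(key.clone(), after[..end].trim().to_string());
                out.push_str(&key);
                rest = &after[end..]; // resume at the closing parenthesis
                continue;
            }
        }
        rest = after;
    }
    out.push_str(rest);
    Extracted { sql: out, tql_by_key }
}
```

With the query from the example, the returned `sql` keeps the surrounding SELECT intact, and `tql_by_key["tql_1"]` holds the TQL text. That text can be planned on its own, and the resulting plan registered under the same key.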

Finally, we should get our plan that contains both SQL and TQL, and the executor can execute it directly.

One more small point: since the output of a TQL query always has strange column names (composed from the operators' names), we might also want to support column aliases in the CTE's definition to make it easier to use, like

WITH prom_result (col_a, col_b, col_c) AS (
    TQL EVAL (0, 100, '10s') sum(rate(http_requests_total[5m])) BY (job)
)
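
As a rough illustration of the aliasing step, the sketch below pairs the output column names of a planned TQL query with the aliases from the CTE definition. The helper `alias_columns` is hypothetical (not a GreptimeDB or datafusion API), and it assumes the alias list must cover every output column; a real implementation might allow a shorter list.

```rust
/// Hypothetical helper: pair each output column of the planned TQL query
/// with the alias given in the CTE definition, in order.
/// Assumption: the number of aliases must equal the number of columns.
pub fn alias_columns(
    columns: &[&str],
    aliases: &[&str],
) -> Result<Vec<(String, String)>, String> {
    if columns.len() != aliases.len() {
        return Err(format!(
            "CTE lists {} column aliases but the query produces {} columns",
            aliases.len(),
            columns.len()
        ));
    }
    // (original column name, alias) pairs, in output order.
    Ok(columns
        .iter()
        .zip(aliases)
        .map(|(c, a)| (c.to_string(), a.to_string()))
        .collect())
}
```

The resulting pairs could then drive a projection placed on top of the TQL plan before it is registered as a CTE.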
@waynexia waynexia added the C-feature Category Features label Jul 10, 2023
@etolbakov
Collaborator

@waynexia I would like to give it a go if you don't mind
