Universal SQL runtime #768

mcrascal · 2023-04-19T16:22:57Z

This is a big one. We'd love any thoughts, input, or suggestions.

Prototyping work on this is in a fairly early stage, so there's lots of room for input.

This project has four big goals:

Enable a more declarative parameterized page experience, without exploding our user's data warehouse costs
Enable performant interactive features, without the need for client side JS
Support multiple simultaneous data sources in one project. (data from a .csv, a postgres connection, a snowflake connection in the same project)
Align all evidence flavoured markdown on a single portable syntax which can be shared and re-used, with minimal re-work

Here's how this will work at a high level:

Standalone .sql files (Support standalone .sql files #85) will continue to run against your data warehouse, and be included in a page using markdown front matter
We'll ship an instance of WASM ddb with the static payload of a page. It will include the results of the queries referenced in front matter
Inlined sql queries inside markdown files will be written in ddb syntax against those source queries. Svelte will handle reactivity, and queries will re-execute in your end-user's browser, in the same way that client-side js currently runs.
We'll add additional front-matter syntax to support pre-filtering the contents of the inlined view for parameterized pages.

Example page - user interactive

----
sources:
    - customer_cash_flow_forecast.sql
----

User Input:
<Range from=0 to=0.10 default=0.05 fmt='pct' name='discount_rate' title="Discount Rate Assumption" />

Templated SQL:
```sql

select
sum(annual_cash_flow / ${discount_rate}) as present value
from customer_cash_flow_forecast

```

Example page - declarative paramaterized page

----
sources:
    - customer_metrics.sql
    - where: customer_id === $customer_id 
----

SQL which yields only records that match $customer_id:
```sql

select * from customer_metrics 

```

We may release this as an optional feature, or offer the ability to use both source system SQL and client side SQL inline in pages.

More detail

Evidence projects will have the following directory structure:

- sources 
- views (optional) 
- pages 
- components (optional)
- evidence.config.yaml

Sources

Everything in sources gets run and turned into parquet at build time, and as necessary during dev
Flat files that duckdb supports natively (.csv, parquet)
.sql files containing queries against N database connections
(Eventually) queries against N API connections defined in yaml files
(Eventually) N jupyter notebooks (run, extract data frames to parquet)

Views

Views are .sql files containing duck db queries. These allow users to write ddb queries against all of their sources, which they can then re-use on multiple pages.

Pages

Markdown

Components

Svelte

Pages will reference sources through front matter per #85

Page System and Reactive Queries

Page Commons

Each page is initialized with a wasm ddb instance. The "page db"
The page db is populated with views defined in the frontmatter, which reference the sources parquet files
Components on the page pass around and reactively update strings, not json blobs of resolved data. We use svelte for the reactivity.
Still to be discussed: components take sql strings as inputs and issue their own queries against the page db to retrieve their own data, reducing the need to write inlined queries to support individual charts

The text was updated successfully, but these errors were encountered:

archiewood · 2023-04-19T16:47:58Z

Exciting developments!

Only thought is that if we make it mandatory then adds some complexity to first user experience, requiring sources folder.

There's something nice about the simplicity of the current starting experience, and then abstracting as you add complexity.

ItsMeBrianD · 2023-04-19T16:52:45Z

We'll need to identify a way to indicate if an inline query (or a query file, even) is coming from a data source directly or from the in-app runtime. I'm not sure how this would play with the \``sql [queryId]```` syntax that we have now

voberoi · 2023-06-05T02:11:30Z

@archiewood pointed me to this GH issue to leave some thoughts since it's related to #728 (which I am a proponent of).

Speaking to this project's goals:

Enable a more declarative parameterized page experience, without exploding our user's data warehouse costs
- I'm not (currently) affected by the constraint you're trying to address here (exploding data warehouse costs).
- Even if I was, I'd prefer a focus on familiar ways to do common tasks, like SQL templating, with warnings/advanced features to ensure users don't explode their cloud DW costs.
Enable performant interactive features, without the need for client side JS
- This would be really neat!
Support multiple simultaneous data sources in one project. (data from a .csv, a postgres connection, a snowflake connection in the same project)
- Also neat. I can think of cases where this is super handy.
Align all evidence flavoured markdown on a single portable syntax which can be shared and re-used, with minimal re-work
- I don't understand what this means.

Thoughts on some of the ideas in the proposal:

Declarative Parameterized Page

----
sources:
    - customer_metrics.sql
    - where: customer_id === $customer_id 
----

SQL which yields only records that match $customer_id:
 
select * from customer_metrics

This looks wildly confusing to me.

I can't read the SQL and understand what it'll return. I have to read the front-matter too.
I have to learn Evidence-specific syntax for filter predicates

What is the motivation behind this syntax?

WASM DuckDB

This looks neat, and like a useful way to handle user interactivity client-side. I'm interested to see how this turns out. That said, it seems complicated and rife with potential issues if you're using a different source DB.

You could consider using https://github.com/tobymao/sqlglot for help. I believe Toby managed to get a WASM build working.

mcrascal mentioned this issue Jun 1, 2023

DDB SQL Console Component #920

Closed

archiewood mentioned this issue Jun 2, 2023

Add support for parameterized queries #728

Closed

mcrascal mentioned this issue Jun 26, 2023

Universal SQL Prototype Stage 1 #973

Closed

8 tasks

mcrascal mentioned this issue Jul 10, 2023

On demand refresh for near-realtime use cases #934

Closed

archiewood mentioned this issue Jul 14, 2023

Align BigValue and Value props #1035

Closed

mcrascal mentioned this issue Aug 22, 2023

Why only one datasource? #1123

Closed

4 tasks

archiewood mentioned this issue Oct 7, 2023

Filter Component(s) #848

Closed

mcrascal closed this as completed Dec 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Universal SQL runtime #768

Universal SQL runtime #768

mcrascal commented Apr 19, 2023

archiewood commented Apr 19, 2023 •

edited

Loading

ItsMeBrianD commented Apr 19, 2023

voberoi commented Jun 5, 2023 •

edited

Loading

Universal SQL runtime #768

Universal SQL runtime #768

Comments

mcrascal commented Apr 19, 2023

archiewood commented Apr 19, 2023 • edited Loading

ItsMeBrianD commented Apr 19, 2023

voberoi commented Jun 5, 2023 • edited Loading

archiewood commented Apr 19, 2023 •

edited

Loading

voberoi commented Jun 5, 2023 •

edited

Loading