Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Universal SQL runtime #768

Closed
mcrascal opened this issue Apr 19, 2023 · 3 comments
Closed

Universal SQL runtime #768

mcrascal opened this issue Apr 19, 2023 · 3 comments

Comments

@mcrascal
Copy link
Member

This is a big one. We'd love any thoughts, input, or suggestions.

Prototyping work on this is in a fairly early stage, so there's lots of room for input.

This project has four big goals:

  1. Enable a more declarative parameterized page experience, without exploding our user's data warehouse costs
  2. Enable performant interactive features, without the need for client side JS
  3. Support multiple simultaneous data sources in one project. (data from a .csv, a postgres connection, a snowflake connection in the same project)
  4. Align all evidence flavoured markdown on a single portable syntax which can be shared and re-used, with minimal re-work

Here's how this will work at a high level:

  1. Standalone .sql files (Support standalone .sql files  #85) will continue to run against your data warehouse, and be included in a page using markdown front matter
  2. We'll ship an instance of WASM ddb with the static payload of a page. It will include the results of the queries referenced in front matter
  3. Inlined sql queries inside markdown files will be written in ddb syntax against those source queries. Svelte will handle reactivity, and queries will re-execute in your end-user's browser, in the same way that client-side js currently runs.
  4. We'll add additional front-matter syntax to support pre-filtering the contents of the inlined view for parameterized pages.

Example page - user interactive

----
sources:
    - customer_cash_flow_forecast.sql
----

User Input:
<Range from=0 to=0.10 default=0.05 fmt='pct' name='discount_rate' title="Discount Rate Assumption" />

Templated SQL:
```sql

select
sum(annual_cash_flow / ${discount_rate}) as present value
from customer_cash_flow_forecast

```

Example page - declarative paramaterized page

----
sources:
    - customer_metrics.sql
    - where: customer_id === $customer_id 
----

SQL which yields only records that match $customer_id:
```sql

select * from customer_metrics 

```

We may release this as an optional feature, or offer the ability to use both source system SQL and client side SQL inline in pages.

More detail

Evidence projects will have the following directory structure:

- sources 
- views (optional) 
- pages 
- components (optional)
- evidence.config.yaml 

Sources

  • Everything in sources gets run and turned into parquet at build time, and as necessary during dev
  • Flat files that duckdb supports natively (.csv, parquet)
  • .sql files containing queries against N database connections
  • (Eventually) queries against N API connections defined in yaml files
  • (Eventually) N jupyter notebooks (run, extract data frames to parquet)

Views

  • Views are .sql files containing duck db queries. These allow users to write ddb queries against all of their sources, which they can then re-use on multiple pages.

Pages

  • Markdown

Components

  • Svelte

Pages will reference sources through front matter per #85

Page System and Reactive Queries

Page Commons

  • Each page is initialized with a wasm ddb instance. The "page db"
  • The page db is populated with views defined in the frontmatter, which reference the sources parquet files
  • Components on the page pass around and reactively update strings, not json blobs of resolved data. We use svelte for the reactivity.
  • Still to be discussed: components take sql strings as inputs and issue their own queries against the page db to retrieve their own data, reducing the need to write inlined queries to support individual charts

CleanShot 2023-04-19 at 12 21 37@2x

@archiewood
Copy link
Member

archiewood commented Apr 19, 2023

Exciting developments!

Only thought is that if we make it mandatory then adds some complexity to first user experience, requiring sources folder.

There's something nice about the simplicity of the current starting experience, and then abstracting as you add complexity.

@ItsMeBrianD
Copy link
Member

We'll need to identify a way to indicate if an inline query (or a query file, even) is coming from a data source directly or from the in-app runtime. I'm not sure how this would play with the \``sql [queryId]```` syntax that we have now

@voberoi
Copy link

voberoi commented Jun 5, 2023

@archiewood pointed me to this GH issue to leave some thoughts since it's related to #728 (which I am a proponent of).

Speaking to this project's goals:

  • Enable a more declarative parameterized page experience, without exploding our user's data warehouse costs
    • I'm not (currently) affected by the constraint you're trying to address here (exploding data warehouse costs).
    • Even if I was, I'd prefer a focus on familiar ways to do common tasks, like SQL templating, with warnings/advanced features to ensure users don't explode their cloud DW costs.
  • Enable performant interactive features, without the need for client side JS
    • This would be really neat!
  • Support multiple simultaneous data sources in one project. (data from a .csv, a postgres connection, a snowflake connection in the same project)
    • Also neat. I can think of cases where this is super handy.
  • Align all evidence flavoured markdown on a single portable syntax which can be shared and re-used, with minimal re-work
    • I don't understand what this means.

Thoughts on some of the ideas in the proposal:

Declarative Parameterized Page

----
sources:
    - customer_metrics.sql
    - where: customer_id === $customer_id 
----

SQL which yields only records that match $customer_id:
 
select * from customer_metrics

This looks wildly confusing to me.

  • I can't read the SQL and understand what it'll return. I have to read the front-matter too.
  • I have to learn Evidence-specific syntax for filter predicates

What is the motivation behind this syntax?

WASM DuckDB

This looks neat, and like a useful way to handle user interactivity client-side. I'm interested to see how this turns out. That said, it seems complicated and rife with potential issues if you're using a different source DB.

You could consider using https://github.com/tobymao/sqlglot for help. I believe Toby managed to get a WASM build working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants