-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for DuckDB or ClickHouse? #31
Comments
Thanks for asking; If SQLite can be supported in a reasonable way, I would like to support it! I assume that 99% can be ported easily, but there might be a few hard parts; I'm not sure though, because I don't know SQLite well enough to judge without looking into it. Ideally, I would like to avoid a class port (a fork that implements SQLite, while this repo only implements PostgreSQL), because keeping the two synchronised is labor-intensive and error-prone. I would propose that you try porting it in a draft PR and leave comments on all sections that are unclear or unfinished. We can then think about how to implement a switch that In the worst case, if the SQLite code path ends up being too different from the PostgreSQL one, we'll have to resort to a true fork. |
Makes sense! For now I'm going to pursue some other routes (Transitland API, working directly with a SQLite implementation) and will circle back if this seems like the quickest path :) |
duckdb could also be an interesting option and it looks like there's node bindings. Maybe an ORM or something like dbt to help with the cross-database compatibility? |
Indeed, DuckDB claims to cover the GTFS analysis use case very well. One could analyze how it performs with the queries used as benchmarks, which reflect (my personal, happy to receive some feedback) use cases.
AFAIK usually ORMs assume that the process connects to the DB and sends queries "interactively", which is unnecessarily slow for our use case (inserting millions of rows known up front). Is it possible to generate SQL (dialects) using dbt? |
I think the main use case would be to generate different dialects for the views/tables on top of the raw data, for example using SQLAlchemy to make stuff like
Yes it's essentially just a framework for templating SQL via Jinja (and providing a dependency graph it can execute) and developers use macros to provide cross-database support. dbt will compile to the appropriate dialect depending on your chosen database target; I could imagine either SQLAlchemy or dbt being useful for creating the views/tables on top of the initial tables. I think you've looked at our project some already but we use dbt as the "source of truth" for our BigQuery data warehouse models. |
I'm hesitant to use an ORM, even if it "just" generates SQL dialects instead of connecting to the DBs, for the following reasons:
While these are strengths of |
Didn't have a look at the DB-related code yet. Will have a look. |
FYI: I'm currently looking into how well DuckDB fits as a local in-process database. also related: http://duckdb.org/2023/08/23/even-friendlier-sql.html |
I'd also recommend to have a look at https://clickhouse.com/ - I'm running a 80M x 78 database (rows x columns) on another topic and the query time went down from 1 seconds to less than 0.01 ms - even without indexes. |
I think adopting any other database as a second option is only worth it if it provides significant operational (e.g. SQLite/DuckDB with local in-process execution and great portability) or performance (e.g. parallel execution beyond what PostgreSQL can do) benefits. I briefly looked into both DuckDB as well as ClickHouse, and both look promising with regards to But I think I won't have enough time/motivation to investigate the necessary changes to support any one. You're very welcome to do this and submit a WIP Pull Request! Please note though that I won't necessarily merge it, depending on the increase in code complexity it brings, so consider this an experiment. This is also why, if you decide to investigate one of the DB systems, I encourage you to give frequent update on "roadblocks" (where you notice that significant changes in the SQL generated |
I think the
arrivals_departures
view is what I'm most interested in, but for the simple iOS app I'm building I'd like to avoid hosting a database. Would it be possible/reasonable to provide a SQLite port? FWIW I'm happy to work on this, I just haven't done such a port before.The text was updated successfully, but these errors were encountered: