Example using SQLite via SQLx for secondary indexes #1
Conversation
SQLite is used as a stand-in for an external remote relational database; it should be easy to adapt this example to use another database.

This example should be considered incomplete: it does not try to handle **many** edge cases or push down filters as much as possible.
In particular, I am quite concerned that the filter pushdown might be incorrect in some ways; I don't intend for people to copy this or use it directly. Happy to add some tests to increase confidence and then tone down this warning.
I'll point out any obvious errors I see
sqlx-sqlite/src/index.rs
Outdated
    FileStatistics::RowGroupCount,
])
.column(ColumnStatistics::RowGroup)
.distinct() // could be distinct_on(vec![ColumnStatistics::FileId, ColumnStatistics::RowGroup]) if the backing store supports it
There are a couple of things along these lines that are just limitations of SQLite.
cc @alamb
sqlx-sqlite/index.db
Outdated
This is just an empty SQLite file.
Thank you @adriangb -- this looks very cool. Thank you for sharing
sqlx-sqlite/src/index.rs
Outdated
/// Push down a simple binary expression to the index
/// Only a subset of expressions are supported since `a = 1` has to be rewritten as `a_int_max_value >= 1 AND a_int_min_value <= 1`
fn push_down_binary_filter(value: &ScalarValue, op: &Operator) -> Option<SimpleExpr> {
It might be clearer to call this "rewrite_filter" or "rewrite_binary_expr", as it isn't really "pushing down" the filter in my mind -- it is creating a `SimpleExpr` that will be "pushed down" (into SQLite).
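To make the naming point concrete, here is a hedged, self-contained sketch of the rewrite step alone (hypothetical code, not the PR's implementation): it turns `col op value` into a predicate over the `_min_value`/`_max_value` statistics columns, and that result is what later gets pushed down. The PR builds a `sea_query::SimpleExpr` rather than a `String`.

```rust
// Hypothetical sketch: rewrite `col op value` into a predicate over the
// min/max statistics columns. Column-name suffixes are assumptions for
// illustration; the real code emits a sea_query::SimpleExpr, not a String.
fn rewrite_binary_filter(col: &str, op: &str, value: &str) -> Option<String> {
    match op {
        // `a = 1` can only be true if 1 falls inside [a_min, a_max]
        "=" => Some(format!("{col}_max_value >= {value} AND {col}_min_value <= {value}")),
        ">" => Some(format!("{col}_max_value > {value}")),
        ">=" => Some(format!("{col}_max_value >= {value}")),
        "<" => Some(format!("{col}_min_value < {value}")),
        "<=" => Some(format!("{col}_min_value <= {value}")),
        // anything else is not pushed down and must be evaluated later
        _ => None,
    }
}
```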
BTW this rewrite looks like a combination of:
- The expr rewrite that PruningPredicate does internally
- A translation from DataFusion `Expr`s to sea_query

If you separated the two passes the logic might be clearer and maybe you could reuse some existing code. Some thoughts:
- Consider using the rewritten PruningPredicate expr directly (and there has been a lot of effort in testing that rewrite for correctness)
- Use the `Expr` --> SQL code in `expr_to_sql` instead of sea_query (as the code already exists)
This is great, I was wondering if there's existing tested code to do these conversions. I see that I can build a `PruningPredicate` and call `PruningPredicate::predicate_expr` to get the rewritten expr, so I can then run that against SQLite and not have to materialize the statistics in memory (which is what `PruningPredicate` seems to be used for generally).

However, the way that rewriting happens, it expects the schema of the statistics to match the schema of the table: `value` needs columns `value_min` and `value_max` of the correct type. That makes sense when your statistics are stored with the actual data or in memory, but it breaks down if you use a traditional RDBMS and want to support more than 1 table schema, because you don't want to make a super wide table and add dynamically generated columns.

The schema I'm using here gets around that by storing a sparse table (1 column for each of max/min of each type) of values with `(file_id, row_group, column_name)` as the primary key.

What do you think generally of the statistics schema being used here? Any idea how we can generalize what `PruningPredicate` does without requiring the schema of the statistics table to depend on the schema of the data?
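For concreteness, here is a hypothetical Rust model of one row of such a sparse statistics table (illustrative names, not the PR's actual types): each row is keyed by `(file_id, row_group, column_name)` and carries one nullable min/max pair per supported type, so a single table can serve any data schema.

```rust
// Hypothetical sparse statistics row: only the pair matching the column's
// type is populated; the rest stay NULL (None).
#[derive(Debug, Default, PartialEq)]
struct StatsRow {
    file_id: i64,
    row_group: i64,
    column_name: String,
    int_min_value: Option<i64>,
    int_max_value: Option<i64>,
    string_min_value: Option<String>,
    string_max_value: Option<String>,
}

impl StatsRow {
    // Convenience constructor for an integer column's statistics.
    fn int_stats(file_id: i64, row_group: i64, column_name: &str, min: i64, max: i64) -> Self {
        StatsRow {
            file_id,
            row_group,
            column_name: column_name.to_string(),
            int_min_value: Some(min),
            int_max_value: Some(max),
            ..Default::default()
        }
    }
}
```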
Looking a bit further into what `PruningPredicate` is doing, it seems to me like the `LiteralGuarantees` part is not necessary if I'm pushing work to another database. Let me know if that sounds right or not.
If we wanted to keep the current schema and re-use existing logic I think we'd need to make `RequiredColumns::stat_column_expr` pluggable.

I'll also note that I use the type of the value to determine which column to query (since they're named `int_max_value` and such), which seems wrong as soon as someone does `date = '2020-01-01'`. I think it would be better to use the approach that I think the existing code is using of getting the type from the schema.
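A hedged sketch of that alternative (a hypothetical helper, not the PR's code): look the column's type up in the table schema and derive the statistics column names from it, so a string literal compared against a date column still resolves to the date statistics columns.

```rust
// Hypothetical: map a filtered column to its typed statistics columns using
// the table schema (here a simple slice of (name, type) pairs) rather than
// the type of the literal in the filter.
fn stat_columns_for(schema: &[(&str, &str)], col: &str) -> Option<(String, String)> {
    let ty = schema
        .iter()
        .find(|(name, _)| *name == col)
        .map(|(_, ty)| *ty)?;
    Some((format!("{ty}_min_value"), format!("{ty}_max_value")))
}
```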
I think here you mean `col` can never be `x` and `y` at the same time 👀
I see -- yes, if you want to push the evaluation down into the lower level database without a column per statistic (which could indeed result in a large number of columns) it would be hard to reuse the existing pruning predicate logic.
I think we may still be able to make it work by doing a join per column. Which I don't love, but should work and avoids any dynamic schemas in the index database. I'll give that a try and loop back.
I'm going to rework this to be simple and hardcoded, with the index's schema having a min/max column per column in the actual schema. It was getting out of hand to try to make this fancier for an example.

@alamb is there a way to go from a `PhysicalExpr` to an `Expr` if I want to use `expr_to_sql`? It's another thing where it would be simpler for the example but not universal (I assume it generates DataFusion SQL; there's no guarantee that's valid for the lower level index database).
@alamb I've reworked this to use `PruningPredicate` to do the rewriting. The statistics table now has largely the same schema as what `PruningPredicate` expects.

I did not use `expr_to_sql` because that generates DataFusion-flavored SQL which would not be compatible with many index databases. Instead I opted to implement a function to convert from `PhysicalExpr` -> `sea_query::SimpleExpr`. I think this is valuable to have as an example for folks; it covers a wide range of index databases (since SeaQuery can generate SQL for SQLite, Postgres, MySQL, or be extended) and is not that much code (at least for the cases I chose to handle).
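For readers who want the shape of that conversion without pulling in DataFusion or SeaQuery, here is a hypothetical miniature (its types stand in for `PhysicalExpr` and `SimpleExpr`): the walk recurses on binary nodes and emits leaves, which is the same structure whatever the target query builder is.

```rust
// Hypothetical stand-in types; the real code walks Arc<dyn PhysicalExpr>
// and emits sea_query::SimpleExpr instead of a String.
enum Expr {
    Col(String),
    Lit(i64),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

// Recursive walk: leaves render directly, binary nodes render their
// children and join them with the operator.
fn to_sql(expr: &Expr) -> String {
    match expr {
        Expr::Col(name) => name.clone(),
        Expr::Lit(v) => v.to_string(),
        Expr::Binary(lhs, op, rhs) => format!("({} {} {})", to_sql(lhs), op, to_sql(rhs)),
    }
}
```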
sqlx-sqlite/src/index.rs
Outdated
// TODO: we could aggregate the row groups into an array in the query to transmit less data over the wire
// (and maybe avoid the join), leaving that as a TODO since it introduces more complexity and coupling to the index's backing store
// Result is in the form of (file_name, file_size, row_group_count, row_group_to_scan)
let row_groups: Vec<(String, i64, i64, i64)> = sqlx::query_as_with(&sql, values)
FWIW you could probably use SQLite to do the aggregations. Like add a `SELECT DISTINCT ...` to the query so you wouldn't have to handle duplicates in the Rust code (handling duplicates is fine, I am just pointing it out).
There's already a `distinct` in there (currently line 91).

I do think these queries could be a bit fancier, e.g. I'd like it to just return `file_name string, file_size int, row_groups int[]`, but I'm not sure that's possible with SQLite; I'm most familiar with Postgres. But I do plan on giving it another shot before merging this.
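As a sketch of the client-side alternative (a hypothetical helper; the PR keeps the flat rows), folding the flat `(file_name, file_size, row_group_count, row_group)` rows into one entry per file would look roughly like:

```rust
use std::collections::BTreeMap;

// Hypothetical: collapse the flat query result into one (file, metadata) key
// with the list of row groups to scan, emulating the `int[]` aggregation
// that SQLite cannot return natively.
fn group_row_groups(
    rows: Vec<(String, i64, i64, i64)>,
) -> BTreeMap<(String, i64, i64), Vec<i64>> {
    let mut by_file: BTreeMap<(String, i64, i64), Vec<i64>> = BTreeMap::new();
    for (name, size, row_group_count, row_group) in rows {
        by_file
            .entry((name, size, row_group_count))
            .or_default()
            .push(row_group);
    }
    by_file
}
```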
Cool -- sorry I didn't see that. Nice!
sqlx-sqlite/index.db
Outdated
Possibly don't want to commit the DB to the repo?
I actually do. Running this is idempotent and it's one less setup command. The empty db is also only 24kB.
Looks great in general, I'll read this more later. I do wonder if we could do something similar using a bloom filter stored in the external index - either using the

A bloom filter will only make sense on some columns, and even then on relatively large row groups or entire files, but still it could be very effective for stuff like
Yes, absolutely! I think that's beyond the scope of this example though.

yes definitely.

BTW bloom filters (and similar structures) are the use case for

Specifically that will tell you the constant values to check for in your bloom filter. We use this code in the parquet reader to evaluate parquet bloom filters.

It'd be really cool to have an example (again, I think not this one) where
sqlx-sqlite/src/index.rs
Outdated
let null_counts = null_counts.as_primitive::<UInt64Type>();

for row_group in 0..metadata.num_row_groups() {
    match field.data_type() {
You could probably get some re-use here with two generic functions (one for `IntXType` and one for strings).
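A hedged sketch of what such a generic function could look like, with a plain `PartialOrd` bound standing in for arrow's per-type statistics accessors (hypothetical code, not the linked commit):

```rust
// Hypothetical: one generic min/max pass covers every integer width (and any
// other Copy + PartialOrd type), replacing a match arm per IntXType.
fn min_max<T: Copy + PartialOrd>(values: &[T]) -> Option<(T, T)> {
    let first = *values.first()?;
    Some(values.iter().fold((first, first), |(lo, hi), &v| {
        (if v < lo { v } else { lo }, if v > hi { v } else { hi })
    }))
}
```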
Ok I nerd-sniped myself into trying this; davidhewitt@9fa9e07 - feel free to pull the commit if you want it :)
I ended up going with e88641b which also means we do the downcasts once instead of for each value in the stats arrays. Let me know if that sounds good to you.
Ah definitely better!
let values = match array.data_type() {
    DataType::Int8 => {
        let array = array.as_primitive::<datatypes::Int8Type>();
        array.iter().map(|v| {
Can the map closure here be simplified to `array.iter().map(Value::TinyInt).collect()`, and similar for the other data types?
Yes, at least for those that don't require a box.
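To make the point concrete with a self-contained stand-in for sea_query's `Value` (hypothetical enum, not the real type): a tuple variant's constructor is itself a function, so the closure can collapse to the bare variant name whenever no boxing or extra conversion is needed.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    TinyInt(Option<i8>),
}

// The variant constructor `Value::TinyInt` has type fn(Option<i8>) -> Value,
// so it can be passed to `map` directly instead of `|v| Value::TinyInt(v)`.
fn to_values(vals: &[Option<i8>]) -> Vec<Value> {
    vals.iter().copied().map(Value::TinyInt).collect()
}
```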
I think this is in a good enough state to merge and be iterated upon in the future.