make `Parser` generic around dialect #1381

samuelcolvin · 2024-08-14T15:53:32Z

if we care about performance, we should stop using dynamic dispatch and make the parser generic around the dialect, with that we could make lots of these methods const (or drop the Precedence enum and just have const values on the trait) and probably improve performance significantly in general.

That would of course be a big change to the public API.

This is definitely how would implement Parser if I was starting now, but I think we should see some evidence that parsing SQL is a meaningful chunk of time for anyone before making a change like this.

My guess is that:

even for quick queries, SQL parsing is <1% of query time
making Parser generic and therefore dropping the restriction on Dialect that it has to be 'object safe' would actually only save us ~20%

If both those assumptions are right, this doesn't seem worth it unless it makes the code generally easier to reason with and work on.

Originally posted by @samuelcolvin in #1379 (comment)

(separate issue seems worth it for this discussion)

The text was updated successfully, but these errors were encountered:

jmhain · 2024-08-14T19:54:16Z

if we care about performance, we should stop using dynamic dispatch and make the parser generic around the dialect,

I disagree with this blanket assertion. It's a trade-off, and you can't actually say which way is better out of the context of a specific use case. If you only use sqlparser-rs with one dialect, then sure, your suggestion will surely be a net positive. If you use it for many dialects in a single application (as I do in my use case) then your suggestion will lead to an instantiation of the entire parser per dialect during monomorphization, increasing the binary size, and causing the CPU to have to page the different versions in and out of memory while concurrently handling requests that need to parse for different dialects.

samuelcolvin · 2024-08-15T10:22:25Z

If you use it for many dialects in a single application (as I do in my use case) then your suggestion will lead to an instantiation of the entire parser per dialect during monomorphization, increasing the binary size, and causing the CPU to have to page the different versions in and out of memory while concurrently handling requests that need to parse for different dialects.

Sure it will lead to a larger binary, but I wouldn't be at all sure that the overall performance will be worse in your case. Avoiding the lookup stuff related to dynamic dispatch will still improve performance.

alamb · 2024-08-15T11:44:07Z

I suggest the next step if anyone wants to pursue this would be to create a cargo bench style benchmark that parses a bunch of SQL. Then we can profile it and figure out where time is being spent and evaluate tradeoffs to improve the speed

alamb · 2024-08-15T11:45:42Z

even for quick queries, SQL parsing is <1% of query time

My anecdotal observations confirm this guess. From working on DataFusion for 4 years and looking at many profiling traces (including just query planning) is that I have never seen the SQL parser show up at all as a significant time consumer.

alamb mentioned this issue Aug 15, 2024

Simple custom lexical precedence in PostgreSQL dialect #1379

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make `Parser` generic around dialect #1381

make `Parser` generic around dialect #1381

samuelcolvin commented Aug 14, 2024 •

edited

Loading

jmhain commented Aug 14, 2024

samuelcolvin commented Aug 15, 2024

alamb commented Aug 15, 2024

alamb commented Aug 15, 2024

make Parser generic around dialect #1381

make Parser generic around dialect #1381

Comments

samuelcolvin commented Aug 14, 2024 • edited Loading

jmhain commented Aug 14, 2024

samuelcolvin commented Aug 15, 2024

alamb commented Aug 15, 2024

alamb commented Aug 15, 2024

make `Parser` generic around dialect #1381

make `Parser` generic around dialect #1381

samuelcolvin commented Aug 14, 2024 •

edited

Loading