Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make Parser generic around dialect #1381

Open
samuelcolvin opened this issue Aug 14, 2024 · 4 comments
Open

make Parser generic around dialect #1381

samuelcolvin opened this issue Aug 14, 2024 · 4 comments

Comments

@samuelcolvin
Copy link
Contributor

samuelcolvin commented Aug 14, 2024

if we care about performance, we should stop using dynamic dispatch and make the parser generic around the dialect, with that we could make lots of these methods const (or drop the Precedence enum and just have const values on the trait) and probably improve performance significantly in general.

That would of course be a big change to the public API.

This is definitely how would implement Parser if I was starting now, but I think we should see some evidence that parsing SQL is a meaningful chunk of time for anyone before making a change like this.

My guess is that:

  • even for quick queries, SQL parsing is <1% of query time
  • making Parser generic and therefore dropping the restriction on Dialect that it has to be 'object safe' would actually only save us ~20%

If both those assumptions are right, this doesn't seem worth it unless it makes the code generally easier to reason with and work on.

Originally posted by @samuelcolvin in #1379 (comment)

(separate issue seems worth it for this discussion)

@jmhain
Copy link
Contributor

jmhain commented Aug 14, 2024

if we care about performance, we should stop using dynamic dispatch and make the parser generic around the dialect,

I disagree with this blanket assertion. It's a trade-off, and you can't actually say which way is better out of the context of a specific use case. If you only use sqlparser-rs with one dialect, then sure, your suggestion will surely be a net positive. If you use it for many dialects in a single application (as I do in my use case) then your suggestion will lead to an instantiation of the entire parser per dialect during monomorphization, increasing the binary size, and causing the CPU to have to page the different versions in and out of memory while concurrently handling requests that need to parse for different dialects.

@samuelcolvin
Copy link
Contributor Author

If you use it for many dialects in a single application (as I do in my use case) then your suggestion will lead to an instantiation of the entire parser per dialect during monomorphization, increasing the binary size, and causing the CPU to have to page the different versions in and out of memory while concurrently handling requests that need to parse for different dialects.

Sure it will lead to a larger binary, but I wouldn't be at all sure that the overall performance will be worse in your case. Avoiding the lookup stuff related to dynamic dispatch will still improve performance.

@alamb
Copy link
Contributor

alamb commented Aug 15, 2024

I suggest the next step if anyone wants to pursue this would be to create a cargo bench style benchmark that parses a bunch of SQL. Then we can profile it and figure out where time is being spent and evaluate tradeoffs to improve the speed

@alamb
Copy link
Contributor

alamb commented Aug 15, 2024

even for quick queries, SQL parsing is <1% of query time

My anecdotal observations confirm this guess. From working on DataFusion for 4 years and looking at many profiling traces (including just query planning) is that I have never seen the SQL parser show up at all as a significant time consumer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants