[perf] Explore re-using XPathContext objects, and compiling XPath expressions #3266

Open
flavorjones opened this issue Jul 2, 2024 · 0 comments

Two things I want to explore doing to try to improve the performance of XPath (and, transitively, CSS) searches:

  1. re-use XPathContext objects, which are a little expensive to create
  2. expose libxml2's ability to compile XPath expressions

For (1), we need to be a bit careful:

  • XPathContext is not thread-safe
  • there is some state we need to set or un-set appropriately:
    • namespaces (via XPathContext#register_namespaces)
    • variables (via XPathContext#register_variable)
  • while preserving other state:
    • the nokogiri: prefix used for dynamic function binding
    • the nokogiri-builtin: prefix used for our performance-optimized builtin functions
    • the built-in xpath functions themselves

Still, the performance improvement could be significant: see this response from the current libxml2 maintainer, which indicates that "best practice" is to keep one XPathContext per thread and re-use it.
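To make the constraints above concrete, here is a minimal sketch of the "one context per thread, reset between queries" pattern. It is not Nokogiri's implementation: `SketchContext`, `reset!`, `with_thread_context`, and the placeholder namespace URIs are all hypothetical stand-ins that only record state, and the sketch ignores the fact that a real context is also bound to a document and node.

```ruby
# Hypothetical stand-in for an XPath context. It records registrations
# only; it does not evaluate anything.
class SketchContext
  # Prefixes that must survive a reset (the two named in this issue).
  PRESERVED_PREFIXES = ["nokogiri", "nokogiri-builtin"].freeze

  attr_reader :namespaces, :variables

  def initialize
    # Placeholder URIs, not the real ones.
    @namespaces = {
      "nokogiri" => "urn:example:dynamic-functions",
      "nokogiri-builtin" => "urn:example:builtin-functions",
    }
    @variables = {}
  end

  def register_ns(prefix, uri)
    @namespaces[prefix] = uri
  end

  def register_variable(name, value)
    @variables[name] = value
  end

  # Drop per-query state, keep the preserved prefixes.
  def reset!
    @namespaces.select! { |prefix, _uri| PRESERVED_PREFIXES.include?(prefix) }
    @variables.clear
    self
  end
end

# Yield this thread's context, creating it on first use, and always
# reset it afterwards so the next query starts clean.
def with_thread_context
  ctx = Thread.current.thread_variable_get(:sketch_xpath_context)
  unless ctx
    ctx = SketchContext.new
    Thread.current.thread_variable_set(:sketch_xpath_context, ctx)
  end
  yield ctx
ensure
  ctx&.reset!
end
```

Keeping the context in a thread-local variable sidesteps the thread-safety problem without locking, at the cost of one context per thread. The reset in `ensure` is where the cleanup overhead would show up in a real implementation.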

The benchmark submitted by a user in #760 indicates a 4x(!) speedup on simple expressions by avoiding re-initializing an XPathContext object. It seems likely that the real-world speedup will be less (since cleaning up registered namespaces and variables will have some overhead), but it still seems like it would be a pretty decent speedup.
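A rough way to re-check that kind of number locally is sketched below. This is not the benchmark from #760; the document, expression, iteration count, and the `time_queries` helper are all made up for illustration, and it deliberately skips namespace and variable registration, so it should overstate the real-world gain. It uses `Nokogiri::XML::XPathContext.new` and `#evaluate` directly and only runs the comparison if nokogiri is installed.

```ruby
require "benchmark"

# Time `iterations` calls of the block and return elapsed seconds.
# (Hypothetical helper, not part of Nokogiri.)
def time_queries(iterations)
  Benchmark.realtime { iterations.times { yield } }
end

begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping the comparison"
end

if defined?(Nokogiri)
  # Small document and a simple expression, so context setup dominates.
  doc = Nokogiri::XML("<root>" + "<item><name>x</name></item>" * 10 + "</root>")
  expr = "//item/name"
  iterations = 10_000

  # A new context for every query.
  fresh = time_queries(iterations) do
    Nokogiri::XML::XPathContext.new(doc).evaluate(expr)
  end

  # One context shared across all queries.
  shared = Nokogiri::XML::XPathContext.new(doc)
  reused = time_queries(iterations) { shared.evaluate(expr) }

  puts format("fresh context:  %.3fs", fresh)
  puts format("reused context: %.3fs", reused)
end
```

Running the same comparison with a more complex expression would show how quickly the context setup cost is dwarfed by evaluation time.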

For (2), we'll need a new Ruby class to wrap the compiled expression represented by xmlXPathCompExprPtr, and a way to pass that into #xpath, but that seems like relatively straightforward work. (Note this API won't be available in JRuby.)

I'd like to get a rough benchmark ahead of time to see how much time this will save us, for simple and for complex expressions -- after a brief search I couldn't find any prior results here.

@flavorjones flavorjones added this to the v1.17.0 milestone Jul 2, 2024