Added query optimization to GraphSailConnection #504

Open. Wants to merge 3 commits into master.

niclashoyer
Contributor

I ran into some performance problems when using the Sesame implementation and discovered that it performed no query optimization.
I added basic query optimization that should speed up query execution in most cases.

@spmallette
Member

@joshsh can you please take a look at this one when you get a chance?

@joshsh
Member

joshsh commented Sep 16, 2014

@niclashoyer I'm happy to merge this if you can demonstrate a performance boost for your queries. I tried a similar optimization a while back but removed it, as it seemed to have little impact: 3f2b225

@niclashoyer
Contributor Author

I'm developing a SPARQL plugin for Neo4j that will be released in the coming months. At the moment I'm benchmarking its performance against other graph stores (such as Fuseki).

I'm using the Berlin SPARQL Benchmark (BSBM) to perform these tests. It generates a dataset of the desired size, then runs a set of different queries against the graph store and records their performance.

I had already discussed some badly performing queries on the Sesame mailing list, since reordering them manually improved their performance significantly.
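
To illustrate why evaluation order matters so much, here is a toy join-ordering heuristic. It is a hypothetical sketch, neither the code in this PR nor Sesame's own optimizer: triple patterns are plain strings (terms starting with `?` are variables), and at each step the pattern with the fewest still-unbound variables is evaluated next, so selective patterns shrink the intermediate results early. The `bsbm:` terms are made-up placeholders in the style of BSBM, and the sketch assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinOrderSketch {

    /** Hypothetical triple-pattern representation; terms starting with '?' are variables. */
    record Pattern(String s, String p, String o) {
        List<String> terms() { return List.of(s, p, o); }

        /** Number of variables in this pattern not yet bound by earlier patterns. */
        int unbound(Set<String> bound) {
            int n = 0;
            for (String t : terms()) {
                if (t.startsWith("?") && !bound.contains(t)) n++;
            }
            return n;
        }
    }

    /** Greedy ordering: repeatedly pick the pattern with the fewest unbound variables. */
    static List<Pattern> order(List<Pattern> patterns) {
        List<Pattern> remaining = new ArrayList<>(patterns);
        List<Pattern> ordered = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            Pattern best = remaining.get(0);
            for (Pattern cand : remaining) {
                if (cand.unbound(bound) < best.unbound(bound)) best = cand;
            }
            remaining.remove(best);
            ordered.add(best);
            // Once evaluated, the pattern's variables count as bound for later patterns.
            for (String t : best.terms()) {
                if (t.startsWith("?")) bound.add(t);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Written in a "bad" order: the least selective pattern comes first.
        List<Pattern> query = List.of(
            new Pattern("?product", "?prop", "?value"),
            new Pattern("?product", "rdfs:label", "?label"),
            new Pattern("?product", "bsbm:producer", "bsbm:Producer1"));
        for (Pattern p : order(query)) {
            System.out.println(p.s() + " " + p.p() + " " + p.o());
        }
    }
}
```

Running `main` prints the producer pattern first, then the label pattern, then the fully unbound `?product ?prop ?value` pattern. A real optimizer would use cardinality estimates from the store rather than a bare count of unbound variables.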

To verify that the query optimization improves performance, I set up a small dataset (8498 triples) and ran the tests with and without it. Without optimization, the implementation runs 2365.05 query mixes per hour (each query mix consists of several queries); with optimization, it runs 13068.70 query mixes per hour. For detailed logs, see this gist.
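
Expressed as a ratio, the two throughput figures correspond to roughly a 5.5× speedup on this dataset, or an average of about 1.52 s versus 0.28 s per query mix:

$$
\text{speedup} = \frac{13068.70}{2365.05} \approx 5.53,
\qquad
\frac{3600\,\text{s}}{2365.05} \approx 1.52\,\text{s}
\;\;\text{vs.}\;\;
\frac{3600\,\text{s}}{13068.70} \approx 0.28\,\text{s per query mix}
$$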

The difference grows as the dataset gets larger.

@niclashoyer
Contributor Author

@joshsh, is that enough information to justify merging the query optimization? I could probably run more tests, but that would take some time.