-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-4920 SPARQLConnection.size() now uses count query #4972
base: main
Are you sure you want to change the base?
GH-4920 SPARQLConnection.size() now uses count query #4972
Conversation
…nt in the repository. Just send a count query instead. Signed-off-by: Jerven Bolleman <[email protected]>
@hmottestad did you see this license failure before? |
|
||
String sizeAsTupleQuery(Resource... contexts) { | ||
String query = COUNT_EVERYTHING; | ||
if (contexts != null && isQuadMode()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think contexts should never be null, but contexts[0] can probably be null if the user is asking for the default context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RDF4J has the rdf4j:nil context and also a sesame one for addressing the default context. I believe it's a general issue with SPARQL that there is no way to address just the triples in the unnamed context. Jena also has their own: https://jena.apache.org/documentation/tdb/datasets.html#special-graph-names
Maybe we can test if either one of these returns results, and if not fallback to the old method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a problem only when the default graph in SPARQL is the union of all graphs, but this is what I've found most sensible for all applications that I've used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check that a null context[0] will work as well as the rdf4j:nil or semame:nil. For the cases where the remote uses a non union default graph the old case would also have returned 0.
+ "> { ?s ?p ?o}}"; | ||
} else if (contexts.length > 0) { | ||
String graphs = Arrays.stream(contexts) | ||
.filter(Resource::isIRI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with blank nodes in our current implementation? Are blank node identifiers at all stable in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I SPARQL named graphs are not allowed to be blank nodes. So it could have worked if the stars aligned, but never reliably.
@@ -100,6 +114,26 @@ public void testAddSingleContextHandling() throws Exception { | |||
assertThat(sparqlUpdate).containsPattern(expectedAddPattern).containsPattern(expectedRemovePattern); | |||
} | |||
|
|||
@Test | |||
public void testSizeQuery() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any end to end tests that actually spin up a SPARQL endpoint and runs operations against that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not that I am aware off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in FedX federation part we have spin up a SPARQL embedded server for integration testing. Maybe you can borrow some ideas from there
I haven't seen that one before, but I have once experienced an old dependency suddenly failing the license check. I didn't figure out what was wrong last time, ended up just upgrading the dependency. Let me try to submit it to clearlydefined. Might take some time though :( |
CQ: https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/14675 |
*/ | ||
private static boolean isExposableGraphIri(Resource resource) { | ||
// We use the instanceof test to avoid any issue with a null pointer. | ||
return resource instanceof IRI && RDF4J.NIL != resource && SESAME.NIL != resource; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to use .equals(...), there is no guarantee that users will actually use our constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
I'm still a bit unsure about this. We want to be able to correctly count the statements in the default/unnamed graph. I don't know how well the previous solution handled it, but I would assume that it would be able to check for if the context is null on each statement. |
…mote default graph, or a dataset clearer
The behavior is the same between this code and previous regarding the default graph. select * where {?s ?p ?o} New select (count(*) as ?c) where {?s ?p ?o} The logic is a bit more robust in the case of using the internal default graph IRI's of RDF4j. Which should not be send over the wire. |
GitHub issue resolved #4920
Briefly describe the changes proposed in this PR:
SPARQLConnection.size() method should not fetch every statement in the repository. Just send a count query instead.
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)