Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce Property paths forms {x,y} syntax dropped in draft of sparql 1.1 https://www.w3.org/TR/2012/WD-sparql11-query-20120724 #101

Open
JervenBolleman opened this issue Sep 10, 2019 · 12 comments
Labels
paths Paths in graph patterns, including property paths

Comments

@JervenBolleman
Copy link
Collaborator

JervenBolleman commented Sep 10, 2019

Why?

We see in our work quite a demand for finding nodes within x steps of another node in a path.

Previous work

This was sugessted in sparql1.1 working group but was removed in the W3C Working Draft 24 July 2012 in response to the paper Counting beyond a yottabyte, or how SPARQL 1.1. property paths will prevent adoption of the Standard

Proposed solution

Reintroduce the syntax, and solve the solve-ability issues in queries with path queries.
Problem is that to make path queries tractable they where changed to connectivity queries instead of the earlier counting ones. I believe that if the property path syntax is merely a syntactic convenience the SPARQL 1.1. complexity is not increased.

SELECT ?x ?y
WHERE {
   ?x rdfs:seeAlso{1,2} ?y 
}

would be equivalent to

SELECT ?x ?y
WHERE {
  { ?x rdfs:seeAlso ?y  }
  UNION 
  { ?x rdfs:seeAlso/rdfs:seeAlso  ?y}
}

Of course it could be a non counting solution either

SELECT ?x ?y
WHERE {
  {  { SELECT DISTINCT ?x ?y WHERE {?x rdfs:seeAlso ?y  }}
  UNION 
  { SELECT DISTINCT ?x ?y WHERE {?x rdfs:seeAlso ?_Z_ . ?_Z_ rdfs:seeAlso  ?y}}
}

Considerations for backward compatibility

None, for standard SPARQL 1.1. However different implementations have not removed their implementation of the draft from their engines. Introducing a standard might lead to subtle bugs to users of the old format.

@rvesse
Copy link

rvesse commented Sep 11, 2019

Interestingly we actually implemented something like this in one of our query engines. We never bothered to do proper property path support rather we just optionally emulated queries using property paths with their expanded versions similar to the above applying a max limit on path length for unbounded operators (e.g. + and *)

The main problem from an implementors perspective is the potential complexity introduced by this. Yes expanding the query may seem like it makes the implementation simpler the reality is that you end up with very large UNIONs with a lot of repeated work so unless your optimiser/engine can actively reuse the results from one branch of the UNION to prime the other branches you end up duplicating a lot of effort.

There is also a potential scoping problem introduced when the property path occurs as part of a larger graph pattern. Injecting the appropriate path expansion as a child graph pattern effectively breaks up the pattern potentially introducing scoping issues with other elements in the same graph pattern.

@afs afs added the paths Paths in graph patterns, including property paths label Sep 11, 2019
@afs
Copy link
Collaborator

afs commented Sep 11, 2019

The first decision is counting vs non-counting. Originally, they were counting. If put in, my preference is for the non-counting (distinct) form and add an operator into the path evaluation to handle them.

There are also the forms {2,} and {,2} and to get all the forms to fit together, non-counting looks better.

There are other equivances (original counting form):

?x rdfs:seeAlso{1,2} ?y == ?x rdfs:seeAlso/(rdfs:seeAlso?) ?y

The UNION expansion, if done purely as syntax, introduces changes of BGPs which matter for blank node label rules.

@TallTed
Copy link
Member

TallTed commented Sep 11, 2019

fwiw, Support for this syntax was never removed from Virtuoso.

@klinovp
Copy link

klinovp commented Sep 18, 2019

We at Stardog do not support the {x,y} syntax but are interested in supporting it. Our preference would also be the non-counting semantics.

@kasei
Copy link
Collaborator

kasei commented Sep 18, 2019

Responding to @JervenBolleman's email to the list:

For those who where around in the sparql 1.1 standardization days I have a special request. Can you remember why the path{1,200} syntax was dropped

My recollection is that under time pressure we couldn't reach consensus on whether the semantics should be counting (which works nicely as a shorthand for repeated SequencePaths and equivalent to an expanded BGP) or non-counting (which avoids performance problems with large values of n,m, and naturally supports the open-ended version {n,}).

@abrokenjester
Copy link
Collaborator

+1 on recalling that we couldn't reach consensus on the precise semantics. In RDF4J (or Sesame, at the time), this was first implemented, and then removed for the 2.7.4 release. See https://openrdf.atlassian.net/browse/SES-1706. I believe we only removed the code from the actual parser, but the logic in the algebra would still support it - so if we re-added this construct to the parser it would support it again. The implementation had its bugs but for most "regular" cases it worked well enough. From the top of my head our implementation was a counting (non-distinct) one.

I also remember writing a blog post about our implementation of it. That blog is no longer online but I'll see if I can dig it up from my local archive somewhere.

@yayamamo
Copy link

This is a bit late reply, but I'd like to propose a function of counting how many times a predicate is passed from a subject to an object. It may be expressed as follows:

SELECT ?x ?y ?c WHERE {
  ?x rdfs:seeAlso{?c} ?y .
}

The reason why is that we often want to know the distance between a pair of nodes, for example, two concepts in an ontology where they are connected by rdfs:subClassOf.

I think this can be an extension to the forms of this proposal {x,y} syntax.

@namedgraph
Copy link

How about the paper Counting Beyond a Yottabyte, or how SPARQL 1.1 Property Paths will Prevent Adoption of the Standard?

Our results pose a strong argument against the current semantics of property paths, from both, theory and practice. We have made clear that the main problem is the necessity of counting paths imposed by the current SPARQL 1.1 specification. Our investigation raises several questions, being one of the most important whether there exists such a strong use case for counting paths that will make the designers of the language to stick with the current semantics, even knowing that in simple and natural cases it will lead to completely impractical evaluation procedures. We have searched in the official document and also in the discussions around the design of the language, and to the best of our knowledge, there is no strong use case for counting paths. It should also be noticed that this counting functionality has not been used as a primitive in previous navigational languages for graph structured data.

Are these conclusions not relevant anymore? Has something changed since 2012?

@klinovp
Copy link

klinovp commented Sep 17, 2020

It is relevant for the counting semantics. But not for the path existence semantics, i.e. the same that's used for property paths in the current version of the spec. If the latter is adopted, then ?s :p{2} ?e won't be equivalent to ?s :p/:p ?e but the performance problems described if the Yottabyte paper won't apply.

@yayamamo
Copy link

As for the use case of counting paths, there is it when we want to sort the order of results based on the subClassOf hierarchy depth while it is assumed that the graph structure is tree.

@ericprud
Copy link
Member

So far, I've figured that if you want counting semantics, you can't use property paths, which is OK 'cause you can easily expand them into triple patterns with fresh variables. Property paths with {m,n} would make such a translation hard enough that humans would have to screw around for a long time with subqueries and would probably get it right 1% of the time.

This is not to argue against {m,n} but instead introduce it with some keyword like ALL to switch to a counting mode. Maybe the algorithms in https://aic.ai.wu.ac.at/~polleres/publications/save-etal-2017SEMANTICS.pdf would be useful.

@gkellogg
Copy link
Member

I've implemented this in Ruby SPARQL, essentially along the re-write direction suggested by @afs in #101 (comment). The one case that isn't handled natively is path{0}, which requires a new operator.

For those implemented, I created some tests, which can be found here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
paths Paths in graph patterns, including property paths
Projects
None yet
Development

No branches or pull requests