fix sids for pids #563

dpetran · 2023-08-24T14:16:40Z

This is a general fix for sids used as pids.

Fixes #521

Problem
The problem is that if a subject is transacted, and a subsequent transaction then uses that subject's iri as a predicate iri, we don't recognize it as a vocab iri: its subject id is not in the "pid range".

The impact is limited to display: raw sids end up as keys in the result sets of queries. The queries still work and the data is correct.

Approaches
So, I think there are two main approaches to solving this problem.

1. Extend our :schema cache during a transaction
   We can trivially tell if we're going to run into this problem at flake creation time by checking if the pid is greater than the max-schema-sid number. We could then add special handling to the vocab-refresh pipeline to deal with this scenario.
2. Look up iris during projection
   Check the keys in the response set and look up the iri of any integer key we see coming through.

This PR takes approach 2, as that's the exact way VariableSelectors already work, and we already have an iri cache for each query.
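A minimal sketch of the lookup-during-projection idea, not the PR's actual code. The names `sid->iri`, `display-key`, and `resolve-keys` are hypothetical, and a plain map stands in for the database's iri lookup:

```clojure
;; Hypothetical stand-in for the database's sid -> iri lookup.
(def sid->iri {42 "ex:knows"})

(defn display-key
  "Returns k unchanged unless it is an integer, in which case it is treated
  as a raw sid and resolved through lookup-iri. Falls back to k itself when
  no iri is found."
  [lookup-iri k]
  (if (integer? k)
    (or (lookup-iri k) k)
    k))

(defn resolve-keys
  "Rewrites every integer key of a single result map to its iri."
  [lookup-iri result]
  (into {}
        (map (fn [[k v]] [(display-key lookup-iri k) v]))
        result))

;; The raw sid 42 is replaced by its iri; string keys pass through untouched.
(resolve-keys sid->iri {"@id" "ex:bob", 42 "ex:carol"})
```

Keys that are already strings are never looked up, so the extra cost is only paid for results that actually contain raw sids.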

Complications
While implementing this, I discovered that different query selectors were adding different structures to the iri cache. They all used pids as keys, but the values had different structures (a plain iri, a predicate map, or a partial predicate map), which meant that a pid looked up in the VariableSelector could not be reused in the SubgraphSelector, and vice versa.

I fixed that by normalizing the cache structure to be a map of pid -> pred-spec, a map with some metadata about the predicate, including at least an :as key with the compacted iri as a value.
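The normalized shape can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `cache-pred-spec!` and `cached-iri` are hypothetical helper names, and only the `:as` key is taken from the description above.

```clojure
(defn cache-pred-spec!
  "Stores a pred-spec for pid in the volatile iri-cache and returns it.
  Every writer goes through the same {:as <iri>} shape."
  [iri-cache pid iri]
  (let [p-spec {:as iri}]
    (vswap! iri-cache assoc pid p-spec)
    p-spec))

(defn cached-iri
  "Reads the compacted iri for pid, or nil on a cache miss. Works no matter
  which selector inserted the entry, because all entries share one shape."
  [iri-cache pid]
  (:as (get @iri-cache pid)))

(def iri-cache (volatile! {}))

(cache-pred-spec! iri-cache 42 "ex:knows")
(cached-iri iri-cache 42)
```

A selector that needs more metadata can add keys to the pred-spec map without breaking readers that only use `:as`.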

Future considerations
One additional thing I discovered after adding some logging to the cache misses is that the volatile we use as a cache is not thread-safe, so there are cases where the same iri is looked up multiple times while projecting one result set. It may be useful in the future to benchmark whether using an atom for the cache would speed things up, trading potential contention on atom updates for potentially more cache hits.
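For reference, an atom-backed variant of the cache could look like the sketch below. It is JVM-only, the helper name `cache-pred-spec!` is hypothetical, and it says nothing about whether the atom would actually be faster; that is the open benchmarking question.

```clojure
(defn cache-pred-spec!
  "Atom-backed variant: swap! applies assoc atomically and retries on
  contention, so concurrent writers cannot overwrite each other's entries."
  [iri-cache pid iri]
  (let [p-spec {:as iri}]
    (swap! iri-cache assoc pid p-spec)
    p-spec))

(def iri-cache (atom {}))

;; 100 threads each insert a distinct pid concurrently.
(let [threads (mapv (fn [pid]
                      (Thread. (fn []
                                 (cache-pred-spec! iri-cache pid (str "ex:p" pid)))))
                    (range 100))]
  (run! (fn [^Thread t] (.start t)) threads)
  (run! (fn [^Thread t] (.join t)) threads))

;; Every insert is retained once all threads have finished.
(count @iri-cache)
```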

Before this commit each selector used the iri-cache differently, and therefore gave broken results if they were combined in a scenario that required iri lookups.

This normalizes the cache structure by making every cache update insert an entry in the form of `(vswap! iri-cache assoc pid {:as <iri>})`, so that every cache read can use the results regardless of where it was first inserted.

The other fix in this commit is adding a `dbproto/-iri` lookup to the `response/wildcard-spec` function. Before, it would return a raw pid in the response, which is unhelpful for users.

Fixes #521

cap10morgan

🚄

bplatz · 2023-08-24T15:25:31Z

src/fluree/db/query/json_ld/response.cljc

-                     {:as pid})]
-        (vswap! cache assoc pid p-spec)
-        p-spec)))
+  (go-try


If this isn't in a high frequency code path it may not matter, but generally when we use a cache for speed, we want to check the cache before spawning core/async chans which is expensive at volume.

Here you spawn a chan, then conditionally check for an exception, then check for a cached value, even though the cache check needs neither a chan nor an exception check.

You'll note the other similar patterns here where cache checking is done before spawning a chan (see line 68 below)

Good catch, this is in a high frequency code path. I've refactored it to only spawn the thread as a last resort.
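The cache-first ordering discussed above can be sketched like this. It is an illustration only: a delivered `promise` stands in for the core.async channel so the snippet runs without extra dependencies, and `lookup-iri-async`, `pred-spec`, and the `expensive-lookups` counter are hypothetical names. The `{:as pid}` fallback mirrors the removed lines in the diff.

```clojure
;; Counts how often the expensive path is taken.
(def expensive-lookups (atom 0))

(defn lookup-iri-async
  "Hypothetical stand-in for the channel-returning iri lookup. A delivered
  promise plays the role of the channel in this sketch."
  [sid->iri pid]
  (swap! expensive-lookups inc)
  (doto (promise) (deliver (get sid->iri pid))))

(defn pred-spec
  "Checks the cache first and only falls back to the async lookup on a miss."
  [iri-cache sid->iri pid]
  (if-let [cached (get @iri-cache pid)]
    cached
    (let [iri    @(lookup-iri-async sid->iri pid)
          p-spec {:as (or iri pid)}]
      (vswap! iri-cache assoc pid p-spec)
      p-spec)))

(def iri-cache (volatile! {}))

(pred-spec iri-cache {42 "ex:knows"} 42) ; miss: takes the expensive path
(pred-spec iri-cache {42 "ex:knows"} 42) ; hit: served from the cache
@expensive-lookups
```

On a hit the function returns without touching the async machinery at all, which is the property the review asked for.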

cap10morgan

👍🏻

dpetran added 2 commits August 23, 2023 17:11

add failing tests for predicates with subject sids

cf26cc5

cap10morgan approved these changes Aug 24, 2023

bplatz reviewed Aug 24, 2023

refactor to avoid spawning channel if cache has value

ebac2d5

dpetran requested a review from a team August 24, 2023 17:24

cap10morgan approved these changes Aug 26, 2023

dpetran merged commit c3c4996 into main Aug 28, 2023
5 checks passed

dpetran deleted the fix/sids-for-pids branch August 28, 2023 14:20
