history query improvements #362

dpetran · 2023-02-01T20:49:07Z

Fixes #357

Changes the shape of the response to use JSON-LD IRI keys and to include the subject `@id` in each assert/retract entry.

Also adds timestamp support to the from/to t parameters.

zonotope

This will be good with me after a few minor changes.

zonotope · 2023-02-02T16:13:42Z

src/fluree/db/api/query.cljc

+    (let [assert-flakes (not-empty (filter flake/op t-flakes))
+          retract-flakes (not-empty (filter (complement flake/op) t-flakes))
+
+          s-asserts-ch (->> (sort-by flake/s assert-flakes)


This might not matter all that much in practice, but it's faster to use group-by instead of sort-by + partition-by, unless you care that the groups are also sorted by subject in the end.

(->> assert-flakes (group-by flake/s) vals async/to-chan!)
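
To see why the two approaches are interchangeable when group order does not matter, here is a minimal sketch that uses plain maps in place of flakes. The `:s` key stands in for `flake/s`, and both the sample data and the function names are made up for illustration.

```clojure
;; Hypothetical stand-ins for flakes: plain maps with a subject under :s.
(def sample-flakes
  [{:s 2 :p :name   :o "b"}
   {:s 1 :p :name   :o "a"}
   {:s 2 :p :favNum :o 10}
   {:s 1 :p :favNum :o 8}])

;; Original approach: sort first, then split wherever the subject changes.
(defn groups-by-sort [flakes]
  (->> flakes (sort-by :s) (partition-by :s)))

;; Suggested approach: a single pass that buckets flakes by subject.
(defn groups-by-group [flakes]
  (->> flakes (group-by :s) vals))

;; Both yield the same groups; only the order of the groups may differ.
(= (set (map set (groups-by-sort sample-flakes)))
   (set (map set (groups-by-group sample-flakes))))
```

The comparison goes through sets because `group-by` returns a map, so the order in which `vals` hands back the groups is not guaranteed to be sorted by subject.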

zonotope · 2023-02-02T16:15:19Z

src/fluree/db/api/query.cljc

+          s-asserts-ch (->> (sort-by flake/s assert-flakes)
+                            (partition-by flake/s)
+                            (async/to-chan!))
+          s-retracts-ch (->> (sort-by flake/s retract-flakes)


I think it makes sense to refactor this repeated logic into its own function.

zonotope · 2023-02-02T16:25:30Z

src/fluree/db/api/query.cljc

+                                  (async/pipe ch)))
+                            s-retracts-ch)
+      {(json-ld/compact const/iri-t compact) (- (flake/t (first t-flakes)))
+       (json-ld/compact const/iri-assert compact) (<? s-asserts-json-ch)


`<?` rethrows any error it takes from a channel, but `s-asserts-json-ch` will either never close or close without an error, so the extra check buys nothing here. You should use `<!` instead. The same goes for the line below.
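
As a rough illustration of the difference: a `<?`-style macro is usually a channel take followed by a rethrow when the value taken is an exception. The sketch below models only the rethrow step on plain values so that it runs without core.async. `throw-if-error` is a hypothetical name chosen for this example, not the function Fluree uses.

```clojure
;; Hypothetical helper modelling what a `<?`-style macro does with the value
;; it takes from a channel: rethrow exceptions, pass everything else through.
(defn throw-if-error [x]
  (if (instance? Throwable x)
    (throw x)
    x))

;; A value that is never an exception passes through unchanged, so the
;; extra check adds nothing and a plain `<!` would behave the same.
(throw-if-error [{:id :ex/andrew}])

;; Only a channel that can carry exceptions needs the check.
(try
  (throw-if-error (ex-info "query failed" {}))
  (catch Exception e
    (ex-message e)))
```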

Now every subject in asserts and retracts will have an @id. Also returns proper json-ld.

Using `group-by` instead of `sort-by` + `partition-by` can produce the assert and retract flakes in one traversal instead of two. I added an extra subject in the test suite to make sure that the final results still have a stable sort, and it appears that they do - we don't care about the specific ordering, just that it's stable. Also used <! instead of <? because the assert/retract chans won't ever get an error on them, so the extra error handling isn't necessary.
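
A minimal sketch of the one-traversal idea described in this commit message, using plain maps as stand-ins for flakes. The `:op` and `:s` keys mirror `flake/op` and `flake/s`; `split-by-op` and the sample data are hypothetical and not taken from the PR.

```clojure
;; Hypothetical stand-ins for flakes: :op is true for an assertion and
;; false for a retraction.
(def t-flakes
  [{:s :ex/andrew :p :ex/favNum :o 10 :op true}
   {:s :ex/andrew :p :ex/favNum :o 8  :op false}
   {:s :ex/bob    :p :ex/favNum :o 3  :op true}])

;; One pass over the flakes yields both sets, keyed by the op flag.
;; Each set is then bucketed by subject.
(defn split-by-op [flakes]
  (let [{asserts true retracts false} (group-by :op flakes)]
    {:asserts  (vals (group-by :s asserts))
     :retracts (vals (group-by :s retracts))}))

(split-by-op t-flakes)
```

Because `group-by` returns a map, this only suits callers that do not depend on a particular subject ordering, which matches the note above that the specific ordering is not important.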

History queries don't have a concept of "stale flakes" - they want all flakes between `from-t` and `to-t`. The CachedTRangeResolver filters out "stale flakes" but _not_ all of the flakes from before `from-t`. This is a brute force remaking of the call stack so that we can get the correct behavior from `query-range/time-range`.

Co-authored-by: Marcela Poffald <[email protected]>
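
The difference between the two filtering behaviors can be sketched with plain maps. This is a simplified model under stated assumptions, not the resolver code: "stale" is assumed to mean an assertion cancelled by a later retraction, `:t` is shown as a positive number for readability, and `remove-stale` and `history-range` are hypothetical names.

```clojure
;; Hypothetical flakes as plain maps. :op is true for an assertion and
;; false for a retraction.
(def flakes
  [{:s :ex/andrew :p :ex/favNum :o 8  :t 1 :op true}
   {:s :ex/andrew :p :ex/favNum :o 8  :t 2 :op false}
   {:s :ex/andrew :p :ex/favNum :o 10 :t 2 :op true}])

;; "Current state" view: drop retractions and the assertions they cancel.
;; Simplified: it ignores values that are re-asserted after a retraction.
(defn remove-stale [flakes]
  (let [retracted (set (map (juxt :s :p :o) (remove :op flakes)))]
    (->> flakes
         (filter :op)
         (remove #(retracted ((juxt :s :p :o) %))))))

;; History view: keep every flake, asserted or retracted, inside the range.
(defn history-range [flakes from-t to-t]
  (filter #(<= from-t (:t %) to-t) flakes))
```

With this sample data, `(remove-stale flakes)` keeps only the `:ex/favNum 10` assertion, while `(history-range flakes 2 2)` keeps both the retraction of 8 and the assertion of 10, which is what a history query needs.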

The history query pipeline requires different `t-range` behavior within the `Resolver` which is quite far down the call stack:

time-range -> index-range* -> resolve-flake-slices -> CachedTRangeResolver -> t-range

One option we considered was passing a `:resolver` opt down from `time-range` with a value of `:history`, and then choosing the resolver behavior there. We decided it was simpler to just treat the history query pipeline as a separate pipeline.

We also considered factoring the CachedTRangeResolver to implement a different protocol besides `Resolver`, as it seems to be doing a slightly different job (filtering node flakes) than the other Resolvers (fetching nodes). But, we couldn't think of a good thing to use for polymorphic dispatch.

In the end, we inlined the `index-range*` and `resolve-flake-slices` functions into `time-range` so we could provide our own resolver without having to solve the dispatch problem. The consequence is if we fix the regular query pipeline we'll have to duplicate that fix to the history query pipeline in the future, but maybe they won't have the same types of bugs.

Co-authored-by: Marcela Poffald <[email protected]>

aaj3f · 2023-02-03T14:46:44Z

@dpetran not sure if this should be another ticket/PR or not, but I do see some odd behavior w/ the :t { :from } results:

  (def ledger-name "test/historyPR")

  (def conn @(fluree/connect {:method       :file
                              :storage-path "data"
                              :defaults     {:context {:id     "@id"
                                                       :type   "@type"
                                                       :schema "http://schema.org/"
                                                       :ex     "http://example.org/ns/"
                                                       :rdfs   "http://www.w3.org/2000/01/rdf-schema#"
                                                       :rdf    "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                                                       :xsd    "http://www.w3.org/2001/XMLSchema#"
                                                       :f      "https://ns.flur.ee/ledger#"}}}))


  (def ledger @(fluree/create conn ledger-name))

  (def db (fluree/db ledger))

  (def db1 @(fluree/stage db {:graph [{:id   :ex/andrew
                                       :type :schema/Person
                                       :schema/name "Andrew"
                                       :ex/favNum 8}]}))
  
  (def committed-db @(fluree/commit! ledger db1))
  
  (def db2 @(fluree/stage (fluree/db ledger) {:graph [{:id :ex/andrew
                                                       :ex/favNum 10}]}))
  
  (def committed-db @(fluree/commit! ledger db2)) 
  
  @(fluree/history ledger {:history :ex/andrew})
  ;; => [#:f{:t 2, :assert [{:ex/favNum 10, :id :ex/andrew}], :retract [{:ex/favNum 8, :id :ex/andrew}]}
  ;;     #:f{:t 1, :assert [{:id :ex/andrew, :rdf/type [:schema/Person], :schema/name "Andrew", :ex/favNum 8}], :retract []}]
  
  @(fluree/history ledger {:history :ex/andrew
                           :t {:from 2}})
  ;; => [#:f{:t 2, :assert [{:ex/favNum 10, :id :ex/andrew}], :retract []}
  ;;     #:f{:t 1, :assert [{:id :ex/andrew, :rdf/type [:schema/Person], :schema/name "Andrew"}], :retract []}]

The first call to fluree/history looks great, results exactly as I'd expect.

The second call with :t { :from 2 } seems odd: it (1) includes the :t 1 assertions and (2) leaves out the :t 2 retraction of the original :ex/favNum 8 value (which is also missing from the :t 1 assertions in this second result body).

dpetran · 2023-02-03T15:19:47Z

@aaj3f that's really strange - I have a test for exactly that case and it works fine:

db/test/fluree/db/query/history_test.clj

Line 120 in 6fb8d24

(testing "from-t"

I'll experiment with your example to see if I can reproduce it.

If the exact same node key came through the Resolver, the CachedHistoryRangeResolver could return the results cached by the CachedTRangeResolver, since both shared the same cache key prefix. This commit gives the history resolver a unique prefix and adds a test case that fails without the fix. It also removes some unused opts from the `extract-query-flakes` call.

dpetran · 2023-02-03T17:43:20Z

That was a tricky case - the problem was in the caching, and my tests exercised a path with more transactions and therefore no cache for the specific index node that we were filtering for flakes. This last commit adds a unique history cache key so that won't happen, and a test that fails without the fix.
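
The caching problem described here can be modelled in a few lines. This is a sketch under assumptions, not the actual resolver code: the cache is a plain atom, keys are `[prefix node-id]` vectors, and every name below is hypothetical.

```clojure
;; Hypothetical cache shared by two resolvers, keyed by [prefix node-id].
(def cache (atom {}))

;; Return the cached value for cache-key, computing and storing it on a miss.
(defn cached [cache-key compute]
  (if-let [hit (get @cache cache-key)]
    hit
    (let [v (compute)]
      (swap! cache assoc cache-key v)
      v)))

;; Both resolvers see the same node; they differ only in how they filter it.
(defn t-range-resolve [node-id]
  (cached [:t-range node-id] (fn [] :stale-filtered-flakes)))

;; Bug: reusing the :t-range prefix returns whatever was cached first.
(defn history-resolve-shared-prefix [node-id]
  (cached [:t-range node-id] (fn [] :all-flakes-in-range)))

;; Fix: a distinct prefix keeps the two result sets apart.
(defn history-resolve-unique-prefix [node-id]
  (cached [:history-t-range node-id] (fn [] :all-flakes-in-range)))
```

Once `(t-range-resolve "node-1")` has populated the cache, the shared-prefix variant hands back the stale-filtered result for the same node, while the unique-prefix variant computes its own. A test that never hits an already-cached node would not notice the difference, which is consistent with the explanation above.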

cap10morgan

🌺

dpetran requested a review from a team February 1, 2023 20:49

zonotope approved these changes Feb 2, 2023

dpetran added 4 commits February 2, 2023 14:50

return subject ids in history query responses

f3807ce

add timestamp t support to history query

b3de1c9

use commit times in history timestamp test

a389c8b

dpetran force-pushed the feature/history-query-improvements branch 3 times, most recently from 8398d7a to bb576aa Compare February 2, 2023 21:31

dpetran force-pushed the feature/history-query-improvements branch from bb576aa to 5d819e9 Compare February 2, 2023 21:31

update docstrings

6fb8d24

dpetran force-pushed the feature/history-query-improvements branch from fe1fa31 to 52e1d54 Compare February 3, 2023 17:42

cap10morgan approved these changes Feb 6, 2023

dpetran merged commit 10cec7e into main Feb 6, 2023

dpetran deleted the feature/history-query-improvements branch February 6, 2023 17:02

mpoffald mentioned this pull request Feb 13, 2023

Support for returning commits from /history #379

Merged
