Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

history query improvements #362

Merged
merged 8 commits into from
Feb 6, 2023
Merged

Conversation

dpetran
Copy link
Contributor

@dpetran dpetran commented Feb 1, 2023

Fixes #357

Changes shape of response to use json-ld iri keys and include the assert/retract subject @ids within each assert/retract entry.

Also adds timestamp support to the from/to t parameters.

@dpetran dpetran requested a review from a team February 1, 2023 20:49
Copy link
Contributor

@zonotope zonotope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be good with me after a few minor changes.

(let [assert-flakes (not-empty (filter flake/op t-flakes))
retract-flakes (not-empty (filter (complement flake/op) t-flakes))

s-asserts-ch (->> (sort-by flake/s assert-flakes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not matter all that much in practice, but it's faster to use group-by instead of sort-by + partition-by, unless you care that the groups are also sorted by subject in the end.

(->> assert-flakes
     (group-by flake/s)
     vals
     async/to-chan!)

s-asserts-ch (->> (sort-by flake/s assert-flakes)
(partition-by flake/s)
(async/to-chan!))
s-retracts-ch (->> (sort-by flake/s retract-flakes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to refactor this repeated logic into its own function.

(async/pipe ch)))
s-retracts-ch)
{(json-ld/compact const/iri-t compact) (- (flake/t (first t-flakes)))
(json-ld/compact const/iri-assert compact) (<? s-asserts-json-ch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<? can throw errors, but s-asserts-json-ch will either fail to close or close without an error. you should use <! instead. The same goes for the line below.

Now every subject in asserts and retracts will have an @id. Also returns proper json-ld.
Using `group-by` instead of `sort-by` + `partition-by` can produce the assert and
retract flakes in one traversal instead of two. I added an extra subject in the test
suite to make sure that the final results still have a stable sort, and it appears that
they do - we don't care about the specific ordering, just that it's stable.

Also used <! instead of <? because the assert/retract chans won't ever get an error on
them, so the extra error handling isn't necessary.
@dpetran dpetran force-pushed the feature/history-query-improvements branch 3 times, most recently from 8398d7a to bb576aa Compare February 2, 2023 21:31
History queries don't have a concept of "stale flakes" - they want all flakes between
`from-t` and `to-t`. The CachedTRangeResolver filters out "stale flakes" but _not_ all of
the flakes from before `from-t`.

This is a brute force remaking of the call stack so that we can get the correct behavior
from `query-range/time-range`.

Co-authored-by: Marcela Poffald <[email protected]>
@dpetran dpetran force-pushed the feature/history-query-improvements branch from bb576aa to 5d819e9 Compare February 2, 2023 21:31
The history query pipeline requires different `t-range` behavior within the `Resolver`
which is quite far down the call stack:

time-range -> index-range* -> resolve-flake-slices -> CachedTRangeResolver -> t-range

One option we considered was passing a `:resolver` opt down from `time-range` with a
value of `:history`, and then choosing the resolver behavior there. We decided it was
simpler to just treat the history query pipeline as a separate pipeline.

We also considered factoring the CachedTRangeResolver to implement a different
protocol besides `Resolver`, as it seems to be doing a slightly different job (filtering
node flakes) than the other Resolvers (fetching nodes). But, we couldn't think of a good
thing to use for polymorphic dispatch.

In the end, we inlined the `index-range*` and `resolve-flake-slices` functions into
`time-range` so we could provide our own resolver without having to solve the dispatch
problem. The consequence is if we fix the regular query pipeline we'll have to
duplicate that fix to the history query pipeline in the future, but maybe they won't
have the same types of bugs.

Co-authored-by: Marcela Poffald <[email protected]>
@aaj3f
Copy link
Contributor

aaj3f commented Feb 3, 2023

@dpetran not sure if this should be another ticket/PR or not, but I do see some odd behavior w/ the :t { :from } results:

  (def ledger-name "test/historyPR")

  (def conn @(fluree/connect {:method       :file
                              :storage-path "data"
                              :defaults     {:context {:id     "@id"
                                                       :type   "@type"
                                                       :schema "http://schema.org/"
                                                       :ex     "http://example.org/ns/"
                                                       :rdfs   "http://www.w3.org/2000/01/rdf-schema#"
                                                       :rdf    "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                                                       :xsd    "http://www.w3.org/2001/XMLSchema#"
                                                       :f      "https://ns.flur.ee/ledger#"}}}))


  (def ledger @(fluree/create conn ledger-name))

  (def db (fluree/db ledger))

  (def db1 @(fluree/stage db {:graph [{:id   :ex/andrew
                                       :type :schema/Person
                                       :schema/name "Andrew"
                                       :ex/favNum 8}]}))
  
  (def committed-db @(fluree/commit! ledger db1))
  
  (def db2 @(fluree/stage (fluree/db ledger) {:graph [{:id :ex/andrew
                                        :ex/favNum 10}]}))
  
  (def committed-db @(fluree/commit! ledger db2)) 
  
  @(fluree/history ledger {:history :ex/andrew})
  ;; => [#:f{:t 2, :assert [{:ex/favNum 10, :id :ex/andrew}], :retract [{:ex/favNum 8, :id :ex/andrew}]}
  ;;     #:f{:t 1, :assert [{:id :ex/andrew, :rdf/type [:schema/Person], :schema/name "Andrew", :ex/favNum 8}], :retract []}]
  
  @(fluree/history ledger {:history :ex/andrew
                           :t {:from 2}})
  ;; => [#:f{:t 2, :assert [{:ex/favNum 10, :id :ex/andrew}], :retract []}
  ;;     #:f{:t 1, :assert [{:id :ex/andrew, :rdf/type [:schema/Person], :schema/name "Andrew"}], :retract []}]

The first call to fluree/history looks great, results exactly as I'd expect.

The second call with :t { :from 2 } seems odd: it (1) includes the :t 1 assertions and (2) oddly leaves out the :t 2 retraction of the original :ex/favNum 10 value (which, coincidentally, is also missing from the :t 1 assertions in this second result body as well)

@dpetran
Copy link
Contributor Author

dpetran commented Feb 3, 2023

@aaj3f that's really strange - I have a test for exactly that case and it works fine:

(testing "from-t"

I'll experiment with your example to see if I can reproduce it.

If the exact same node key came through the Resolver the CachedHistoryRangeResolver
could return the results cached by the CachedTRangeResolver, since they shared the same
cache key prefix. This gives it a unique prefix.

Added a test case that fails without the fix. Also removed some unused opts from the
`extract-query-flakes` call.
@dpetran dpetran force-pushed the feature/history-query-improvements branch from fe1fa31 to 52e1d54 Compare February 3, 2023 17:42
@dpetran
Copy link
Contributor Author

dpetran commented Feb 3, 2023

That was a tricky case - the problem was in the caching, and my tests exercised a path with more transactions and therefore no cache for the specific index node that we were filtering for flakes. This last commit adds a unique history cache key so that won't happen, and a test that fails without the fix.

Copy link
Contributor

@cap10morgan cap10morgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌺

@dpetran dpetran merged commit 10cec7e into main Feb 6, 2023
@dpetran dpetran deleted the feature/history-query-improvements branch February 6, 2023 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finish history-query support
4 participants