A few improvements #536

zonotope · 2023-07-21T12:14:37Z

This patch contains a few improvements I uncovered while investigating switching to persistent-sorted-set in #523.

Besides cleanup of unused namespaces, functions, and index node attributes, the major change is the addition of start and end flake parameters to the fluree.db.index/tree-chan API. This lets us rely on the tree-chan function itself to trim an index leaf node's flakes immediately upon retrieval instead of forcing us to do it after the fact with a transducer. It also lets us choose only the branch node children that fall between the start and end flakes in logarithmic time instead of checking each child in linear time.
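As a rough illustration of the range selection described above, here is a minimal sketch that uses Clojure's built-in sorted collections as a stand-in for the index's own sorted maps and flake sets. The names `children-in-range` and `trim-leaf` are hypothetical, integers stand in for flakes, and none of this is the actual tree-chan code.

```clojure
;; Sketch only: integers stand in for flakes, and a built-in sorted-map keyed
;; by each child's first flake stands in for a branch node's child map.

(defn children-in-range
  "Returns the children that could hold flakes between start and end.
  The child that may contain start is the one with the greatest key <= start,
  found with a sorted lookup rather than a scan of every child."
  [children start end]
  (let [floor-key (or (ffirst (rsubseq children <= start))
                      (ffirst children))]
    (when floor-key
      (vals (subseq children >= floor-key <= end)))))

(defn trim-leaf
  "Keeps only the flakes of a leaf that fall between start and end, inclusive."
  [flakes start end]
  (subseq flakes >= start <= end))

(def example-children
  (sorted-map 0 :a, 10 :b, 20 :c, 30 :d))

(children-in-range example-children 12 25)
;; => (:b :c)

(trim-leaf (sorted-set 10 11 12 13 14 15) 12 14)
;; => (12 13 14)
```

Locating both boundaries is a sorted lookup, so the cost depends on the depth of the sorted structure plus the number of children actually returned, not on the total number of children.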

Beyond that, this patch makes more use of transients when adding to or removing from novelty sets, and corrects an error where child maps on branch nodes were silently converted to unsorted maps when calculating ttids.

zonotope · 2023-07-21T12:59:47Z

I ran a benchmark query against a ledger with the schema.org vocabulary loaded, on both this branch and main. All the usual caveats about the unreliability of benchmarks apply, but there is a modest improvement: the mean execution time drops from about 43.7 ms on main to about 42.9 ms with this patch, roughly 2%.

Benchmark Query

@(fluree/query db '{:select [?s]
                    :where  [[?s "schema:rangeIncludes" {"@id" "schema:DateTime"}]]})

main

Evaluation count : 1440 in 60 samples of 24 calls.
Execution time mean : 43.707714 ms
Execution time std-deviation : 515.532382 µs
Execution time lower quantile : 43.026732 ms ( 2.5%)
Execution time upper quantile : 44.662472 ms (97.5%)
Overhead used : 5.241825 ns

Found 2 outliers in 60 samples (3.3333 %)
low-severe 1 (1.6667 %)
low-mild 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

this patch

Evaluation count : 1440 in 60 samples of 24 calls.
Execution time mean : 42.863995 ms
Execution time std-deviation : 601.249422 µs
Execution time lower quantile : 42.288177 ms ( 2.5%)
Execution time upper quantile : 44.593585 ms (97.5%)
Overhead used : 5.300420 ns

Found 6 outliers in 60 samples (10.0000 %)
low-severe 4 (6.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

cap10morgan

Some minor formatting & documentation stuff and a perf question but looks like a ton of good cleanups, fixes, and improvements otherwise!

cap10morgan · 2023-07-24T18:03:22Z

src/fluree/db/flake.cljc

-  (->Flake util/max-long 0 util/max-long const/$xsd:decimal 0 true nil))
+  (->Flake max-s max-p max-s max-dt max-t max-op max-meta))
+
+(def minimum


Add a docstring of "The smallest flake possible"?

cap10morgan · 2023-07-24T18:05:57Z

src/fluree/db/flake.cljc

+  [ss to-add to-remove]
+  (as-> (transient ss) trans
+    (reduce disj! trans to-remove)
+    (reduce conj! trans to-add)


Does reduce negate some of the benefit of the transient here? Since we don't care about the return value each time, would it be noticeably more performant to just loop these for the side effects?

zonotope

You're totally right! I just didn't think about it. I'll change it.
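For comparison, here is a minimal sketch of the two shapes discussed in this thread. The function names are hypothetical, and a plain hash set stands in for the novelty set because Clojure's built-in sorted sets do not support transients. Note that the loop version still threads the value returned by `disj!` and `conj!` through `recur`; Clojure's transient functions are meant to be used through their return values rather than purely for side effects.

```clojure
;; Sketch only: a hash set stands in for a transient-capable sorted flake set.

(defn revise-with-reduce
  "Removes to-remove from s and then adds to-add, using reduce over a transient."
  [s to-add to-remove]
  (as-> (transient s) trans
    (reduce disj! trans to-remove)
    (reduce conj! trans to-add)
    (persistent! trans)))

(defn revise-with-loop
  "Same result as revise-with-reduce, written with explicit loops."
  [s to-add to-remove]
  (let [removed (loop [trans (transient s)
                       xs    (seq to-remove)]
                  (if xs
                    (recur (disj! trans (first xs)) (next xs))
                    trans))
        added   (loop [trans removed
                       xs    (seq to-add)]
                  (if xs
                    (recur (conj! trans (first xs)) (next xs))
                    trans))]
    (persistent! added)))

(revise-with-reduce #{1 2 3} [4 5] [2 3])
;; => #{1 4 5}
```

Whether the loop form is measurably faster than `reduce` depends on the collection type and size, so it is worth benchmarking both before committing to one.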

cap10morgan · 2023-07-24T18:09:18Z

src/fluree/db/index.cljc

+    (try* (let [resolved (<? (resolve r node))]
+            (trim-node resolved start-flake end-flake))
+          (catch* e
+                  (log/error e


Can you indent the body of the catch* like catch? I.e. two spaces more than (catch* ...

cap10morgan · 2023-07-24T18:12:39Z

src/fluree/db/indexer/default.cljc

@@ -402,7 +402,7 @@
        novel?     (fn [node]
                     (or (seq remove-preds)
                         (seq (index/novelty-subrange node t novelty))))]
-    (->> (index/tree-chan conn root novel? (constantly true) 1 refresh-xf error-ch)
+    (->> (index/tree-chan conn root novel? 4 refresh-xf error-ch)


4 feels kind of magic-number-y here in a way that 1 doesn't as much. Should it be def'd somewhere to give it a name?

zonotope

It is. I just felt like we should be doing this with higher parallelization, and 4 seemed like a fine number. In reality though, this type of thing should be parameterized. I'll change it back to 1 and make a note to parameterize this later.

dpetran

🎏 Looks good!

zonotope added 27 commits June 18, 2023 10:01

combine query-filter xf and flake xf for efficiency

ef9f322

remove deprecated :block parameter

f7622ce

Merge remote-tracking branch 'origin/main' into fix/delete-all-matches

cb5e9a6

remove unused fns

02ee48d

use subrange fn defined in flake ns

16a066e

remove unused fns and references to network attribute

b1971b9

ledger-id -> ledger-alias

d88f4b1

remove references to network/ledger-id in storage

7fb543e

fluree.db.storage.core -> fluree.db.storage

5a06b27

fix recursive invocation arity

32739b1

remove persistent-sorted-set dependency

33dd70e

remove unused functions

0323a1a

remove unused namespace

6428114

fix typo

f3b397d

use pre-existing util/sequential fn

95f9a05

resolve empty branches too

ce880df

don't resolve empty nodes unless necessary

a8883fa

use sorted map for branch children again

ac3d736

remove include? argument to tree-chan in favor of a filtering xf

366784b

add min/max values for flake components

067de5d

ensure child maps are *always* avl maps

d9bcd02

use transients to add/remove flakes to novelty sets for perfermance

90582b2

cleanup ns requires

d6a1f7d

add minimum flake for consistency with max; add nearest ss fn

68407a1

add start/end flakes to tree-chan; trim resolved nodes based on them

8366750

rely on tree-chan to trim resolved nodes instead of with an xf

969638e

bump up reindexing parallelism because why not?

e8b737f

If this eats up too much memory, we can bump it back down. It's pretty sloooooow right now because it's running single threaded.

zonotope requested a review from a team July 21, 2023 12:14

add/update docstrings

6de6c2b

zonotope added 2 commits July 21, 2023 18:04

remove unused fns

7f5e626

Merge remote-tracking branch 'origin/main' into fix/delete-all-matches

479215e

This was referenced Jul 22, 2023

Fix/subject object scans #539

Merged

Feature/remove psot #540

Merged

cap10morgan approved these changes Jul 24, 2023

dpetran approved these changes Jul 24, 2023

zonotope added 5 commits July 24, 2023 22:15

Merge remote-tracking branch 'origin/main' into fix/delete-all-matches

a0900ed

use loops instead of reduce for performance

1b38569

use 1 for buffersize

a589708

We should parameterize this later

correct indentation

6ef7eaa

change function def to please linter

d98579f

zonotope merged commit 1b7b60d into main Jul 24, 2023
5 checks passed

zonotope deleted the fix/delete-all-matches branch July 24, 2023 21:06
