A few improvements #536
If this eats up too much memory, we can bump it back down. It's pretty sloooooow right now because it's running single threaded.
I have run a benchmark query against a ledger with the schema.org vocabulary loaded in it for both this branch and main. Of course, all the caveats with the unreliability of benchmarks apply, but there is a modest improvement.

Benchmark query:

    @(fluree/query db '{:select [?s]
                        :where [[?s "schema:rangeIncludes" {"@id" "schema:DateTime"}]]})

main: Evaluation count : 1440 in 60 samples of 24 calls. Found 2 outliers in 60 samples (3.3333 %)

this patch: Evaluation count : 1440 in 60 samples of 24 calls. Found 6 outliers in 60 samples (10.0000 %)
Some minor formatting & documentation stuff and a perf question but looks like a ton of good cleanups, fixes, and improvements otherwise!
(->Flake util/max-long 0 util/max-long const/$xsd:decimal 0 true nil))
(->Flake max-s max-p max-s max-dt max-t max-op max-meta))

(def minimum
Add a docstring of "The smallest flake possible"?
src/fluree/db/flake.cljc
Outdated
[ss to-add to-remove]
(as-> (transient ss) trans
(reduce disj! trans to-remove)
(reduce conj! trans to-add)
Does `reduce` negate some of the benefit of the transient here? Since we don't care about the return value each time, would it be noticeably more performant to just loop these for the side effects?
You're totally right! I just didn't think about it. I'll change it.
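For reference, here is a minimal sketch of the two approaches discussed in this thread. It is not the code from the patch: a plain hash set stands in for the sorted set in the diff, and the names `revise-with-reduce` and `revise-with-loop` are hypothetical. One caveat worth keeping in mind is that the Clojure documentation says transients are not meant to be modified in place, so the value returned by each `conj!` or `disj!` call has to be passed on to the next call. `reduce` does that automatically; an explicit loop has to do it through `recur`.

```clojure
;; Sketch only: a plain hash set stands in for the sorted set in the diff,
;; and both function names are hypothetical.

(defn revise-with-reduce
  "Removes `to-remove` from and adds `to-add` to the set `s` via a transient.
  `reduce` hands each return value of `disj!`/`conj!` to the next call."
  [s to-add to-remove]
  (as-> (transient s) trans
    (reduce disj! trans to-remove)
    (reduce conj! trans to-add)
    (persistent! trans)))

(defn revise-with-loop
  "Same result with explicit loops. The transient returned by each
  `disj!`/`conj!` call is still carried forward through `recur`."
  [s to-add to-remove]
  (let [removed (loop [trans (transient s)
                       xs    (seq to-remove)]
                  (if xs
                    (recur (disj! trans (first xs)) (next xs))
                    trans))
        added   (loop [trans removed
                       xs    (seq to-add)]
                  (if xs
                    (recur (conj! trans (first xs)) (next xs))
                    trans))]
    (persistent! added)))

(revise-with-reduce #{1 2 3} [4 5] [3])
(revise-with-loop #{1 2 3} [4 5] [3])
```

Both calls return a set equal to `#{1 2 4 5}`. Whether one form is measurably faster than the other would need to be benchmarked against the actual sorted-set implementation.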
src/fluree/db/index.cljc
Outdated
(try* (let [resolved (<? (resolve r node))]
(trim-node resolved start-flake end-flake))
(catch* e
(log/error e
Can you indent the body of the `catch*` like `catch`? I.e. two spaces more than `(catch* ...`
src/fluree/db/indexer/default.cljc
Outdated
@@ -402,7 +402,7 @@
 novel? (fn [node]
 (or (seq remove-preds)
 (seq (index/novelty-subrange node t novelty))))]
-(->> (index/tree-chan conn root novel? (constantly true) 1 refresh-xf error-ch)
+(->> (index/tree-chan conn root novel? 4 refresh-xf error-ch)
`4` feels kind of magic-number-y here in a way that `1` doesn't as much. Should it be def'd somewhere to give it a name?
It is. I just felt like we should be doing this with higher parallelization, and 4 seemed like a fine number. In reality though, this type of thing should be parameterized. I'll change it back to one and make a note to parameterize this later.
🎏 Looks good!
This patch contains a few improvements I uncovered while investigating switching to persistent-sorted-set in #523.
Besides cleanup of unused namespaces, functions, and index node attributes, the major changes are to add start and end flake parameters to the `fluree.db.index/tree-chan` API. This allows us to rely on the `tree-chan` function itself to trim an index leaf node's flakes immediately upon retrieval instead of forcing us to do it after the fact with a transducer. It also allows us to choose only the branch node's children we need to consider between the start and end flakes in logarithmic time instead of checking each child in linear time.

Besides that, there are changes to make more use of transients when adding to or removing from novelty sets, and to correct an error where child maps on branch nodes were silently converted to unsorted maps when calculating ttids.
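To illustrate the logarithmic-time child selection described above, here is a minimal sketch that is not taken from the patch. It assumes a branch's children live in a sorted map keyed by each child's first key, uses integers in place of flakes, and the function name `children-in-range` is hypothetical. It relies only on `subseq` and `rsubseq` from `clojure.core`, which seek into a sorted collection rather than scanning it.

```clojure
;; Sketch only: integers stand in for flakes, and `children-in-range`
;; is a hypothetical name, not a function from the patch.

(defn children-in-range
  "Returns the children of a branch that may hold keys in [start, end].
  `children` is a sorted map from each child's first key to the child."
  [children start end]
  (let [;; Greatest child key <= start: that child may still contain `start`.
        [floor-key] (first (rsubseq children <= start))
        ;; If no child starts at or before `start`, begin at `start` itself.
        lower       (or floor-key start)]
    ;; Seek to `lower`, then walk forward only while keys are <= `end`.
    (map val (subseq children >= lower <= end))))

(def example-children
  (sorted-map 0 :a, 10 :b, 20 :c, 30 :d))

(children-in-range example-children 12 25) ; => (:b :c)
(children-in-range example-children 35 40) ; => (:d)
```

The seek costs time logarithmic in the number of children, plus the number of children actually returned, whereas testing every child against the range is linear in the size of the branch. Note that `subseq` only works on sorted collections, which is one reason the unsorted-map conversion mentioned above matters.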