refactor: Move instance type to start of key #316

Merged
AndrewSisley merged 9 commits into develop from sisley/refactor/I313-instance-type
Mar 30, 2022

Conversation

AndrewSisley
Contributor

Closes #313

Moves InstanceType to the base of the key (/[InstanceType]/[CollectionId]/[DocKey]/[FieldId]). Earlier talk suggested it would go after CollectionId, but that proved messy with the PrefixEnd logic, so I moved it to the start. Based off #315 for convenience. Benches suggest 40% improvement vs parent branch (~45% vs dev).

feat:
Benchmark_Query_UserSimple_Query_Sync_1-8                               5935        168888 ns/op
Benchmark_Query_UserSimple_Query_Sync_10-8                              5173        231327 ns/op
Benchmark_Query_UserSimple_Query_Sync_100-8                             1482        809465 ns/op
Benchmark_Query_UserSimple_Query_Sync_1000-8                             182       6575926 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_1-8                    5517        214914 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_10-8                   4447        282295 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_100-8                  1341        888862 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_1000-8                  172       7153306 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_1-8               6178        192577 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_10-8              4849        270574 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_100-8             4166        302382 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_1000-8            3980        302929 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_10-8              2643        431538 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_100-8             2890        437010 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_1000-8            2726        432774 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_1-8              6457        182761 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_10-8             6444        183150 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_100-8            6380        184403 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_1000-8           6300        186062 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_1-8                      5430        200723 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_10-8                     4363        271650 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_100-8                    1262       1070725 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_1000-8                    141       8402389 ns/op

parent branch:
Benchmark_Query_UserSimple_Query_Sync_1-8                               6315        172621 ns/op
Benchmark_Query_UserSimple_Query_Sync_10-8                              4368        272744 ns/op
Benchmark_Query_UserSimple_Query_Sync_100-8                              993       1222455 ns/op
Benchmark_Query_UserSimple_Query_Sync_1000-8                             100      10810536 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_1-8                    5454        217531 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_10-8                   3722        324280 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_100-8                   916       1299108 ns/op
Benchmark_Query_UserSimple_Query_WithFilter_Sync_1000-8                  100      11146496 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_1-8               6060        194281 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_10-8              4003        305751 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_100-8             3356        363381 ns/op
Benchmark_Query_UserSimple_Query_WithLimitOffset_Sync_1000-8            3276        361692 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_10-8              2449        457907 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_100-8             2648        455376 ns/op
Benchmark_Query_UserSimple_Query_WithMultiLookup_Sync_1000-8            2602        453986 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_1-8              6178        184471 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_10-8             6608        184409 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_100-8            6086        181348 ns/op
Benchmark_Query_UserSimple_Query_WithSingleLookup_Sync_1000-8           6424        184760 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_1-8                      5636        203011 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_10-8                     3794        320852 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_100-8                     870       1382900 ns/op
Benchmark_Query_UserSimple_Query_WithSort_Sync_1000-8                     88      12748211 ns/op
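
For readers who want to check the quoted improvement against the listings, a small sketch of the arithmetic follows. The `improvement` helper and the row selection are illustrative assumptions, not part of the PR; the ns/op values are copied from the two listings above.

```go
package main

import "fmt"

// improvement returns the fractional drop in ns/op when moving from the
// parent-branch timing to the feature-branch timing.
// Hypothetical helper for illustration only.
func improvement(parentNs, featNs float64) float64 {
	return (parentNs - featNs) / parentNs
}

func main() {
	// ns/op values copied from the "feat" and "parent branch" listings.
	rows := []struct {
		name         string
		parent, feat float64
	}{
		{"Query_Sync_1", 172621, 168888},
		{"Query_Sync_1000", 10810536, 6575926},
		{"Query_WithFilter_Sync_1000", 11146496, 7153306},
		{"Query_WithSort_Sync_1000", 12748211, 8402389},
	}
	for _, r := range rows {
		fmt.Printf("%-28s %5.1f%%\n", r.name, improvement(r.parent, r.feat)*100)
	}
}
```

The gain is concentrated in the 100 and 1000 document scans; the single-document and lookup benchmarks are close to unchanged between the two listings.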

@AndrewSisley AndrewSisley added the perf Performance issue or suggestion label Mar 25, 2022
@AndrewSisley AndrewSisley added this to the DefraDB v0.3 milestone Mar 25, 2022
@AndrewSisley AndrewSisley self-assigned this Mar 25, 2022
@jsimnz
Member

jsimnz commented Mar 26, 2022

earlier talk suggested it would go after CollectionId, but that proved messy with the PrefixEnd logic

Can you expand on what you mean by this a bit? Messy how?

@AndrewSisley
Contributor Author

earlier talk suggested it would go after CollectionId, but that proved messy with the PrefixEnd logic

Can you expand on what you mean by this a bit? Messy how?

If the only components of the key are CollectionId and InstanceType, PrefixEnd alters the InstanceType component, which did odd things. It was easier to put it at the front than to track down exactly what was going on. Didn't see any harm in doing so either.
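
A minimal sketch of the effect described above, assuming PrefixEnd computes a range end by incrementing the final byte of the key string. The `prefixEnd` function here is a hypothetical stand-in, not DefraDB's implementation, and the `v` marker and key values are made up for illustration.

```go
package main

import "fmt"

// prefixEnd returns a string that sorts after every string starting with
// prefix, by incrementing the final byte.
// Hypothetical helper; it ignores the 0xFF overflow case for brevity.
func prefixEnd(prefix string) string {
	if prefix == "" {
		return prefix
	}
	b := []byte(prefix)
	b[len(b)-1]++
	return string(b)
}

func main() {
	// CollectionId first, InstanceType last: the final byte belongs to the
	// InstanceType component, so the range end turns "v" into "w".
	fmt.Println(prefixEnd("/1/v")) // /1/w

	// InstanceType first: the final byte belongs to the CollectionId, and
	// the InstanceType component is left untouched.
	fmt.Println(prefixEnd("/v/1")) // /v/2
}
```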

@AndrewSisley AndrewSisley changed the title Sisley/refactor/i313 instance type refactor: Move instance type to start of key Mar 28, 2022
Member

@jsimnz jsimnz left a comment

Overall in favor, some questions/clarifications. Mainly around generating/passing keys (primary vs Datastore)

The one larger point I wanted to make is keeping everything prefixed by CollectionID. I think it's cleanest if both instance and pk are stored after it. This means a single iterator could be used for syncing/exporting/etc., without having to worry about 3 different prefixes (pk, v, p).

I've thrown together a POC to make the instance type work after the collection ID; a few random places needed attention beyond PrefixEnd(): https://github.com/sourcenetwork/defradb/tree/jsimnz/refactor/i313A-instance-type.
Note: perf is the same between both versions (instance type before and after)

core/key.go Outdated
Comment on lines 317 to 294
func (k PrimaryDataStoreKey) ToString() string {
	result := "/" + PRIMARY_KEY

	if k.CollectionId != "" {
		result = result + "/" + k.CollectionId
	}
	if k.DocKey != "" {
		result = result + "/" + k.DocKey
	}

	return result
}

Member

I feel more comfortable having everything scoped under the CollectionID prefix. It creates the cleanest separation of k/v pairs. A single iteration under the CollectionID prefix can be used for exporting, syncing, etc., instead of the 3 different prefixes now needed to capture all the data related to a given Collection.
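
The point about prefixes can be sketched with a sorted list of keys standing in for the datastore. Everything here is an assumption for illustration: the `scanPrefix` helper, the literal `pk`/`v`/`p` markers, and the key values are hypothetical and are not taken from the DefraDB codebase.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scanPrefix returns every key in the sorted slice that starts with prefix,
// mimicking one ordered iterator over a key-value store.
func scanPrefix(sorted []string, prefix string) []string {
	start := sort.SearchStrings(sorted, prefix)
	var out []string
	for _, k := range sorted[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		out = append(out, k)
	}
	return out
}

func main() {
	// CollectionId first: one prefix covers everything for collection 1.
	collectionFirst := []string{
		"/1/pk/docA", "/1/v/docA/name", "/1/p/docA/name", "/2/v/docB/name",
	}
	sort.Strings(collectionFirst)
	fmt.Println(len(scanPrefix(collectionFirst, "/1/"))) // 3

	// InstanceType first: the same data needs three separate scans.
	instanceFirst := []string{
		"/pk/1/docA", "/v/1/docA/name", "/p/1/docA/name", "/v/2/docB/name",
	}
	sort.Strings(instanceFirst)
	total := 0
	for _, p := range []string{"/pk/1/", "/v/1/", "/p/1/"} {
		total += len(scanPrefix(instanceFirst, p))
	}
	fmt.Println(total) // 3
}
```

Both layouts return the same three entries for collection 1; the difference is whether one iterator or three is needed to collect them.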

Contributor Author

@AndrewSisley AndrewSisley Mar 29, 2022

Those are good reasons, although the code will be a bit messier than in your POC (to avoid the mutation in the getters lol).

This change only affects the last commit - would you mind quickly approving #315 so I can work off develop instead?

  • CollectionId first
  • Including pk

Contributor Author

Managed to sort out collection id in a fairly localized fashion - 2 new commits: "Handle range filters in iterable shim" and "FIXUP - Keep collectionId first in key".

db/collection.go Outdated
Comment on lines 795 to 809
func (c *collection) getPrimaryKeyFromDocKey(docKey client.DocKey) core.PrimaryDataStoreKey {
	return core.PrimaryDataStoreKey{
		CollectionId: fmt.Sprint(c.colID),
		DocKey:       docKey.String(),
	}
}

func (c *collection) getDataStoreKeyFrom(key core.PrimaryDataStoreKey) core.DataStoreKey {
	return core.DataStoreKey{
		CollectionId: fmt.Sprint(c.colID),
		IndexId:      fmt.Sprint(c.PrimaryIndex().ID),
		DocKey:       key.DocKey,
	}
}

Member

Should there be a direct method to go from DocKey to HeadStore key, instead of having to call
c.getDataStoreKeyFrom(c.getPrimaryKeyFromDocKey(dockey))?

Contributor Author

@AndrewSisley AndrewSisley Mar 29, 2022

EDIT: I can't actually see what you have described above anywhere - could you clarify please?

Member

Added a comment below :)

db/collection_get.go (review thread resolved)
@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I313-instance-type branch from ae374f6 to dd263e4 Compare March 29, 2022 16:04
@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I313-instance-type branch 3 times, most recently from 9d7120d to 25663f2 Compare March 29, 2022 21:42
@codecov

codecov bot commented Mar 29, 2022

Codecov Report

Merging #316 (05aabc1) into develop (42721cb) will decrease coverage by 0.06%.
The diff coverage is 87.78%.

@@             Coverage Diff             @@
##           develop     #316      +/-   ##
===========================================
- Coverage    65.07%   65.01%   -0.07%     
===========================================
  Files           80       80              
  Lines         8979     8974       -5     
===========================================
- Hits          5843     5834       -9     
- Misses        2520     2528       +8     
+ Partials       616      612       -4     
Impacted Files                                     Coverage Δ
query/graphql/planner/versionedscan.go             0.00% <0.00%> (ø)
query/graphql/planner/scan.go                      77.77% <50.00%> (-0.31%) ⬇️
db/collection_update.go                            42.89% <66.66%> (ø)
db/collection.go                                   53.28% <78.37%> (-0.15%) ⬇️
datastore/iterable/iterable_transaction_shim.go    75.00% <84.21%> (+10.41%) ⬆️
core/key.go                                        85.97% <93.10%> (+0.25%) ⬆️
db/base/collection_keys.go                         90.90% <100.00%> (-2.20%) ⬇️
db/collection_delete.go                            60.60% <100.00%> (+0.52%) ⬆️
db/collection_get.go                               50.00% <100.00%> (-1.03%) ⬇️
db/fetcher/fetcher.go                              60.26% <100.00%> (+1.15%) ⬆️
... and 11 more

@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I313-instance-type branch from 25663f2 to ce227c8 Compare March 29, 2022 21:51
db/collection.go Outdated
@@ -559,14 +548,14 @@ func (c *collection) save(ctx context.Context, txn datastore.Txn, doc *client.Do
// Loop through doc values
// => instantiate MerkleCRDT objects
// => Set/Publish new CRDT values
dockey := core.DataStoreKeyFromDocKey(doc.Key())
primaryKey := c.getPrimaryKeyFromDocKey(doc.Key()).ToDataStoreKey()
Member

== COMMENT HERE ABOUT WEIRD CALL ==
hi :) welcome traveler :)

Anyway, yea this is what I think I was referring to with the other comment you asked clarification on.

Seems like a lot of run around happening when basically we want to go from docKey to DataStoreKey.

Member

option: c.getDataStoreKeyFromDocKey(...)
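
A rough sketch of what such a helper might look like, using simplified stand-in types rather than the real `core` and `client` packages. The struct fields, the string-typed `docKey` parameter, and the body of `ToDataStoreKey` are assumptions made so the example compiles on its own; only the method names come from this thread.

```go
package main

import "fmt"

// Simplified stand-ins for core.PrimaryDataStoreKey and core.DataStoreKey.
type PrimaryDataStoreKey struct {
	CollectionId string
	DocKey       string
}

type DataStoreKey struct {
	CollectionId string
	DocKey       string
	FieldId      string
}

// ToDataStoreKey widens a primary key into a datastore key with no field set.
func (k PrimaryDataStoreKey) ToDataStoreKey() DataStoreKey {
	return DataStoreKey{CollectionId: k.CollectionId, DocKey: k.DocKey}
}

type collection struct{ colID uint32 }

func (c *collection) getPrimaryKeyFromDocKey(docKey string) PrimaryDataStoreKey {
	return PrimaryDataStoreKey{CollectionId: fmt.Sprint(c.colID), DocKey: docKey}
}

// getDataStoreKeyFromDocKey is the suggested shortcut: it wraps the chained
// call so callers go from doc key to datastore key in one step.
func (c *collection) getDataStoreKeyFromDocKey(docKey string) DataStoreKey {
	return c.getPrimaryKeyFromDocKey(docKey).ToDataStoreKey()
}

func main() {
	c := &collection{colID: 1}
	chained := c.getPrimaryKeyFromDocKey("docA").ToDataStoreKey()
	direct := c.getDataStoreKeyFromDocKey("docA")
	fmt.Println(chained == direct) // true
}
```

The helper only hides the chain; both forms produce the same key, so the choice is about readability at the call site.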

Contributor Author

@AndrewSisley AndrewSisley Mar 30, 2022

Ah okay lol - I don't think I really register call chains in this style as weird :) Super common in C#/JS/Rust(?) lol and I prefer chaining them together instead of risking overly specific helper functions that can obfuscate what is actually happening.

In this case - I do think the current setup is clearer in its intent - primaryKey is only a datastoreKey here because it needs to have a fieldId tagged on within the subsequent loop. This variable is really a PrimaryKey. Would prefer to change it to primaryKey := c.getPrimaryKeyFromDocKey(doc.Key()) and modify getFieldKey to make the cast (Go's lack of overloading might make this messy though, but will see).

EDIT: code modified as per above

Member

@jsimnz jsimnz left a comment

LGTM

@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I313-instance-type branch from ce227c8 to 05aabc1 Compare March 30, 2022 14:44
@AndrewSisley AndrewSisley merged commit 4c53026 into develop Mar 30, 2022
@AndrewSisley AndrewSisley deleted the sisley/refactor/I313-instance-type branch March 30, 2022 14:54
@AndrewSisley AndrewSisley added the refactor This issue specific to or requires *notable* refactoring of existing codebases and components label Mar 30, 2022
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Add key type checks

* Add PrimaryDataStoreKey

* Use primary key over data key in collection

* Remove unwanted c.getPrimaryIndexDocKey call

Is not doing anything, and is conceptually incorrect

* Correct update pk usage

* Correct save pk usage

* Remove index from key

* Handle range filters in iterable shim

* Move InstanceType to start of key
Labels
perf Performance issue or suggestion refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Development

Successfully merging this pull request may close these issues.

Data Key: Move InstanceType from tail to earlier element
2 participants