feat: Add collectionId field to commit field #1235

islamaliev · 2023-03-27T16:19:17Z

Relevant issue(s)

Resolves #849

Description

This PR adds a new field, "collectionId", to the commit field. It can now be queried, grouped by, and ordered by.

Tasks

I made sure the code is well commented, particularly hard-to-understand areas.
I made sure the repository-held documentation is changed accordingly.
I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Integration tests

Specify the platform(s) on which this was tested:

MacOS

source-devs · 2023-03-27T16:29:05Z

Benchmark Results

Summary

0 Benchmarks successfully compared.
0 Benchmarks were ✅ Better.
0 Benchmarks were ❌ Worse.
0 Benchmarks were ✨ Unchanged.

fredcarle

LGTM. Just a minor nitpick before merge.

planner/commit.go

AndrewSisley

LGTM :)

codecov · 2023-03-27T18:34:03Z

Codecov Report

Merging #1235 (96f6daf) into develop (9f6a2c6) will decrease coverage by 0.04%.
The diff coverage is 70.00%.

@@             Coverage Diff             @@
##           develop    #1235      +/-   ##
===========================================
- Coverage    70.71%   70.68%   -0.04%     
===========================================
  Files          182      182              
  Lines        17206    17225      +19     
===========================================
+ Hits         12167    12175       +8     
- Misses        4109     4119      +10     
- Partials       930      931       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| db/txn_db.go | `45.27% <20.00%> (-1.33%)` | ⬇️ |
| planner/commit.go | `80.42% <80.00%> (-0.69%)` | ⬇️ |
| core/key.go | `86.61% <87.50%> (ø)` | |
| db/base/collection_keys.go | `90.90% <100.00%> (ø)` | |
| db/collection.go | `68.70% <100.00%> (ø)` | |

... and 4 files with indirect coverage changes

shahzadlone

Some non-blocking comments, looks good.

shahzadlone · 2023-03-27T18:39:07Z

client/request/consts.go

@@ -47,6 +47,7 @@ const (
 	HeightFieldName          = "height"
 	CidFieldName             = "cid"
 	DockeyFieldName          = "dockey"
+	CollectionIDFieldName    = "collectionId"


question: @fredcarle are the Go gods happy with "collectionId" over "collectionID"? I see we have "schemaVersionId" from before (but this is a string, not a variable name).

I've been thinking about that one. I'm not sure what is better in this case, because that string representation is how it is displayed and used in GraphQL. It depends on whether we want to apply Go-like formatting in the GraphQL representation.

~~schemaVersionId will be my fault - I regularly forget to uppercase acronyms :P It should be schemaVersionID~~ misread nevermind :)

Here is what the GraphQL style guides seem to say:

- Field names should use camelCase. Many GraphQL clients are written in JavaScript, Java, Kotlin, or Swift, all of which recommend camelCase for variable names.
- Type names should use PascalCase. This matches how classes are defined in the languages mentioned above.
- Enum names should use PascalCase.
- Enum values should use ALL_CAPS, because they are similar to constants.

The Go representation would fit this guide with the difference of acronyms using uppercase. We could be consistent and apply that everywhere so the string representation would become "collectionID". It might be less confusing.

todo: Change to "collectionID" then

tests/integration/query/commits/utils.go

jsimnz

There's a problem with the approach here. Based on the #891 PR, which was based on previous work, we shouldn't be persisting the entire DatastoreKey into the DAG. That issue can be solved separately from this issue in a follow-up PR (more CID changes 😂 yay)

What needs to change in this PR is how we get the CollectionID. At the moment, the CollectionID is a "local" item, compared to something like the dockey or schemaVersionID which is a "global" item.

The difference is that, since this is a peer-to-peer database, anything that exists in one DB locally can potentially be replicated to any other DB globally. We need to keep that in mind when making changes: what state is local to the node and can be changed freely, and what state is global to the network.

The reason DocKeys and SchemaVersionIDs are global is that they are based on the CID system, which is a global namespace since it's effectively just a hash.

The CollectionID is just a local sequence number starting at 0 and incrementing for each collection that gets added. It isn't safe to be used in a global context.
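The local-versus-global distinction can be illustrated with a minimal Go sketch. Everything in it is hypothetical: the `node` type, `addCollection`, and `contentID` are invented for the example, and plain SHA-256 stands in for the CID encoding, which works differently in detail.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// node is a hypothetical stand-in for one database instance. It hands out
// local collection IDs from a counter, in the order collections are added.
type node struct {
	nextID uint32
	ids    map[string]uint32
}

func newNode() *node {
	return &node{ids: make(map[string]uint32)}
}

// addCollection assigns the next local sequence number to the collection.
func (n *node) addCollection(name string) uint32 {
	id := n.nextID
	n.ids[name] = id
	n.nextID++
	return id
}

// contentID derives an identifier from content alone, so any node hashing
// the same bytes gets the same value. SHA-256 is used for illustration only.
func contentID(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

func main() {
	a, b := newNode(), newNode()

	// The two nodes add the same collections in a different order.
	a.addCollection("users")
	a.addCollection("books")
	b.addCollection("books")
	b.addCollection("users")

	// Local IDs disagree across nodes.
	fmt.Println(a.ids["users"], b.ids["users"]) // prints: 0 1

	// A content-derived ID depends only on the input bytes, not on the node.
	schema := "type users { name: String }"
	fmt.Println(contentID(schema) == contentID(schema)) // prints: true
}
```

The same collection ends up with a different sequence number on each node, which is why such a value is only meaningful locally.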

But, all the work in this PR is mostly still necessary, since we do want to expose the CollectionID from the GQL perspective.

So, a nice solution is to omit adding the CollectionID to the DAG, which isn't explicitly changed in this PR, but comes from #891 incorporating the full DatastoreKey. Since that needs to change (as mentioned, in a follow-up PR), we can still implement the necessary GQL changes without waiting for that change to land.

Basically, instead of getting the CollectionID from the DatastoreKey from within the DAG, we can get the schemaVersionID from within the DAG and look up the collection based on the schemaVersionID. This would require a change to the client.DB interface to expose getCollectionByVersionID, which is currently private on the db type.

In reality, short of adding the new public func, the only line that changes from the current implementation is `collectionID, err := strconv.Atoi(dockeyObj.CollectionID)`.

cc: @AndrewSisley to make sure I've gotten everything correct, and if he has any objections to exposing GetCollectionByVersionID on client.DB.

fredcarle · 2023-03-27T19:24:59Z

> So, a nice solution is to omit adding the CollectionID to the DAG, which isn't explicitly changed in this PR, but comes from #891 incorporating the full DatastoreKey. Since that needs to change (as mentioned, in a follow-up PR), we can still implement the necessary GQL changes without waiting for that change to land.

We previously talked about having a GlobalCollectionID. I was seeing this as a stepping stone to that. So in the next iteration, the saved collectionID on the DAG would actually be the GlobalCollectionID.

However, we do need the functionality of getting the local collectionID and probably not from the schemaID since that can be the same for multiple collections (in the future).

AndrewSisley · 2023-03-27T19:25:11Z

...

> cc: @AndrewSisley to make sure I've gotten everything correct, and if he has any objections to exposing GetCollectionByVersionID on client.DB.

This is a really good catch, and it all looks good and sensible. I strongly agree that this PR should change to fetch it the 'right' way, as we are close to the end of the release cycle and it doesn't feel safe to assume that we can publicly expose this as-is and hope we'll get it working correctly in the meantime.

It might also be better to prioritise the removal of collectionID from the DAG before the release, as that is a persisted data corruption of sorts.

There is actually a ticket to expose GetCollectionByVersionID already - it is probably little more than a case of adding the func signature to the client interface: #1007.

AndrewSisley · 2023-03-27T19:30:17Z

> However, we do need the functionality of getting the local collectionID and probably not from the schemaID since that can be the same for multiple collections (in the future).

Local data can't be allowed into the (global) DAG. When we sort out multiple collections from the same schema, and if we then need to tie the commit to a local collection for this kind of query, we have to do it without storing it in the 'normal' commit block.

AndrewSisley

I need a comment to request changes - the reason is RE John's excellent spot

fredcarle · 2023-03-27T19:37:11Z

> Local data can't be allowed into the (global) DAG. When we sort out multiple collections from the same schema, and if we then need to tie the commit to a local collection for this kind of query, we have to do it without storing it in the 'normal' commit block.

I agree. My comment does not debate that :)

AndrewSisley · 2023-03-27T19:39:19Z

> I agree. My comment does not debate that :)

Ah sorry I thought you were suggesting we store the local collectionID for now and then upgrade it to a global collection id later :)

fredcarle · 2023-03-27T19:46:31Z

> Ah sorry I thought you were suggesting we store the local collectionID for now and then upgrade it to a global collection id later :)

I was saying that's what I was thinking about when I reviewed it. The part you highlighted doesn't imply that it has to be on the DAG.

In the short term, though, it probably doesn't matter if the local collectionID is on the DAG as most nodes will be created with the same collections.

It would be quite easy to have a global collection ID though. It would just be the hash of the schemaID plus the collection name. That could easily be done in this PR to replace the local collection ID in the DAG.
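A minimal Go sketch of the idea in this comment, hashing the schema ID together with the collection name. The function name `globalCollectionID`, the choice of SHA-256, the hex output, and the NUL separator are all assumptions made for the example. None of them is specified in the thread, and this is not what the PR implements.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// globalCollectionID hashes the schema ID together with the collection name.
// The NUL separator is an assumption added here so that ("ab", "c") and
// ("a", "bc") do not produce the same input bytes.
func globalCollectionID(schemaID, collectionName string) string {
	h := sha256.New()
	h.Write([]byte(schemaID))
	h.Write([]byte{0})
	h.Write([]byte(collectionName))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := globalCollectionID("example-schema-id", "users")
	b := globalCollectionID("example-schema-id", "users")
	c := globalCollectionID("example-schema-id", "books")

	fmt.Println(a == b) // prints: true
	fmt.Println(a == c) // prints: false
}
```

Any node computing this from the same schema ID and name would arrive at the same value, independent of the order in which it created its collections.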

jsimnz · 2023-03-27T20:00:26Z

> It would be quite easy to have a global collection ID though. It would just be the hash of the schemaID plus the collection name. That could easily be done in this PR to replace the local collection ID in the DAG.

I wouldn't say that is a sufficient global ID. It is in the short term, but I'm hoping to get away from that. Once digital signatures land, we can have an easier time with a proper global ID. It's possible to use your suggestion in the short term, but I think it needs a bit more discussion.

The downside to my current suggestion is that it limits the DBs to one collection per schema, but that is already a limitation, and it won't be solved until #1032 and tangential efforts have been completed.

> In the short term, though, it probably doesn't matter if the local collectionID is on the DAG as most nodes will be created with the same collections.

I also disagree here, as I'm pretty sensitive about what ends up in the DAG. Neither I nor Andy can remember why the full DatastoreKey was already being persisted before this PR, but it's an example of how things can go bad if we aren't careful about the DAG.

We would have to remove it anyway, breaking more stuff, whereas using schemaVersionID to get the collectionID is safe and doesn't introduce any local or short-term breaks into the DAG.

> There is actually a ticket to expose GetCollectionByVersionID already - it is probably little more than a case of adding the func signature to the client interface: #1007.

That ticket seems a little more involved, unless I'm reading it wrong.

AndrewSisley

LGTM, thanks Islam :)

AndrewSisley · 2023-03-28T17:55:50Z

planner/commit.go

@@ -302,7 +303,16 @@ func (n *dagScanNode) dagBlockToNodeDoc(block blocks.Block) (core.Doc, []*ipld.L
 	if err != nil {
 		return core.Doc{}, nil, err
 	}
-	n.commitSelect.DocumentMapping.SetFirstOfName(&commit, "dockey", dockeyObj.DocKey)
+	n.commitSelect.DocumentMapping.SetFirstOfName(&commit,


nitpick: I think people here prefer the form below over the current one (no need to change it now, but consider this in future PRs):

n.commitSelect.DocumentMapping.SetFirstOfName(
	&commit,
	request.DockeyFieldName,
	dockeyObj.DocKey,
)

jsimnz

LGTM! Thanks for accommodating the abrupt requirements change.

shahzadlone

LGTM

Add collectionID field to commit

Commits can be grouped and ordered by collectionID.
To retrieve a value for collectionID, the method GetCollectionByVersionID is added to the db interface.

islamaliev requested review from shahzadlone and AndrewSisley March 27, 2023 16:19

fredcarle approved these changes Mar 27, 2023

jsimnz added the feature and action/no-benchmark labels Mar 27, 2023

jsimnz added this to the DefraDB v0.5 milestone Mar 27, 2023

AndrewSisley approved these changes Mar 27, 2023

shahzadlone approved these changes Mar 27, 2023

jsimnz requested changes Mar 27, 2023

AndrewSisley requested changes Mar 27, 2023

AndrewSisley mentioned this pull request Mar 27, 2023

Try and come up with means to protect against introducing local stuff into global data areas #1238

Open

islamaliev force-pushed the islam/feat/I849-add-collectionid-to-commit branch from f8956c1 to 8c17b1b Compare March 27, 2023 21:02

islamaliev and others added 6 commits March 28, 2023 10:57

Add collectionId field to commit

1069110

Add tests for grouping and ordering collectionId

8102fea

Add tests for collectionId on latestCommit

e67d704

Adjust test

a47b489

Adjust names

5a32f77

Use collection fetched from to read it's ID

17eb39c

islamaliev force-pushed the islam/feat/I849-add-collectionid-to-commit branch from 5c813a9 to 17eb39c Compare March 28, 2023 09:01

islamaliev requested review from AndrewSisley and jsimnz March 28, 2023 09:15

Rename collectionId to collectionID

96f6daf

AndrewSisley approved these changes Mar 28, 2023

jsimnz approved these changes Mar 28, 2023

shahzadlone approved these changes Mar 28, 2023

islamaliev merged commit c9ef176 into develop Mar 28, 2023

islamaliev deleted the islam/feat/I849-add-collectionid-to-commit branch March 28, 2023 19:30

islamaliev self-assigned this Apr 3, 2023
