Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add collectionId field to commit field #1235

Merged
merged 7 commits into from
Mar 28, 2023

Conversation

islamaliev
Copy link
Contributor

@islamaliev islamaliev commented Mar 27, 2023

Relevant issue(s)

Resolves #849

Description

This PR adds a new field "collectionId" to commit field that can be queried now, grouped by and ordered by.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Integration tests

Specify the platform(s) on which this was tested:

  • MacOS

@source-devs
Copy link

Benchmark Results

Summary

  • 0 Benchmarks successfully compared.
  • 0 Benchmarks were ✅ Better.
  • 0 Benchmarks were ❌ Worse .
  • 0 Benchmarks were ✨ Unchanged.
✅ See Better Results...
time/opdelta
 
❌ See Worse Results...
time/opdelta
 
✨ See Unchanged Results...
time/opdelta
 
🐋 See Full Results...

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a minor nitpick before merge.

planner/commit.go Outdated Show resolved Hide resolved
@jsimnz jsimnz added feature New feature or request action/no-benchmark Skips the action that runs the benchmark. labels Mar 27, 2023
@jsimnz jsimnz added this to the DefraDB v0.5 milestone Mar 27, 2023
Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@codecov
Copy link

codecov bot commented Mar 27, 2023

Codecov Report

Merging #1235 (96f6daf) into develop (9f6a2c6) will decrease coverage by 0.04%.
The diff coverage is 70.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1235      +/-   ##
===========================================
- Coverage    70.71%   70.68%   -0.04%     
===========================================
  Files          182      182              
  Lines        17206    17225      +19     
===========================================
+ Hits         12167    12175       +8     
- Misses        4109     4119      +10     
- Partials       930      931       +1     
Impacted Files Coverage Δ
db/txn_db.go 45.27% <20.00%> (-1.33%) ⬇️
planner/commit.go 80.42% <80.00%> (-0.69%) ⬇️
core/key.go 86.61% <87.50%> (ø)
db/base/collection_keys.go 90.90% <100.00%> (ø)
db/collection.go 68.70% <100.00%> (ø)

... and 4 files with indirect coverage changes

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some non-blocking comments, looks good.

@@ -47,6 +47,7 @@ const (
HeightFieldName = "height"
CidFieldName = "cid"
DockeyFieldName = "dockey"
CollectionIDFieldName = "collectionId"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: @fredcarle are go gods happy with "collectionId" over "collectionID"? I see we have "schemaVersionId" before (but this is a string not a variable name).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about that one. I'm not sure what is better in this case because that string representation is how it is displayed and used in GraphQL. It depends if we want to apply Go like formatting in the GraphQL representation.

Copy link
Contributor

@AndrewSisley AndrewSisley Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemaVersionId will be my fault - I regularly forget to uppercase acronyms :P It should be schemaVersionID misread nevermind :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what GraphQL style guides seems to be:

- Field names should use camelCase. Many GraphQL clients are written in JavaScript, Java, Kotlin, or Swift, all of which recommend camelCase for variable names.
- Type names should use PascalCase. This matches how classes are defined in the languages mentioned above.
- Enum names should use PascalCase.
- Enum values should use ALL_CAPS, because they are similar to constants.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go representation would fit this guide with the difference of acronyms using uppercase. We could be consistent and apply that everywhere so the string representation would become "collectionID". It might be less confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Change to "collectionID" then

tests/integration/query/commits/utils.go Show resolved Hide resolved
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres a problem with the approach here. Based on the #891 PR, which was based on previous work, we shouldn't be persisting the entire DatastoreKey into the DAG. That issue can be solved separately from this issue in a follow-up PR (more CID changes 😂 yay)

What needs to change in this PR is how we get the CollectionID. At the moment, the CollectionID is a "local" item, compared to something like the dockey or schemaVersionID which is a "global" item.

The difference is that since this is a Peer-to-Peer database, anything that exists in one DB locally, can potentially be replicated to any other DB globally, so we need to keep that in mind when making changes, what state is local to the node and can be changed freely, and what state is global to the network.

The reason DocKeys and SchemaVersionIDs are global is that they are based on the CID system, which is a global namespace since its effectively just a hash.

The CollectionID is just a local sequence number starting at 0 and incrementing for each collection that gets added. It isn't safe to be used in a global context.

But, all the work in this PR is mostly still necessary, since we do want to expose the CollectionID from the GQL perspective.

So, a nice solution is to omit adding the CollectionID from the DAG, which isn't explicity changed in this PR, but from #891 incorporating the full DatastoreKey. Since that needs to change (as mentioned in a followup PR), we can still implement the GQL necessary changes without waiting for that change to land.

Basically, instead of getting the CollectionID from the DatastoreKey from within the DAG, we cna get the schemaVersionID from within the DAG, and do a lookup for the collection based on the schemaVersionID. Would require making a change to the client.DB interface to expose the getCollectionByVersionID which is currently private on the db type.

In reality, the short of adding the new public func, the only lines that change from the current implementation is collectionID, err := strconv.Atoi(dockeyObj.CollectionID).

cc: @AndrewSisley to make sure I've gotten everything correct, and if he has any objections to exposing GetCollectionByVersionID on client.DB.

@fredcarle
Copy link
Collaborator

So, a nice solution is to omit adding the CollectionID from the DAG, which isn't explicity changed in this PR, but from #891 incorporating the full DatastoreKey. Since that needs to change (as mentioned in a followup PR), we can still implement the GQL necessary changes without waiting for that change to land.

We previously talked about having a GlobalCollectionID. I was seeing this as a stepping stone to that. So in the next iteration, the saved collectionID on the DAG would actually be the GlobalCollectionID.

However, we do need the functionality of getting the local collectionID and probably not from the schemaID since that can be the same for multiple collections (in the future).

@AndrewSisley
Copy link
Contributor

...

cc: @AndrewSisley to make sure I've gotten everything correct, and if he has any objections to exposing GetCollectionByVersionID on client.DB.

Is a really good catch, and all looks good and sensible - I strongly agree that this PR should change to fetch it the 'right' way, as we are close to the end of the release cycle and it doesnt feel safe to assume that we can publicly expose this as-is and hope we'll get it working correctly in the meantime.

It might also be better to prioritise the removal of collectionID from the DAG before the release, as that is a persisted data corruption of sorts.

There is actually a ticket to expose GetCollectionByVersionID already - it is probably little more that a case of adding the func signature to the client interface: #1007.

@AndrewSisley
Copy link
Contributor

AndrewSisley commented Mar 27, 2023

However, we do need the functionality of getting the local collectionID and probably not from the schemaID since that can be the same for multiple collections (in the future).

Local data cant be allowed into the (global) DAG. When we sort out multiple collections from the same schema, and if then we need to tie the commit to a local collection for this kind of query, we have to do it without storing it in the 'normal' commit block.

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a comment to request changes - the reason is RE John's excellent spot

@fredcarle
Copy link
Collaborator

Local data cant be allowed into the (global) DAG. When we sort out multiple collections from the same schema, and if then we need to tie the commit to a local collection for this kind of query, we have to do it without storing it in the 'normal' commit block.

I agree. My comment does not debate that :)

@AndrewSisley
Copy link
Contributor

AndrewSisley commented Mar 27, 2023

I agree. My comment does not debate that :)

Ah sorry I thought you were suggesting we store the local collectionID for now and then upgrade it to a global collection id later :)

@fredcarle
Copy link
Collaborator

Ah sorry I thought you were suggesting we store the local collectionID for now and then upgrade it to a global collection id later :)

I was saying that's what I was thinking about when I reviewed it. The part you highlighted doesn't imply that it has to be on the DAG.

In the short term, though, it probably doesn't matter if the local collectionID is on the DAG as most nodes will be created with the same collections.

It would be quite easy to have a global collection ID though. It would just the hash of the schemaID plus the collection name. That could easily be done in this PR to replace the local collection ID in the DAG.

@jsimnz
Copy link
Member

jsimnz commented Mar 27, 2023

It would be quite easy to have a global collection ID though. It would just the hash of the schemaID plus the collection name. That could easily be done in this PR to replace the local collection ID in the DAG.

I wouldn't say that is a sufficient global ID. It is in the short term, but im hoping to get away from that. Once digital signatures land, we can have an easier time with a proper global ID. Its possible to use your suggestion in the short term, but I think it needs a bit more discussion.

The downside to my current suggestion is that it limits the DBs to one collection per schema, but that is already a limitation, and wont be solved until #1032 and tangential efforts have been solved.

In the short term, though, it probably doesn't matter if the local collectionID is on the DAG as most nodes will be created with the same collections.

I also disgree here, as im pretty sensitive of what ends up in the DAG. Neither I nor Andy can remember why the full DatastoreKey is being persisted from before this PR, but its an example of how things can go bad if we arent careful about the DAG.

Since we would have to remove it, and break more stuff. The use of schemaVersionID to get the collectionID is safe, and doesnt introduce any local or short term breaks into the DAG.

There is actually a ticket to expose GetCollectionByVersionID already - it is probably little more that a case of adding the func signature to the client interface: #1007.

That ticket seems a little more involved, unless im reading it wrong.

@islamaliev islamaliev force-pushed the islam/feat/I849-add-collectionid-to-commit branch from f8956c1 to 8c17b1b Compare March 27, 2023 21:02
@islamaliev islamaliev force-pushed the islam/feat/I849-add-collectionid-to-commit branch from 5c813a9 to 17eb39c Compare March 28, 2023 09:01
Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Islam :)

@@ -302,7 +303,16 @@ func (n *dagScanNode) dagBlockToNodeDoc(block blocks.Block) (core.Doc, []*ipld.L
if err != nil {
return core.Doc{}, nil, err
}
n.commitSelect.DocumentMapping.SetFirstOfName(&commit, "dockey", dockeyObj.DocKey)
n.commitSelect.DocumentMapping.SetFirstOfName(&commit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I think people here do prefer the below, instead of the current (no need to change now, but in future PRs consider this):

n.commitSelect.DocumentMapping.SetFirstOfName(
    &commit,
    request.DockeyFieldName, 
    dockeyObj.DocKey,
)

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for accommodating the abrupt requirements change.

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@islamaliev islamaliev merged commit c9ef176 into develop Mar 28, 2023
@islamaliev islamaliev deleted the islam/feat/I849-add-collectionid-to-commit branch March 28, 2023 19:30
@islamaliev islamaliev self-assigned this Apr 3, 2023
shahzadlone pushed a commit that referenced this pull request Apr 13, 2023
Add collectionID field to commit 
Commits can be grouped and ordered by collectionID

To retrieve a value for collectionID the method 
GetCollectionByVersionID is added to db interface.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Add collectionID field to commit 
Commits can be grouped and ordered by collectionID

To retrieve a value for collectionID the method 
GetCollectionByVersionID is added to db interface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Commits: Add collection id field to commits fields
6 participants