
feat: Allow new fields to be added locally to schema #1139

Merged: 12 commits, Mar 6, 2023

Conversation

AndrewSisley
Contributor

@AndrewSisley AndrewSisley commented Feb 20, 2023

Relevant issue(s)

Resolves #1004

Description

Allows new simple (non-relational) fields to be added locally to schema.

The following items are out of scope of this PR and will be done later:

  • GQL Introspection tests
  • P2P tests on updated schema
  • Hiding the Field indexes
  • Reworking the client interfaces to provide a much cleaner experience regarding functions exposing transactions
  • Hiding the FieldDescription.Kind number value
  • Hiding the FieldDescription.Typ (CRDT) number value and renaming that field
  • Concurrency tests
  • Wrapping/hiding json patch lib errors (e.g. add operation does not apply: doc is missing path)

I very strongly recommend reviewing commit by commit; there are (hopefully) useful explanations of what has been done in each commit body. The feature itself is introduced in the final commit; everything else is preparation for it.

@AndrewSisley AndrewSisley added feature New feature or request area/schema Related to the schema system area/collections Related to the collections system action/no-benchmark Skips the action that runs the benchmark. labels Feb 20, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.5 milestone Feb 20, 2023
@AndrewSisley AndrewSisley requested a review from a team February 20, 2023 18:56
@AndrewSisley AndrewSisley self-assigned this Feb 20, 2023
@AndrewSisley AndrewSisley force-pushed the sisley/feat/I1004-add-fields-to-schema branch 4 times, most recently from f0f552c to 05a2f0e Compare February 20, 2023 20:25
@codecov

codecov bot commented Feb 20, 2023

Codecov Report

Merging #1139 (27093dd) into develop (92a7f89) will increase coverage by 0.05%.
The diff coverage is 69.76%.


@@             Coverage Diff             @@
##           develop    #1139      +/-   ##
===========================================
+ Coverage    68.23%   68.28%   +0.05%     
===========================================
  Files          181      181              
  Lines        16617    17007     +390     
===========================================
+ Hits         11338    11613     +275     
- Misses        4337     4426      +89     
- Partials       942      968      +26     
Impacted Files Coverage Δ
request/graphql/schema/generate.go 84.26% <ø> (+0.33%) ⬆️
net/peer.go 43.90% <21.56%> (-2.15%) ⬇️
net/server.go 59.87% <25.00%> (ø)
db/collection_update.go 72.17% <50.00%> (ø)
db/db.go 70.24% <50.00%> (-2.73%) ⬇️
db/p2p_collection.go 55.55% <57.14%> (-4.45%) ⬇️
db/schema.go 64.51% <67.53%> (+18.68%) ⬆️
db/collection.go 67.65% <75.64%> (+1.45%) ⬆️
request/graphql/parser.go 87.14% <78.57%> (-2.69%) ⬇️
db/sequence.go 64.28% <90.00%> (-0.84%) ⬇️
... and 10 more

@AndrewSisley AndrewSisley force-pushed the sisley/feat/I1004-add-fields-to-schema branch 18 times, most recently from 075b49f to dc79dc9 Compare February 20, 2023 21:47
@AndrewSisley AndrewSisley force-pushed the sisley/feat/I1004-add-fields-to-schema branch 3 times, most recently from 5c7ddef to 319aa94 Compare February 24, 2023 21:45
@AndrewSisley AndrewSisley marked this pull request as ready for review February 24, 2023 22:12
Collaborator

@fredcarle fredcarle left a comment

I did a quick overview but didn't spend enough time on the details to approve it and I'm not sure how much time I'll have to spend on this in the coming days. As far as I can tell though, it looks good.

client/db.go Outdated
Comment on lines 54 to 55
// collection. Will return false and an error if it fails validation.
ValidateUpdateCollectionTxn(context.Context, datastore.Txn, CollectionDescription) (bool, error)
Collaborator

suggestion: "Will return false and an error" is a bit misleading here, as the doc string makes it sound as if we either get true and no error, or false and an error. But looking at the function I see that it can also return false and no error. To avoid that confusion I would simply say "Will return an error if it fails validation".

Contributor Author

@AndrewSisley AndrewSisley Mar 1, 2023

agreed, cheers Fred - I'll change this.

  • reword bool return documentation

Member

Does this need to be publicly exposed on the client interface? Can't it just be private within UpdateCollection? What's the benefit of publicly calling Validate then Update?

Member

Seems like it's only used internally anyway?

Contributor Author

@AndrewSisley AndrewSisley Mar 6, 2023

What's the benefit of publicly calling Validate then Update?

It is quite common to want to validate something without immediate execution, and many places consider it best practice to expose validation functions by default. I'd probably want to use this as an app dev, and it seems reasonably likely that there'd be others too.

EDIT: Just noting that Orpheus also wanted this via the CLI and there is an open ticket for it.

Member

In that case, should this not be done from the perspective of the Patch DDL for external use? At the moment they would need to somehow (not that hard since it's a patch, but we are delegating that responsibility to the dev in this API) produce the new description objects, and then call this API.

Compared to calling something like db.ValidateUpdatePatch(ctx, patch), which would essentially apply the patch in a temp object and call validateDescription anyway.

This would also put this API alongside the schema-related APIs instead of the collection APIs.

Contributor Author

We'd want both, as updates can be either a JSON patch or a CollectionDescription update. This (the CollectionDescription variant) already existed as a function, so adding it to the interface was all the effort required.

Adding a ValidatePatch function would be slightly more effort and a scope expansion (ValidateUpdateCollectionTxn is already tested via UpdateCollectionTxn, whereas ValidatePatch would be untested).

Do you really dislike the adding of just this one now?

db/collection.go Outdated
// ValidateUpdateCollectionTxn validates that the given collection description is a valid update.
//
// Will return true if the given description differs from the current persisted state of the
// collection. Will return false and an error if it fails validation.
Collaborator

suggestion: Same as above :)

{ "op": "add", "path": "/Users/Schema/Fields/-", "value": {"Name": "Foo", "Kind": 2, "Typ":3} }
]
`,
ExpectedError: "only default or LWW (last writer wins) CRDT types are supported. Name: Foo, CRDTType: 3",
Contributor

Using an exact string to assert certain behaviour is in general a bad practice.
Is it our code convention, or more of a let's-stick-to-it-for-now approach?

Contributor Author

Here I wish to match the whole string, as it is all relevant to whether the error is the expected one.

ExpectedError does work on partials, and we use it that way, but here I do want the whole string.

Contributor

In this case the problem is not entire vs partial string matching, but string matching in general.
The test should assert that a certain error occurred, and should fail only if no error occurred or if some other error occurred instead of the expected one.
With the current change the test might fail for absolutely unrelated changes, like text message formatting.
For this we would have to change how we go about error handling.
But I guess this is a discussion for a bigger round.

Contributor Author

Ah, you mean protection against line breaks etc?

I think that would be a really easy change - should just be a couple of lines in tests/integration/utils2.go (~ln 848).

Shouldn't matter in the short term, as the error text/formatting is managed by devs atm, and if they are deliberately changing the formatting they can also deliberately change the expected error (or just make that tweak to errors.Is).

{
"_key": "bae-43deba43-f2bc-59f4-9056-fef661b22832",
"Name": "John",
"Email": nil,
Contributor

looks like this test consumed the previous one.


func TestSchemaUpdatesAddFieldWithCreateWithUpdateAfterSchemaUpdateAndVersionJoin(t *testing.T) {
initialSchemaVersionId := "bafkreicg3xcpjlt3ecguykpcjrdx5ogi4n7cq2fultyr6vippqdxnrny3u"
updatedSchemaVersionId := "bafkreicnj2kiq6vqxozxnhrc4mlbkdp5rr44awaetn5x5hcdymk6lxrxdy"
Contributor

why not const?

Contributor Author

why bother with the extra word?

Contributor

It's semantics. If it is const, I, the reader of the code, will note straight away that the values aren't supposed to change. Otherwise I have to go through the code to see why it is a variable.

Contributor Author

:) I don't think that is something to worry about in our tests; the possibilities for mutation are quite small here, and if anyone found a reason to mutate anything in a test declaration then that would probs/hopefully be flagged and removed in review anyway.

If you feel strongly about this, I'd suggest we add a linter (outside this PR). In most other langs I'd consider it a no-brainer, as it is a conscious choice between var and const; however, Go's := makes it a bit awkward and a poor use of time and mental energy.

Request: `query {
Users {
Name
Foo
Contributor

I don't see anything bool-array-related here

Contributor Author

Kind: 3 represents a field of type boolean array.

We plan on allowing string-based definition of these values in the near future, and when that happens we'll get tests that use the more descriptive strings, but for the short term we (and any users) will need to deal with the uint8 representations.

type Users {
Name: String
}
`,
Contributor

This schema is repeated several times, as are some other things. Why not extract them to helper functions?
The test could look like this:

	test := testUtils.TestCase{
		Description: "Test schema update, add field with kind datetime (10)",
		Actions: []any{
			testUtils.SchemaUpdate{
				Schema: getTestSchema(),
			},
			testUtils.SchemaPatch{
				Patch: getTestPatchWithKind(10),
			},
			testUtils.Request{
				Request: getTestQueryWithFoo(),
				Results: []map[string]any{},
			},
		},
	}
	testUtils.ExecuteTestCase(t, []string{"Users"}, test)

or even like this:

	test := testUtils.TestCase{
		Description: "Test schema update, add field with kind datetime (10)",
		Actions: []any{
			getTestSchema(),
			getTestPatchWithKind(10),
			testUtils.Request{
				Request: getTestQueryWithFoo(),
				Results: []map[string]any{},
			},
		},
	}
	testUtils.ExecuteTestCase(t, []string{"Users"}, test)

Contributor Author

@AndrewSisley AndrewSisley Mar 2, 2023

I have quite a strong preference for not using helpers like that in tests; whilst they reduce the number of lines, they often obscure what is under test.

EDIT: To Expand: Here I want to see the actions that will be used by the users. Not only do I think it makes it much easier to see what is actually being done in the test, I have a much better idea as to how it might look and feel to the users. It also reduces the complexity of the test, something I consider particularly important for part-timers/newcomers to the test - including Defra users - the documentation aspect of these tests is very important to me, not just what the host machine executes.

EDIT2: Helper functions and over-sharing of code often results in tests that do far more than they need to, further obscuring what is actually under test. For example we currently do this a lot with test schema - most of the fields are not relevant to most of the tests - it is a tolerable trade-off for just schema perhaps, but I really don't want it spreading (and e.g. standardising test queries so they return 10 fields when each test only cares about 1).

Contributor

@islamaliev islamaliev Mar 3, 2023

I see your points. They are absolutely valid concerns. And, as you mentioned earlier in one of the comments, there are always trade-offs.

I just would like to bring up some other points:

One of the (often underrated) properties of tests is low-level documentation, which means that in order to understand how a system under test works one should just read the tests. To do this effectively the documentation should be as concise and informative as possible. For tests to be concise they should contain only information relevant to the test itself; all the boilerplate should be hidden away in some test set-up stage.
In our specific case there are many repetitive parts that are not necessarily related to the test, hence the tests look very similar. As a result, it takes some effort and mental text-diff skills to spot what exactly makes one test special in comparison to the others.

I think your approach would fit better in systems with less complexity and therefore fewer test cases.

But maybe it's more general discussion.

},
},
}
testUtils.ExecuteTestCase(t, []string{"Users"}, test)
Contributor

Again the same test. Looks like there are many of them here.

testUtils.SchemaPatch{
Patch: `
[
{ "op": "add", "path": "/Users/Schema/Fields/-", "value": {"Name": "Foo", "Kind": 19} }
Contributor

Why not use a const or enum for the Kind values?

Contributor Author

As in fmt.Sprintf them in (the enum does already exist in the prod code)?

The users can supply raw 19s, so that is what we should test. Hiding it behind a test const hides the ugliness of the production interface, and using the production const changes the concept under test (as well as hiding the ugliness).

@@ -49,5 +50,5 @@ type Parser interface {
ParseSDL(ctx context.Context, schemaString string) ([]client.CollectionDescription, error)

// Adds the given schema to this parser's model.
Member

question: Is the documentation still valid for SetSchema?

Contributor Author

@AndrewSisley AndrewSisley Mar 3, 2023

yes, what it does conceptually has not changed. The only difference is that changes will only be applied if/until the txn is successfully committed.

The doc could probably be expanded to mention this (although it is somewhat a universal implicit-given for anything transaction-based), but it is internal, not public, and if left up to me laziness would probably win out. Do you want it expanded?

Comment on lines +26 to +27
errCollectionIDDoesntMatch string = "CollectionID does not match existing"
errSchemaIDDoesntMatch string = "SchemaID does not match existing"
Member

question: Why do some error messages start with a capitalized char?

Contributor Author

These are property names; the name of the property is CollectionID, and collectionID does not exist.

Contributor Author

The alternative would be to describe the property instead of naming it: collection ID does not match existing.

Member

@jsimnz jsimnz left a comment

Couple of little things. Overall I like the direction and functionality.


db/collection.go Outdated
var hasChanged bool
existingCollection, err := db.GetCollectionByNameTxn(ctx, txn, proposedDesc.Name)
if err != nil {
if err.Error() == "datastore: key not found" {
Member

todo: Similar to the standup convo on tests, but let's avoid doing direct string comparison. This particular error comes from ds.ErrNotFound.

Contributor Author

@AndrewSisley AndrewSisley Mar 6, 2023

😆 Cheers lol - I spent a while looking for that err const as I thought it existed but never found it.

  • replace with ds.ErrNotFound

Member

👍

Comment on lines +458 to +459
err = txn.Commit(ctx)
return col, err
Member

nitpick: this can be merged into a single line

Contributor Author

Not sure how I feel about that; it reduces the importance of the Commit call, making it appear as something of an afterthought. May or may not change.

Member

fair :)

Comment on lines +422 to +423
err = txn.Commit(ctx)
return col, err
Member

nitpick: merge lines

@@ -161,32 +161,46 @@ func (db *db) initialize(ctx context.Context) error {
db.glock.Lock()
defer db.glock.Unlock()

txn, err := db.NewTxn(ctx, false)
Member

question: Does initialize really need a txn? Not against it, but it's run once on startup before anything else. Just a thought.

Contributor Author

This was documented in the commit message:

DB init was also covered almost as a side-effect, as the sequence needed protecting, and TBH it is probably a very good thing to protect against mutating the database state in the case of a failed init.

Member

Thanks for pointing that out. Realized it afterwards as well 👍

// The collections (including the schema version ID) will only be updated if any changes have actually
// been made, if the net result of the patch matches the current persisted description then no changes
// will be applied.
func (db *db) PatchSchema(ctx context.Context, patchString string) error {
Member

question: Obviously we have a transaction in place to handle the PatchSchema call. But should there be any additional locks for safety? E.g. an RWLock that is read-locked on basically all ops but write-locked on PatchSchema, acting as a schema lock.

Contributor Author

What safety do you think you would gain?

Member

A more explicit semantic that is enforced beyond the transactions. It makes it clear that schema updates are intended to be a protected action that locks the DB until completed.

Since at the moment, other transactions can run alongside the schema update. The only thing the transaction protects us from is shared keysets, which is unlikely.

Most databases don't support "online" or "lock-free" schema updates as they're rather complex to get right. We're not going to lose anything by adding a lock as it's not intended to be a highly concurrent/perf-sensitive action. And it gives us a lot of safety regarding the semantics of other ongoing transactions.

I know we descoped the concurrency tests, but this feels independent of those tests, since it's a global lock. Overall it provides much clearer and stronger semantics and guarantees about the nature of ongoing events.

Member

Especially in lieu of concurrency tests, tbh. This would ensure that there are basically no concurrency tests to worry about.

We can look at online/lock-free schema updates in the future without breaking anything.

Contributor Author

This would ensure that there are basically no concurrency tests to worry about.

Adding a global lock won't change the testing required. The tests will not be designed around the current implementation, only the public behaviour/surface area.

Most databases don't support "online" or "lock-free" schema updates as they're rather complex to get right.

I am very confident that the SQLs that I have worked with do not lock the entire database on the addition of a single field. At least some of them lock the full table on creation of an index, but I think there are very, very few operations that lock the entire database.

I can add a lock here if you really want, but I do see it as an ugly, untested (i.e. dead-code) bottleneck that should be removed as soon as we have the bandwidth and tests.

Member

@jsimnz jsimnz Mar 6, 2023

Actually, a lock within this function would offer very little/no protection at all. If stale data is able to make it through past the txn, then a lock here will do nothing to help.

I can see how our setup could keep things safe, as we load the collection description from the system store on queries, which would be tracked for conflicts that a schema update would trigger.

I believe we talked about it else where, but we wanted to verify schemaVersionID on transactions (data reads/writes) correct?

Contributor Author

but we wanted to verify schemaVersionID on transactions (data reads/writes) correct?

That is a different issue, related to the use of stale client.Collections for doc writes. Not relevant here.

I can't tell what you are asking for in your other comment starting with Yes, my reference to.... A mutex in this location will add no/almost-no protection against concurrent schema update clashes; any mutex-based protection needs to align with the transaction scope. Wrapping this function behind a mutex will do pretty much nothing, as it still leaves massive room for stale data to accumulate if the transaction itself does not catch everything.

Member

The potential flow would be

// col will reference CollectionDescription with schema v1
col := db.GetCollectionByName("users")

// then another request (another goroutine for example)
// now users description has schema v2
db.UpdateSchema(desc) // users desc

// col variable still holding old description
col.UpdateWithkey(key, update)

So here the update is going to apply things relative to schema v1, even though there is now schema v2.

My suggested lock won't help this situation regardless, which is why I asked about comparing/verifying schemaVersionIDs as previously discussed (can't remember when).

Member

tl;dr: I'm OK dropping the suggested lock as described in this thread if we are comparing/validating schemaVersionIDs on the transaction. This would A) manually enforce a check, and B) cause the schemaVersionID key to be read from the transaction, ensuring we protect against stale (inconsistent) data.

Member

That is a different issue, related to the use of stale client.Collections for doc writes. Not relevant here.

Fairly relevant imo. My initial hope for the lock was to prevent the stale accumulation of CollectionDescriptions, which has the potential to get past the transaction semantics.

I realize now, as noted, that the lock won't actually give us my intended desire, but verifying the schemaVersionID would, and I don't believe it's implemented in this PR, correct?

// The collection (including the schema version ID) will only be updated if any changes have actually
// been made, if the given description matches the current persisted description then no changes will be
// applied.
UpdateCollectionTxn(context.Context, datastore.Txn, CollectionDescription) (Collection, error)
Member

suggestion: UpdateCollection should take a name (string) as a parameter to make it clear which collection you're trying to update.

Obviously the description itself has the name too, but it should be more explicit from an API perspective.

Contributor Author

I disagree with this quite strongly; it creates additional points of failure and confusion when the name parameter does not pair up with the description. Such a param can also be added later if we feel the need.

Member

Not really convinced by that argument, but looking at the current system, CreateCollection also doesn't take a named value, so it's good for now 👍.


txn.OnSuccess(
func() {
p.schemaManager = schemaManager
Member

question: I'm wondering if there's any race condition potential here?

Contributor Author

The transaction should protect against any race, I think, and I do not wish to go down some kind of static-analysis hellhole when the concurrency tests that would cover stuff like this were pushed out of scope. If those tests are not worth doing now, then spending significant amounts of time staring at this line for the sake of potential concurrency bugs is also out of scope.

Member

That's fair, we did descope the concurrency tests. It just jumped out at me in case there was an assumption that the txn protects things here, but the hook functions don't necessarily inherit that semantic. Good for now.

Contributor Author

cheers :)

@AndrewSisley AndrewSisley requested a review from jsimnz March 6, 2023 16:09
Member

@jsimnz jsimnz left a comment

LGTM - excited for this to land

@AndrewSisley AndrewSisley force-pushed the sisley/feat/I1004-add-fields-to-schema branch from 319aa94 to 12fb03e Compare March 6, 2023 22:49
Member

@shahzadlone shahzadlone left a comment

Thanks for answering the questions, sorry for not being as involved in this review (did follow the other conversations). LGTM overall

@AndrewSisley AndrewSisley merged commit 799a242 into develop Mar 6, 2023
@AndrewSisley AndrewSisley deleted the sisley/feat/I1004-add-fields-to-schema branch March 6, 2023 23:26
shahzadlone pushed a commit that referenced this pull request Apr 13, 2023
* Correct P2P error message

Spotted and quickly corrected.  Has nothing to do with this PR though, but is small enough to include.

* Remove field kind decimal

It is not supported and does not work.  Leaving it here will just confuse users, especially as they start to use the schema update system.

* Remove field kind bytes

It is not supported and does not work.  Leaving it here will just confuse users, especially as they start to use the schema update system.

* Remove field embedded object kinds

They are not supported and do not work.  Leaving them here will just confuse users, especially as they start to use the schema update system.

* Add documentation to FieldDesc.Kind

I'm not sure any more than this is really needed, especially given that we will largely be hiding this from users shortly after merge.

* Assert for collection name uniqueness within SDL

Previously this would partially succeed - the GQL types would be updated, but the saving of the collection descriptions would fail (as there is another uniqueness check there), leaving the database in an invalid state until restart.
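
The up-front check this commit describes amounts to detecting duplicate collection names in the parsed SDL before any state is touched. A minimal sketch (the function name and plain-string input are illustrative, not DefraDB's actual code):

```go
package main

import "fmt"

// duplicateNames reports which collection names appear more than once.
// Validating this before applying GQL type changes avoids the
// partial-success failure mode: types updated, description save failing.
func duplicateNames(names []string) []string {
	seen := map[string]bool{}
	var dups []string
	for _, n := range names {
		if seen[n] {
			dups = append(dups, n)
		}
		seen[n] = true
	}
	return dups
}

func main() {
	fmt.Println(duplicateNames([]string{"User", "Book", "User"})) // [User]
}
```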

* Correctly query all existing collections

Was incorrectly querying the version history, returning a collection per version instead of just the current version.  This went unnoticed as previously each collection could only have a single version.

* Make collection persistence transactional

Collection stuff needs to be protected by transactions.

Partial success of either a create or a mutate cannot be permitted and the use of transactions protects against this.

The transactions are also needed to protect against the use of stale data; by including the collection in the transaction used for P2P and query/planner work we ensure that work is done against a single, complete version of the collection, and that it is not possible for the collection to mutate whilst something is using it. At the moment such concurrent use should result in a transaction conflict error - this is not ideal, and the logic here should probably grow to permit the queuing of such clashes (e.g. through use of a mutex) instead of making users retry until the operation succeeds - such a change will very likely need to be done within the scope declared by the transaction anyway, so I see no wasted code/time in the changes in this commit - it is a start, and prevents odd/damaging behaviour.

DB init was also covered, almost as a side-effect, as the sequence needed protecting, and TBH it is probably a very good thing to protect against mutating the database state in the case of a failed init.

* Make SetSchema (GQL) transactional

Renames and changes AddSchema to SetSchema.  SetSchema is now transactional: GQL type changes will now only be 'committed' on transaction commit, whilst allowing SetSchema to be called safely at any point during the lifetime of the transaction - allowing the schema to be validated against GQL constraints before any changes have been persisted elsewhere, and allowing those other changes to be executed/validated before any changes have been made to the GQL types.

* Add support for adding fields to schema

Please note that the client interfaces will be reworked in the near future so that the transaction related items clutter the primary interface less, and are more consistent. For now I have just followed the existing `Txn` suffix naming.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Labels
action/no-benchmark Skips the action that runs the benchmark.
area/collections Related to the collections system
area/schema Related to the schema system
feature New feature or request
Development

Successfully merging this pull request may close these issues.

Allow new non-relational fields to be added to existing schema locally
5 participants