refactor: Use fastjson to parse mutation data string #772

fredcarle · 2022-09-02T06:11:31Z

Relevant issue(s)

Resolves #124

Description

This PR makes use of the fastjson package to improve mutation data string parsing. Doing so also simplifies the validateFieldSchema function.

Tasks

I made sure the code is well commented, particularly hard-to-understand areas.
I made sure the repository-held documentation is changed accordingly.
I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Using integration tests

Specify the platform(s) on which this was tested:

MacOS

codecov · 2022-09-02T06:19:34Z

Codecov Report

Merging #772 (f0275c5) into develop (e96c1c9) will increase coverage by 0.41%.
The diff coverage is 62.37%.

@@             Coverage Diff             @@
##           develop     #772      +/-   ##
===========================================
+ Coverage    58.70%   59.11%   +0.41%     
===========================================
  Files          153      153              
  Lines        17083    16977     -106     
===========================================
+ Hits         10028    10036       +8     
+ Misses        6121     6023      -98     
+ Partials       934      918      -16

Impacted Files	Coverage Δ
db/collection_update.go	`55.47% <62.37%> (+12.48%)`	⬆️

AndrewSisley

Looks good, code is quite a bit nicer than it was before :) Did you benchmark it at all? Got a few minor comments for you.

AndrewSisley · 2022-09-02T13:33:45Z

client/collection.go

 	// UpdateWithKeys updates documents matching the given DocKeys.
 	//
 	// The provided updater must be a string Patch, string Merge Patch, a parsed Patch, or parsed Merge Patch
 	// else an ErrInvalidUpdater will be returned.
 	//
 	// Returns an ErrDocumentNotFound if a document is not found for any given DocKey.
-	UpdateWithKeys(context.Context, []DocKey, interface{}) (*UpdateResult, error)
+	UpdateWithKeys(context.Context, []DocKey, string) (*UpdateResult, error)


praise: Thanks for this type change - this always confused me and I thought there were more types that this could handle, but the diff suggests it was just the two. Do check in with John though, if you haven't already, to make sure we are not losing anything.

Thanks :)

As far as I could tell the value passed was always a string but yes I will be checking with John.

The idea was to be able to support native Go types (map[string]interface{}) if using the programmatic approach, but at the moment it's only used/tested from the POV of the query system, which only handles strings.

It would be nice to still support this, but it's probably lower priority, and supporting native types gets in the way of using the fastjson approach outlined in this PR anyway. So it's fine for now.

Have a look at the last commit for a potential solution to keep Go maps as a possible updater

AndrewSisley · 2022-09-02T13:38:15Z

db/collection_update.go

 ) error {
 	keyStr, ok := doc["_key"].(string)
 	if !ok {
-		return errors.New("Document is missing key")
+		return errors.New("document is missing key")


thought: Making these changes within commits that contain more significant refactor/feat changes does make the review harder, adding noise that makes it harder to focus on the stuff that needs attention. I would suggest doing cosmetic stuff like this in separate commits in the future.

AndrewSisley · 2022-09-02T13:43:13Z

db/collection_update.go

-		if _, ok := mval.(map[string]interface{}); ok {
+
+	mergeMap := make(map[string]*fastjson.Value)
+	merge.Visit(func(k []byte, v *fastjson.Value) {


suggestion: Why bother with the extra variable/iteration instead of moving the contents of the for loop into this Visit function? Feels a bit odd to have both.

Because I can't return an error from the anonymous function.

You could assign the error to a variable defined outside of the anon-func though, and return that? Then you can stick to a single iteration

The problem with that is that if we have say 20 fields and we have an error on the first one, there is no way of stopping the Visit method from iterating over all the other fields.

ah true, it won't break :) Thanks
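
The trade-off settled above can be sketched without fastjson. In the snippet below, `visit` is a hypothetical stand-in for a callback-style iterator that, like the Visit method discussed in this thread, gives the callback no way to stop early or return an error. `field`, `validate` and `collectThenValidate` are likewise invented for illustration, and a slice replaces the PR's map so that the order is deterministic.

```go
package main

import (
	"errors"
	"fmt"
)

// field is a hypothetical key/value pair standing in for a parsed JSON member.
type field struct {
	key string
	val any
}

// visit is a stand-in for a callback-style iterator: it always walks every
// field and gives the callback no way to stop early or return an error.
func visit(fields []field, f func(k string, v any)) {
	for _, fl := range fields {
		f(fl.key, fl.val)
	}
}

// validate is a hypothetical per-field check that rejects nil values.
func validate(k string, v any) error {
	if v == nil {
		return errors.New("field " + k + " has no value")
	}
	return nil
}

// collectThenValidate copies the fields out inside the callback, then
// validates them in a plain loop that can return on the first error.
// It returns how many fields were checked before stopping.
func collectThenValidate(fields []field) (int, error) {
	collected := make([]field, 0, len(fields))
	visit(fields, func(k string, v any) {
		collected = append(collected, field{k, v})
	})

	checked := 0
	for _, fl := range collected {
		checked++
		if err := validate(fl.key, fl.val); err != nil {
			return checked, err
		}
	}
	return checked, nil
}

func main() {
	fields := []field{{"name", nil}, {"age", 31}, {"email", "a@b.c"}}
	checked, err := collectThenValidate(fields)
	fmt.Println(checked, err)
}
```

The cheap copy still touches every field, but the potentially expensive validation stops at the first failure, which is the property the single-iteration variant would lose.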

AndrewSisley · 2022-09-02T13:47:41Z

db/collection_update.go

-		}
-
-		val := client.NewCBORValue(fd.Typ, cval)
+		val := client.NewCBORValue(fd.Typ, mergeCBOR[mfield])


suggestion: This feels odd to me, and I think the code would be clearer and slightly safer and more efficient if you just used the object created with validateFieldSchema instead of fetching it from the map. Fetching it suggests that it might be something else.

It's assigned directly to the map hence why it's then fetching from it.

yes, but instead of doing:

	foo := bar()
	map[x] = foo
	y := map[x]
	foobar(y)

you could just do

	foo := bar()
	map[x] = foo
	foobar(foo)

This way there is no ambiguity over what is being passed into foobar, and both the developer reading the code and the CPU have slightly less work to do.

I would think that

	map[x] = bar()
	foobar(map[x])

is actually less CPU intensive. I also find very little ambiguity here.

Performance is secondary here, but you are comparing the cost of fetching an item from a map to the cost of doing nothing.

Similar thought process regarding readability. Only, when reading, the developer also has to wonder why you would insert something into a map and then immediately try to find it again - suggesting that it might have changed between the two lines.

It seems that compiler efficiency would lead to

	foo := bar()
	map[x] = foo
	foobar(foo)

being the better option. https://godbolt.org/z/cqEYoWWxc

I disagree that it improves readability as there is only an error check of 3 lines between the assignment and where it is used.

3 lines (now) and a double take to make sure it is not shared between threads etc. or mutated within any funcs that may be called within those 3 lines.

Is just an odd thing to see IMO. Leave it in if you want though.

AndrewSisley · 2022-09-02T13:54:00Z

db/collection_update.go

-func convertNillableArrayWithConverter[TIn any, TOut any](val any, converter func(TIn) TOut) ([]*TOut, error) {
-	if val == nil {
-		return nil, nil
+	if zeroValue == nil {


suggestion: This is a bit odd, it looks like you have two functions in here, not one :) Suggest using two functions instead of an extra param that is unused for half the callers and a big if block.

not sure how you see that it's unused. It's used for all of them.

If zeroValue is nil, it is not used. The param is used to determine if function A or function B is used in that instance, and it is a compile-time constant here - used to toggle a code branch at runtime.

zeroValue == nil is known at compile time, but you are making it appear (to both developer and CPU) that it is a runtime thing, making the code-flow more complicated than it needs to be.

AndrewSisley · 2022-09-02T13:56:56Z

db/collection_update.go

+func getArray[T any](
+	val *fastjson.Value,
+	typeGetter func(*fastjson.Value) (T, error),
+	zeroValue any,


todo: Why is zeroValue of type any, and not of type T? I can't spot the reason for that, and if you use type T you remove the potential runtime failure that you are having to check for in the body of the function.

Because zeroValue cannot be nil if its type is T.

Do you mean for the purpose of the big-if I referred to in another comment? That feels like an abuse of the type system if the zero value of T is not of type T

The issue is the zero value of T cannot be nil. So we can either do:

	func getArray[T any](
		val *fastjson.Value,
		typeGetter func(*fastjson.Value) (T, error),
		zeroValue T,
	) (any, error) {
		if val.Type() == fastjson.TypeNull {
			return nil, nil
		}
		valArray, err := val.Array()
		if err != nil {
			return nil, err
		}
		arr := make([]T, len(valArray))
		for i, arrItem := range valArray {
			if arrItem.Type() == fastjson.TypeNull {
				arr[i] = zeroValue
				continue
			}
			arr[i], err = typeGetter(arrItem)
			if err != nil {
				return nil, err
			}
		}
		return arr, nil
	}

	func getNillableArray[T any](
		val *fastjson.Value,
		typeGetter func(*fastjson.Value) (T, error),
	) ([]*T, error) {
		if val.Type() == fastjson.TypeNull {
			return nil, nil
		}
		valArray, err := val.Array()
		if err != nil {
			return nil, err
		}
		arr := make([]*T, len(valArray))
		for i, arrItem := range valArray {
			if arrItem.Type() == fastjson.TypeNull {
				arr[i] = nil
				continue
			}
			v, err := typeGetter(arrItem)
			if err != nil {
				return nil, err
			}
			arr[i] = &v
		}
		return arr, nil
	}

or keep it the way I currently have it:

	func getArray[T any](
		val *fastjson.Value,
		typeGetter func(*fastjson.Value) (T, error),
		zeroValue any,
	) (any, error) {
		if val.Type() == fastjson.TypeNull {
			return nil, nil
		}
		valArray, err := val.Array()
		if err != nil {
			return nil, err
		}
		if zeroValue == nil {
			arr := make([]*T, len(valArray))
			for i, arrItem := range valArray {
				if arrItem.Type() == fastjson.TypeNull {
					arr[i] = nil
					continue
				}
				v, err := typeGetter(arrItem)
				if err != nil {
					return nil, err
				}
				arr[i] = &v
			}
			return arr, nil
		}
		arr := make([]T, len(valArray))
		for i, arrItem := range valArray {
			if arrItem.Type() == fastjson.TypeNull {
				var ok bool
				arr[i], ok = zeroValue.(T)
				if !ok {
					return nil, errors.New("zeroValue should be of the same type as the array items type")
				}
				continue
			}
			arr[i], err = typeGetter(arrItem)
			if err != nil {
				return nil, err
			}
		}
		return arr, nil
	}

You don't need to set arr[i] = zeroValue in getArray, you just continue - no zeroValue required (leave as array default)

True. Unless we want the option to set a different default value.

So you prefer the two function approach?

For sure, I see the current as two functions pretending to be one - the main function body is split into two, and only one of those two will execute depending on an input parameter that is (currently) always a compile-time constant.

AndrewSisley · 2022-09-02T13:59:04Z

db/collection_update.go

+
+func getBool(v *fastjson.Value) (bool, error) {
+	b, err := v.Bool()
+	return b, err


question: I'm not nitpicking here, just curious - why are you doing this and not just return v.Bool()?

🤦‍♂️ I think I was tired last night when I added these get functions.

lol fair enough 😁 wasn't sure if it was a personal preference

shahzadlone · 2022-09-05T12:00:44Z

db/collection_update.go

-	switch patch.(type) {
-	case []map[string]interface{}:
+	if parsedUpdater.Type() == fastjson.TypeArray {
 		isPatch = true
-	case map[string]interface{}:
-		isPatch = false
-	default:
+	} else if parsedUpdater.Type() != fastjson.TypeObject {


question: Did you ensure / test that this still does what we want it to do? I ask this because these lines were not covered by test coverage previously, and after this change, they still aren't.
https://app.codecov.io/gh/sourcenetwork/defradb/compare/772/diff

suggest: If not too painful to hit these, would be nice to have some tests asserting that your change works properly.

I would guess (if nothing else catches it first), that something like update_users(data: "1") {... could hit the new version, but I've not tried it.

Would be good to have, our 'negative' tests are quite lacking at the moment too

I've added unit tests to cover those cases. Let me know if you like it.

Thanks for that fred! I do see the coverage is hit now. However, I was wondering why not add integration tests rather than unit tests.

:) I forgot we added a location for Collection integration tests - I would say Fred's new tests are integration tests (they call only public funcs from what I see, minus setup, which just follows the db_test.go format and is easily ported).

Adding a todo from me for this, as they should be really easy to move to tests/integration/collection/..., the utils there should slim them down a bit, and they will be much less liable to sudden deletion if their host file ever becomes a liability

and they will be much less liable to sudden deletion if their host file ever becomes a liability

Not sure I understand this. I'll ask during standup.
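
The kind of negative cases asked for in this thread can be sketched as follows. This is an illustration only: it uses `encoding/json` rather than fastjson, and `classifyUpdater` and `errInvalidUpdater` are hypothetical names, not the identifiers used in db/collection_update.go.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errInvalidUpdater is a hypothetical error for updaters that parse as JSON
// but are neither an array nor an object.
var errInvalidUpdater = errors.New("updater must be a JSON array or object")

// classifyUpdater reports whether updater is a patch (JSON array) or a
// merge patch (JSON object). Anything else is rejected.
func classifyUpdater(updater string) (isPatch bool, err error) {
	var parsed any
	if err := json.Unmarshal([]byte(updater), &parsed); err != nil {
		return false, err
	}
	switch parsed.(type) {
	case []any:
		return true, nil
	case map[string]any:
		return false, nil
	default:
		return false, errInvalidUpdater
	}
}

func main() {
	// "1" is the scalar input suggested in the review as a way to reach
	// the rejection branch.
	for _, in := range []string{`[]`, `{"Name": "Eric"}`, `1`, `{{"Name": "Eric"}`} {
		isPatch, err := classifyUpdater(in)
		fmt.Printf("%q patch=%v err=%v\n", in, isPatch, err)
	}
}
```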

jsimnz

LGTM! Any notable thoughts/changes have already been brought up and addressed by others!

shahzadlone · 2022-09-09T02:06:48Z

db/db_test.go

+	_, err = col.UpdateWithKey(ctx, doc.Key(), `{{
+		"Name": "Eric"
+	}`)


nitpick: Perhaps something more obvious to spot as an invalid JSON.

A double curly brace makes it pretty obvious to me. What would make it more obvious to you?

The double brace requires reading the other 2 lines, and is very similar to the character next to it (e.g. how often have we all missed a bracket when coding without IntelliSense). The rest of the string is also valid JSON.

IMO it would be more obvious if the error was on the focus line (not the func call, but where "Name": "Eric" currently lies), and if the error was larger in size - e.g. the below:

_, err = col.UpdateWithKey(ctx, doc.Key(), `{ :--------------INVALID_JSON--------------- }`)

shahzadlone · 2022-09-09T02:13:54Z

db/collection_update.go

@@ -275,9 +269,9 @@ func (c *collection) updateWithFilter(
 		// Get the document, and apply the patch
 		doc := docMap.ToMap(query.Value())
 		if isPatch {
-			err = c.applyPatch(txn, doc, patch.([]map[string]interface{}))
+			// todo


suggestion: All todos should have an associated ticket number. I know we previously had a todo in updateWithKey too; I suggest linking both to the same ticket if it is relevant to both.

Yeah similar to comment RE dead code, I'd suggest deleting this if entirely (or going through the hassle of adding commit/branch to an issue if you really want it available). But if not, we did agree to only keep todos with a link to a ticket.

shahzadlone · 2022-09-09T02:16:57Z

db/collection_update.go

@@ -291,28 +285,38 @@ func (c *collection) updateWithFilter(
 	return results, nil
 }

-func (c *collection) applyPatch(
+func (c *collection) applyPatch( //nolint:unused


question: So all the //nolint:unused indicate that this is now all dead code? Why not completely remove them? OR are these being saved to work on in the future for patch support?

(non-blocking): I would suggest deletion here too - if it is dead and untested it can be assumed to be broken and will be of little value when actually implementing the feature (let alone to anyone reading this file in the meantime). If there is a ticket for this somewhere you can always dump the code in there if really wanted, or the commit hash that removes it, or, if really fussed over it, a branch name/commit that adds this function back in (a commit that adds it in is far easier to rebase than e.g. a current snapshot pre-removal).

I feel like this would increase the scope of this PR quite a bit. The reason this is happening is because there was a code path that was available but never actually used. Adding the unit test to hit the path resulted in an error because the feature is not actually fully supported. I suggest we leave it with //nolint for this PR and we can clean it up in a separate PR.

AndrewSisley

Approved, with a todo to address and a suggestion - thanks for moving the tests - they look good and seem to cover the code well :)

AndrewSisley · 2022-09-13T14:17:07Z

tests/integration/collection/update/with_filter_test.go

+	tests := []testUtils.TestCase{
+		{
+			Description: "Test update users with filter and invalid JSON",
+			Docs: map[string][]string{


todo: Probably much better to use the doc defined at the top of the test func instead of redefining it with the same values - the current setup is implicitly dependent on the dockeys being deterministic, and it hides the desired relationship and makes it easy to unwittingly edit the values within the test cases (and hide that from anyone reading)

you are absolutely right. I should have done that. Will fix.

AndrewSisley · 2022-09-13T14:20:38Z

tests/integration/collection/update/with_key_test.go

+					},
+				},
+			},
+			ExpectedError: "cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"\\n\\t\\t\\t\\t\\t\\t}\"",


suggestion (non-blocking): Expected error should work with partials, and the current string contains stuff that we don't care about and adds failure points on which we'd probably rather not fail this test.

Suggest trimming it down a bit to the below (or similar):

cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"

There are a couple more tests like this that could benefit IMO

Good point. cannot parse JSON: cannot parse object is even probably enough here.

AndrewSisley · 2022-09-13T14:22:27Z

tests/integration/collection/utils.go

@@ -22,6 +22,8 @@ import (
 )

 type TestCase struct {
+	Description string


question: You like the test cases? I didn't, and deliberately didn't bother adding them to this lib lol

You mean Description? I like to see this info when there are failed tests, especially since we don't get a specific line number with these test structs.

Ah I mean an array of TestCases - I've started avoiding doing that, and use one TestCase per test function (reduces the usefulness of the description). Sorry my original comment was poorly worded.

I like that I can declare things once and reuse in the list of test cases.

:) I liked that about your tests here, I think I would naturally use a file level var/func (global in Go, which is a bit bad) and share it between tests

fredcarle added area/db-system Related to the core system related components of the DB refactor This issue specific to or requires *notable* refactoring of existing codebases and components action/no-benchmark Skips the action that runs the benchmark. labels Sep 2, 2022

fredcarle added this to the DefraDB v0.3.1 milestone Sep 2, 2022

fredcarle requested a review from a team September 2, 2022 06:11

fredcarle self-assigned this Sep 2, 2022

fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from f2262a9 to 2b59a63 Compare September 2, 2022 06:15

fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch 2 times, most recently from 71111f7 to e7b0b75 Compare September 2, 2022 06:30

AndrewSisley requested changes Sep 2, 2022

shahzadlone reviewed Sep 5, 2022

jsimnz approved these changes Sep 7, 2022

fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from 4f60e99 to e4840fa Compare September 8, 2022 18:57

shahzadlone reviewed Sep 9, 2022

fredcarle added 9 commits September 12, 2022 11:34

change to fastjson for field validation

2305106

merge convertNillableArray with getArray

fec052c

change to zeroValue as the determinator of nillable array

aa2f875

remove unnecessary declarations

5c2495b

apply feedback

fe6f0be

apply feedback

1378437

add unit test

bb15112

remove linting from unused patch related functions

a911ead

move test to integration

d6502df

fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from 732e196 to d6502df Compare September 12, 2022 15:34

fredcarle requested review from shahzadlone and AndrewSisley September 12, 2022 23:59

AndrewSisley approved these changes Sep 13, 2022

implement feedback

f0275c5

fredcarle merged commit 55fca49 into develop Sep 13, 2022

fredcarle deleted the fredcarle/refactor/I124-validate-field-schema branch September 13, 2022 15:31
