Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Use fastjson to parse mutation data string #772

Merged
merged 10 commits into from
Sep 13, 2022

Conversation

fredcarle
Copy link
Collaborator

Relevant issue(s)

Resolves #124

Description

This PR makes use of the fastjson package to improve mutation data string parsing. Doing so also simplifies the validateFieldSchema function.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Using integration tests

Specify the platform(s) on which this was tested:

  • MacOS

@fredcarle fredcarle added area/db-system Related to the core system related components of the DB refactor This issue specific to or requires *notable* refactoring of existing codebases and components action/no-benchmark Skips the action that runs the benchmark. labels Sep 2, 2022
@fredcarle fredcarle added this to the DefraDB v0.3.1 milestone Sep 2, 2022
@fredcarle fredcarle requested a review from a team September 2, 2022 06:11
@fredcarle fredcarle self-assigned this Sep 2, 2022
@fredcarle fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from f2262a9 to 2b59a63 Compare September 2, 2022 06:15
@codecov
Copy link

codecov bot commented Sep 2, 2022

Codecov Report

Merging #772 (f0275c5) into develop (e96c1c9) will increase coverage by 0.41%.
The diff coverage is 62.37%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #772      +/-   ##
===========================================
+ Coverage    58.70%   59.11%   +0.41%     
===========================================
  Files          153      153              
  Lines        17083    16977     -106     
===========================================
+ Hits         10028    10036       +8     
+ Misses        6121     6023      -98     
+ Partials       934      918      -16     
Impacted Files Coverage Δ
db/collection_update.go 55.47% <62.37%> (+12.48%) ⬆️

@fredcarle fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch 2 times, most recently from 71111f7 to e7b0b75 Compare September 2, 2022 06:30
Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, code is quite a bit nice than it was before :) Did you benchmark it at all? Got a few minor comments for you.

// UpdateWithKeys updates documents matching the given DocKeys.
//
// The provided updater must be a string Patch, string Merge Patch, a parsed Patch, or parsed Merge Patch
// else an ErrInvalidUpdater will be returned.
//
// Returns an ErrDocumentNotFound if a document is not found for any given DocKey.
UpdateWithKeys(context.Context, []DocKey, interface{}) (*UpdateResult, error)
UpdateWithKeys(context.Context, []DocKey, string) (*UpdateResult, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Thanks for this type change - this always confused me and I thought there was more types that this could handle but the diff suggests it was just the two. Do check in with John though if you haven't already to make sure we are not losing anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

As far as I could tell the value passed was always a string but yes I will be checking with John.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to be able to support native Go types map[string]interface{} If using the programatic approach, but at the moment, its only used/tested from the POV of the query system, which only handles strings.

I would nice to still support this, but its prob lower priority, and supporting native types gets in the way of using the fastjson approach outlined in this PR anyway. So its fine for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at the last commit for a potential solution to keep Go maps as a possible updater

) error {
keyStr, ok := doc["_key"].(string)
if !ok {
return errors.New("Document is missing key")
return errors.New("document is missing key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Making these changes within commits that contain more significant changes refactors/feat changes does make the review harder, adding noise that makes it harder to focus on stuff that needs attention. Would suggest doing cosmetic stuff like this in separate commits in the future

if _, ok := mval.(map[string]interface{}); ok {

mergeMap := make(map[string]*fastjson.Value)
merge.Visit(func(k []byte, v *fastjson.Value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Why bother with the extra variable/iteration and move the contents of the for loop into this Visit function? Feels a bit odd to have both

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I can't return an error from the anonymous function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could assign the error to a variable defined outside of the anon-func though, and return that? Then you can stick to a single iteration

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is that if we have say 20 fields and we have an error on the first one, there is no way of stopping the Visit method from iterating over all the other fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah true, it wont break :) Thanks

}

val := client.NewCBORValue(fd.Typ, cval)
val := client.NewCBORValue(fd.Typ, mergeCBOR[mfield])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This feels odd to me and I think the code would be clearer and slightly safer and efficient if you just use the object to created with validateFieldSchema instead of fetching it from the map. Fetching it suggests that it might be something else.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's assigned directly to the map hence why it's then fetching from it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but instead of doing:

foo := bar()
map[x] = foo
y := map[x]
foobar(y)

you could just do

foo := bar()
map[x] = foo
foobar(foo)

This way there is no ambiguity over what is being passed into fooBar, and the developer reading and the CPU has to do slightly less work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that

map[x] = bar()
foobar(map[x])

is actually less CPU intensive. I also find very little ambiguity here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is secondary here, but you are comparing the cost of fetching an item from a map to the cost of doing nothing.

Similar thought process regarding readability. Only when reading the developer has to also wonder why to would insert something in to a map, and then immediately try and find it again - suggesting that it might have changed between the two lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seam that compiler efficiency would lead to the

foo := bar()
map[x] = foo
foobar(foo)

being the better option. https://godbolt.org/z/cqEYoWWxc

I disagree that it improves readability as there is only an error check of 3 lines between the assignment and where it is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 lines (now) and a double take to make sure it is not shared between threads etc. or mutated within any funcs that may be called within those 3 lines.

Is just an odd thing to see IMO. Leave it in if you want though.

func convertNillableArrayWithConverter[TIn any, TOut any](val any, converter func(TIn) TOut) ([]*TOut, error) {
if val == nil {
return nil, nil
if zeroValue == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This is bit odd, it looks like you have two functions in here not one :) Suggest using two functions instead of an extra param that is unused for half the callers and a big if block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how you see that it's unused. It's used for all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If zeroValue is nil, it is not used. The param is used to determine if function A or function B is used in that instance, and it is a compile-time constant here - used to toggle a code branch at runtime.

zeroValue == nil is known at compile time, but you are making it appear (to both developer and CPU) that it is a runtime thing, making the code-flow more complicated than it needs to be.

func getArray[T any](
val *fastjson.Value,
typeGetter func(*fastjson.Value) (T, error),
zeroValue any,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Why is zeroValue of type any, and not of type T? I can't spot the reason for that, and if you use type T you remove the potential runtime failure that you are having to check for in the body of the function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because zeroValue cannot be nil it type is T

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean for the purpose of the big-if I referred to in another comment? That feels like an abuse of the type system if the zero value of T is not of type T

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is the zero value of T cannot be nil. So we can either do:

func getArray[T any](
	val *fastjson.Value,
	typeGetter func(*fastjson.Value) (T, error),
	zeroValue T,
) (any, error) {
	if val.Type() == fastjson.TypeNull {
		return nil, nil
	}

	valArray, err := val.Array()
	if err != nil {
		return nil, err
	}

	arr := make([]T, len(valArray))
	for i, arrItem := range valArray {
		if arrItem.Type() == fastjson.TypeNull {
			arr[i] = zeroValue
			continue
		}
		arr[i], err = typeGetter(arrItem)
		if err != nil {
			return nil, err
		}
	}
	return arr, nil
}

func getNillableArray[T any](
	val *fastjson.Value,
	typeGetter func(*fastjson.Value) (T, error),
) ([]*T, error) {
	if val.Type() == fastjson.TypeNull {
		return nil, nil
	}

	valArray, err := val.Array()
	if err != nil {
		return nil, err
	}

	arr := make([]*T, len(valArray))
	for i, arrItem := range valArray {
		if arrItem.Type() == fastjson.TypeNull {
			arr[i] = nil
			continue
		}
		v, err := typeGetter(arrItem)
		if err != nil {
			return nil, err
		}
		arr[i] = &v
	}
	return arr, nil
}

or keep it the way I currently have it:

func getArray[T any](
	val *fastjson.Value,
	typeGetter func(*fastjson.Value) (T, error),
	zeroValue any,
) (any, error) {
	if val.Type() == fastjson.TypeNull {
		return nil, nil
	}

	valArray, err := val.Array()
	if err != nil {
		return nil, err
	}

	if zeroValue == nil {
		arr := make([]*T, len(valArray))
		for i, arrItem := range valArray {
			if arrItem.Type() == fastjson.TypeNull {
				arr[i] = nil
				continue
			}
			v, err := typeGetter(arrItem)
			if err != nil {
				return nil, err
			}
			arr[i] = &v
		}
		return arr, nil
	}

	arr := make([]T, len(valArray))
	for i, arrItem := range valArray {
		if arrItem.Type() == fastjson.TypeNull {
			var ok bool
			arr[i], ok = zeroValue.(T)
			if !ok {
				return nil, errors.New("zeroValue should be of the same type as the array items type")
			}
			continue
		}
		arr[i], err = typeGetter(arrItem)
		if err != nil {
			return nil, err
		}
	}
	return arr, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to set arr[i] = zeroValue in getArray, you just continue - no zeroValue required (leave as array default)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Unless we want the option to set a different default value.

So you prefer the two function approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, I see the current as two functions pretending to be one - the main function body is split into two, and only one of those two will execute depending on an input parameter that is (currently) always a compile-time constant.


func getBool(v *fastjson.Value) (bool, error) {
b, err := v.Bool()
return b, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I'm not nitpicking here, just curious - why are you doing this and not just return v.Bool()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ I think I was tired last night when I added these get functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol fair enough 😁 wasn't sure if it was a personal preference

Comment on lines -183 to +182
switch patch.(type) {
case []map[string]interface{}:
if parsedUpdater.Type() == fastjson.TypeArray {
isPatch = true
case map[string]interface{}:
isPatch = false
default:
} else if parsedUpdater.Type() != fastjson.TypeObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Did you ensure / test that this still does what we want it to do? I ask this because these lines were not covered by test coverage previously, and after this change, they still aren't.
https://app.codecov.io/gh/sourcenetwork/defradb/compare/772/diff

suggest: If not too painful to hit these, would be nice to have some tests asserting that your change works properly.

Copy link
Contributor

@AndrewSisley AndrewSisley Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess (if nothing else catches it first), that something like update_users(data: "1") {... could hit the new version, but I've not tried it.

Would be good to have, our 'negative' tests are quite lacking at the moment too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added unit test to cover those cases. Let me know if you like it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that fred! I do see they coverage is hit now. However I was wondering why not add integration tests rather than unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I forgot we added a location of Collection integration tests - I would say Fred's new tests are integration tests (they call only public funcs from what I see (minus setup which just follows the db_test.go format and is easily ported).

Adding a todo from me for this, as they should be really easy to move to tests/integration/collection/..., the utils there should slim them down a bit, and they will be much less liable to sudden deletion if their host file ever becomes a liability

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and they will be much less liable to sudden deletion if their host file ever becomes a liability

Not sure I understand this. I'll ask during standup.

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Any notable thoughts/changes have already been brought up and adressed by others!

@fredcarle fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from 4f60e99 to e4840fa Compare September 8, 2022 18:57
db/db_test.go Outdated
Comment on lines 596 to 598
_, err = col.UpdateWithKey(ctx, doc.Key(), `{{
"Name": "Eric"
}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Perhaps something more obvious to spot as an invalid JSON.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A double curly brace makes it pretty obvious to me. What would make it more obvious to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double brace requires reading the other 2 lines, and is very similar to the character next to it (e.g. how often have we all missed a bracket when coding without intelisense), the rest of the string is also valid json.

IMO it would be more obvious if the error was on the focus line (not the func call, but where "Name": "Eric" currently lies), and if the error was larger in size - e.g. the below:

_, err = col.UpdateWithKey(ctx, doc.Key(), `{
   :--------------INVALID_JSON---------------
}`)

@@ -275,9 +269,9 @@ func (c *collection) updateWithFilter(
// Get the document, and apply the patch
doc := docMap.ToMap(query.Value())
if isPatch {
err = c.applyPatch(txn, doc, patch.([]map[string]interface{}))
// todo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: All todo's should have an associated ticket number. I know we had todo previously in the updateWithKey too suggest linking both with the same ticket if it will be relevant to both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah similar to comment RE dead code, I'd suggest deleting this if entirely (or going through the hassle of adding commit/branch to an issue if you really want it available). But if not, we did agree to only keep todos with a link to a ticket.

@@ -291,28 +285,38 @@ func (c *collection) updateWithFilter(
return results, nil
}

func (c *collection) applyPatch(
func (c *collection) applyPatch( //nolint:unused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: So all the //nolint:unused indicate that this is now all dead code? Why not completely remove them? OR are these being saved to work on in the future for patch support?

Copy link
Contributor

@AndrewSisley AndrewSisley Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking): I would suggest deletion here too - if it is dead and untested it can be assumed to be broken and will be of little value when actually implementing the feature (let alone to anyone reading this file in the meantime). If there is a ticket for this somewhere you can always dump the code in there if really wanted, or the commit hash that removes it, or if really fussed over it a branch name/commit that adds this function back in (a commit that adds it in is far easier to rebase that e.g. a current snapshot pre-removal)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this would increase the scope of this PR quite a bit. The reason this is happening is because there was a code path that was available but never actually used. Adding the unit test to hit the path resulted in an error because the feature is not actually fully supported. I suggest we leave it with //nolint for this PR and we can clean it up in a separate PR.

@fredcarle fredcarle force-pushed the fredcarle/refactor/I124-validate-field-schema branch from 732e196 to d6502df Compare September 12, 2022 15:34
Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, with a todo to address and a suggestion - thanks for moving the tests - they look good and seem to cover the code well :)

tests := []testUtils.TestCase{
{
Description: "Test update users with filter and invalid JSON",
Docs: map[string][]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Probably much better to use the doc defined at the top of the test func instead of redefining it with the same values - the current setup is implicitly dependent on the dockeys being deterministic, and it hides the desired relationship and makes it easy to unwittingly edit the values within the test cases (and hide that from anyone reading)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are absolutely right. I should have done that. Will fix.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
},
},
ExpectedError: "cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"\\n\\t\\t\\t\\t\\t\\t}\"",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): Expected error should work with partials, and the current string contains stuff that we dont care about, and adds failure points that we'd probably rather not fail this test.

Suggest trimming it down a bit to the below (or similar):

cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"

Are a couple more tests like this that could benefit IMO

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. cannot parse JSON: cannot parse object is even probably enough here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -22,6 +22,8 @@ import (
)

type TestCase struct {
Description string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: You like the test cases? I didn't, and deliberately didn't bother adding them to this lib lol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean Description? I like to see this info when there are failed tests. Especially that we don't get a specific line number with these test structs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I mean an array of TestCases - I've started avoiding doing that, and use one TestCase per test function (reduces the usefulness of the description). Sorry my original comment was poorly worded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that I can declare things once and reuse in the list of test cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I liked that about your tests here, I think I would naturally use a file level var/func (global in Go, which is a bit bad) and share it between tests

@fredcarle fredcarle merged commit 55fca49 into develop Sep 13, 2022
@fredcarle fredcarle deleted the fredcarle/refactor/I124-validate-field-schema branch September 13, 2022 15:31
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Description
This PR makes use of the fastjson package to improve mutation data string parsing. Doing so also simplifies the validateFieldSchema function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/db-system Related to the core system related components of the DB refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rework collection_update.validateFieldSchema
4 participants