Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for inline arrays of nillable types #759

Merged
merged 9 commits into from
Aug 30, 2022

Conversation

AndrewSisley
Copy link
Contributor

Relevant issue(s)

Resolves #425

Description

Adds support for inline arrays of nillable values. Also adds a very minimal Option struct to hep facilitate this - this struct can be fleshed out further as and when required.

Specify the platform(s) on which this was tested:

  • Debian Linux

@AndrewSisley AndrewSisley added feature New feature or request area/query Related to the query component area/schema Related to the schema system action/no-benchmark Skips the action that runs the benchmark. labels Aug 24, 2022
@AndrewSisley AndrewSisley added this to the DefraDB v0.3.1 milestone Aug 24, 2022
@AndrewSisley AndrewSisley requested a review from a team August 24, 2022 21:15
@AndrewSisley AndrewSisley self-assigned this Aug 24, 2022
@codecov
Copy link

codecov bot commented Aug 24, 2022

Codecov Report

Merging #759 (eed8a32) into develop (3727613) will increase coverage by 0.09%.
The diff coverage is 72.90%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #759      +/-   ##
===========================================
+ Coverage    58.25%   58.35%   +0.09%     
===========================================
  Files          146      147       +1     
  Lines        16652    16829     +177     
===========================================
+ Hits          9701     9820     +119     
- Misses        6043     6084      +41     
- Partials       908      925      +17     
Impacted Files Coverage Δ
client/descriptions.go 83.33% <ø> (ø)
db/fetcher/encoded_doc.go 54.03% <53.03%> (-1.11%) ⬇️
db/collection_update.go 42.99% <60.65%> (+2.27%) ⬆️
client/option.go 100.00% <100.00%> (ø)
connor/eq.go 87.50% <100.00%> (+6.94%) ⬆️
query/graphql/mapper/mapper.go 86.74% <100.00%> (-1.12%) ⬇️
query/graphql/planner/count.go 92.13% <100.00%> (+0.77%) ⬆️
query/graphql/planner/sum.go 82.41% <100.00%> (+1.46%) ⬆️
query/graphql/schema/descriptions.go 87.57% <100.00%> (+0.73%) ⬆️
query/graphql/schema/generate.go 83.96% <100.00%> (+0.03%) ⬆️
... and 2 more

package client

// Option represents an item that may or may not have a value.
type Option[T any] struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Rename to Optional? Option makes me think 'like a config option'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option is a noun and Optional an adjective, Option also matches the rust name (and the name of type in the lib John linked)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to see rust uses Option. I might have been bias because I have seen C++ std library and typescript using Optional.

https://github.com/bromne/typescript-optional
https://en.cppreference.com/w/cpp/utility/optional

Comment on lines +14 to +34
type Option[T any] struct {
// If HasValue is true, this Option contains a value, if
// it is false it contains no value.
HasValue bool

// The Value of this Option. Should be ignored if HasValue is false.
Value T
}

// Some returns an `Option` of type `T` with the given value.
func Some[T any](value T) Option[T] {
return Option[T]{
HasValue: true,
Value: value,
}
}

// Some returns an `Option` of type `T` with no value.
func None[T any]() Option[T] {
return Option[T]{}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: been looking forward for these magical types!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers, is really nice having access generics again :)

@@ -548,6 +579,51 @@ func validateFieldSchema(val interface{}, field client.FieldDescription) (interf
return cval, err
}

func covertNillableArrayToCval[T any](val any) ([]*T, error) {
Copy link
Member

@shahzadlone shahzadlone Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What does Cval stand for? Also covert as in hide / cover? or did you mean convert?

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember what Cval stands for sorry. covert is a typo :) thanks for spotting, I'll correct it (and check the other places I may have copy-pasted it)

  • convert typo

if val == nil {
return nil, nil
}
untypedCollection := val.([]interface{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why not check the cast performed properly? I remember trying this cast from (any) -> .[]any a few times and the ok value (i.e. _, ok :=...) kept being false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it reaches this line and is not of that type we probably have far greater problems (like a corrupt database or something). an ok is a waste of eye space, dead code, and misleading in that it suggests such a thing is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still avoid a panic and we can report a proper error no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not particularly fussed about panics that can only happen if the database is already corrupted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an instance where the underlying type was []float but it in a switch case with multiple options ([]float, []string etc. all in one case) made it be any type which when casted to []any didn't make it behave like I wanted to. However assuming that was an odd case and you know what your are doing here such that it won't panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[]float doesnt cast to []any - arrays/slices are of a particular type and it would be unsafe to allow casts between them, particularly pointers (any == pointer). Think about what is going on size/memory wise if you do so - an array is (mostly) a continuous set in memory, there is usually a pointer to the first element and a count, if the size of each element could change (e.g. when casting from []bool to []any) then you'd end up with real problems :)

Here the underlying type should always be []any, any will be a pointer to something concrete, but it will always be an array of pointers.

Copy link
Member

@jsimnz jsimnz Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick nitpick: "any == pointer" isn't exactly correct :) - Its implemented as a struct that contains a pointer (technically 2 pointers, one of which is an unsafe pointer)

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, seriously? 😅 What is the second pointer to, the actual type?

Comment on lines 582 to 602
func covertNillableArrayToCval[T any](val any) ([]*T, error) {
if val == nil {
return nil, nil
}
untypedCollection := val.([]interface{})
// Cbor deals with pointers better than structs by default, however in the future
// we may want to write a custom encoder for the Option[T] type
resultArray := make([]*T, len(untypedCollection))
for i, value := range untypedCollection {
if value == nil {
resultArray[i] = nil
continue
}
tValue, ok := value.(T)
if !ok {
return nil, fmt.Errorf("Failed to cast value: %v of type: %T to %T", value, value, *new(T))
}
resultArray[i] = &tValue
}
return resultArray, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Also curious why this function needed to use a generics, could this just not be the following, and after that it can be be used like this cval, err = covertNillableArrayToCval(val) :

Suggested change
func covertNillableArrayToCval[T any](val any) ([]*T, error) {
if val == nil {
return nil, nil
}
untypedCollection := val.([]interface{})
// Cbor deals with pointers better than structs by default, however in the future
// we may want to write a custom encoder for the Option[T] type
resultArray := make([]*T, len(untypedCollection))
for i, value := range untypedCollection {
if value == nil {
resultArray[i] = nil
continue
}
tValue, ok := value.(T)
if !ok {
return nil, fmt.Errorf("Failed to cast value: %v of type: %T to %T", value, value, *new(T))
}
resultArray[i] = &tValue
}
return resultArray, nil
}
func covertNillableArrayToCval(val any) ([]*any, error) {
if val == nil {
return nil, nil
}
untypedCollection := val.([]any)
// Cbor deals with pointers better than structs by default, however in the future
// we may want to write a custom encoder for the Option[T] type
resultArray := make([]*any, len(untypedCollection))
for i, value := range untypedCollection {
if value == nil {
resultArray[i] = nil
continue
}
tValue := value
resultArray[i] = &tValue
}
return resultArray, nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike losing known type information when there is little reason to do so. The generic version should also preserve it's efficiency if/when introducing the custom encoder, whereas the any version would force pointers over the raw structs

Comment on lines 463 to 470

case client.FieldKind_NILLABLE_STRING_ARRAY:
cval, err = covertNillableArrayToCval[string](val)
if err != nil {
return nil, err
}
ok = true

Copy link
Member

@shahzadlone shahzadlone Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: If you do update this function to just take any parameter and not be a generic, all these cases can just squeeze into the following two cases:

	case client.FieldKind_NILLABLE_STRING_ARRAY, client.FieldKind_NILLABLE_BOOL_ARRAY, client.FieldKind_NILLABLE_FLOAT_ARRAY:
		cval, err = covertNillableArrayToCval(val)
		if err != nil {
			return nil, err
		}
		ok = true

	case client.FieldKind_NILLABLE_INT_ARRAY:
		cval, err = covertNillableArrayToCvalWithConverter(val, func(in float64) int64 { return int64(in) })
		if err != nil {
			return nil, err
		}
		ok = true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might justify the forcing of pointers and the extra hassle if/when using a custom encoder (and having to re-write to the current), however I do not see it as important, and impossible to know which is better. Will leave as is unless anyone else chimes in here

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor stuff.

Overall I'm concerned about the structure of our dynamic types, but that isn't at all the fault of this PR but a symptom that is expanded a bit in this PR.

Started playing around with a better designed internal type system that may eventually make with like this more succinct.

Will prob try to formally address in 0.4.

But, specific to this PR 👍

Comment on lines +15 to +20
// If HasValue is true, this Option contains a value, if
// it is false it contains no value.
HasValue bool

// The Value of this Option. Should be ignored if HasValue is false.
Value T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I feel like these should be private with getters / helpers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Why force extra cost on the consumers of this?

Is little/no risk in allowing direct access (I want unions to do this properly, as atm you can access Value if HasValue is false, but in the future that will hopefully be impossible)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the extra cost, and it ensures correct usage of the API, as right now it's very easy to accidentally break the Option semantics by direct manipulation. There's no need for them to be public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling a function has a (small) cost. It also obfuscates the underlying data structure. It would also be dead internally at the moment.

People have different preferences on this topic (OOP vs DDD/func-prog), and I see this discussion as one of mine vs yours personal preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not dead if you make them private, but it adds no value to the current code

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny-go more likely to be used in resource constrained environments

More likely, but not guaranteed, and resource constraints can may this kind of thing even more important.

If its inlined, then there is no "passing" the value between stacks into/outof a function register.

I thought assignment always copied the value (occasional compiler magic aside)? x = option.value always copies the value from option.value into whereever x is? Hiding the source behind a function forces that copying instead of being able to reference option.value directly?

In any case, we are way to far down the optimzation rabbit hole

100% agree, but I think you are misunderstanding me on the below?

The core discussion is the safety/immutability/readonly nature.

Not for me, this is about PR author preference vs reviewer preference. The engineering side has no right answer at the moment as we have no concrete use-cases for defra affected by this and so performance itself does not matter here. What does matter is that based on my personal past experience (typically writing software for the efficient aggregation and display of large datasets in mutation-safer langs) my personal preference (gut feeling saying this is safer) is to use the raw struct, whereas yours is to use a getter.

I do not care so much as to which one is chosen in the short term, but I do care an awful lot RE dev freedom and the priority of the author's personal preference - you earlier said you were happy enough either way and didn't want to play the CTO card. I also said I might even resort to flipping a coin :) I think the convo is now just continuing for fun/learning lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically writing software for the efficient aggregation and display of large datasets in mutation-safer langs

Just in case taken the wrong way, I mentioned that as the go-to/default tool of guy who has spent the last 5 years pushing nails into wood is going to be the hammer, even if the problem he is looking at is not necessarily nails-into-wood - we all have our biases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for me, this is about PR author preference vs reviewer preference.

I mentioned further up that I don't believe this is a simple convo of author vs reviewer:
"I still don't think this particular example, knowing Idiomatic Go, and the gurantees/overhead of the language/type system, is a matter of personal preference, and would emphatically suggest implementing immutable gurantees on this type as described."

Obfuscation is a subjective definition, immutability is an objective one.

do not care so much as to which one is chosen in the short term, but I do care an awful lot RE dev freedom and the priority of the author's personal preference

I'm not sure I hold the same opinion here w.r.t freedom. We are all beholdent to the code, and we each individually have our biases and such. The whole point of the reviewer system to (imo) is to hopefully remove said biases by having multiple (one or more) ppl involved in the verification of said code. With all things being equal, a simple (subjective vs subjective) disagreement is fairly easy to resolve by defaulting to author, but I don't see that is a unbreakable rule. I am stubborn, I get tired, I write bad code (at times), I misunderstand goals/constraints, I accidentally leave fmt.Print debug statements in PRs, I have coding styles made up over experience, etc, simply being the author doesn't necessarily imbue stronger powers.

But, as I said before, my approval is already given, merge away 👍, most of my comments thus far have been to articulate my perceived dangers, not preferences. (We could say everything is a matter of preferences, but there has to be a line drawn in the sand, otherwise, noone would ever get anything done ;).)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obfuscation is a subjective definition, immutability is an objective one.

And which one has greater costs is also highly subjective, and the opinions will vary depending upon which projects each developer has worked on in the past.

The code here is clear, correct (as far as we can see at least), easy to change, and doesn't run like a dog. It is not doing anything weird, is well enough organised, and has been tested. No new linter rules have been requested. I'd lump most things outside of those categories as personal opinion/preference and totally unimportant to the PR/codebase.

If we find ourselves regularly tripping up over mutations to this struct it is easy to change. Likewise performance issues directly resulting from getters are easy enough to solve once found. I would argue that poorly written code that came about due to overly abstracted data-structures is much harder to change, but that argument is only half-fair and loops us back into the subjectivity/personal-opinion topic.

The immutability guarantees provided by getters do not affect the code under review (nowhere in this review cares), and the against them are equally (ir)relevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well said!

I might even resort to flipping a coin :)

obligatory...
https://xkcd.com/221/

return resultArray, nil
}

func convertNillableArrayToCvalWithConverter[TIn any, TOut any](val any, converter func(TIn) TOut) ([]*TOut, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cval name isn't necessary, I assume you picked it up from the original variable name.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was the name of the serialization format value :) Is that incorrect? Will have a look at renaming these in the meantime - might not have to so specific anyway

  • rename?

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped ToCval from both function names

@@ -548,6 +579,51 @@ func validateFieldSchema(val interface{}, field client.FieldDescription) (interf
return cval, err
}

func convertNillableArrayToCval[T any](val any) ([]*T, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cval name isn't necessary. Same as other comment

@jsimnz
Copy link
Member

jsimnz commented Aug 29, 2022

Some minor stuff.

Overall I'm concerned about the structure of our dynamic types, but that isn't at all the fault of this PR but a symptom that is expanded a bit in this PR.

Started playing around with a better designed internal type system that may eventually make with like this more succinct.

Will prob try to formally address in 0.4.

But, specific to this PR 👍

Also, huge praise on the scale of tests 🥰

@AndrewSisley
Copy link
Contributor Author

Started playing around with a better designed internal type system that may eventually make with like this more succinct.

Would be interested in seeing that, our current setup is very wasteful in that everything on a doc is a pointer. And the obvious C-like correction of that would probably be quite unsafe and could freak out Go devs 😆

Also, huge praise on the scale of tests

Cheers, the prod code is simple enough, but the aggregates just means there are so many ways in which stuff like this could fail

@AndrewSisley AndrewSisley force-pushed the sisley/feat/I425-nillable-inline-arrays branch from 956afde to ea2c4cc Compare August 29, 2022 13:31
@AndrewSisley AndrewSisley force-pushed the sisley/feat/I425-nillable-inline-arrays branch from ea2c4cc to eed8a32 Compare August 29, 2022 13:31
@AndrewSisley AndrewSisley merged commit 7d072bf into develop Aug 30, 2022
@AndrewSisley AndrewSisley deleted the sisley/feat/I425-nillable-inline-arrays branch August 30, 2022 14:07
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
)

* Correctly handle inline array nil-filters

Was no way to test this before for inline arrays - appart from via explain (which was written with an incorrect assertion), and it was broken :)

* Add option type

Allows for safe(r) and efficient handling of nillable values.

* Add nillable array decode func

* Add func to convert to cval type

* Add nillable bool array support

* Add nillable int array support

* Add nillable float array support

* Add nillable string array support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component area/schema Related to the schema system feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No support for nil items within an inline array
5 participants