-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add decode option to allow rejecting inputs that contain certain simple values. #481
Conversation
@benluddy Thanks for this POC and discussion in issue #477. I focused on big picture since this has draft status. If my understanding is correct, use cases are:
The first two use cases are handled nicely in this POC since decoder uses either However, the third use case seems error-prone because user needs to figure out how to trigger default behavior and apply it for all simple values which don't require custom handling. Maybe it can be more user friendly. Thoughts? |
To be clear, this would support users who want to do something like reject the undefined simple value instead of treating it as nil, but otherwise preserve the existing treatment of all other simple values? I will take a look at that. Maybe this sort of usage would be clearer if the option field had an opaque type, with some exported functionality to allow users to construct option values that effectively specify default "overrides" for only a few simple values. |
2208a23
to
2f65e3b
Compare
@fxamacker based on your feedback, I changed the option from a sparse map to a pointer to an opaque, immutable
SimpleValues: nil, or SimpleValues: NewSimpleValueRegistry(WithDefaultSimpleValueAnalogs),
SimpleValues: NewSimpleValueRegistry(
// any simple value not registered will be rejected
// this example will only recognize false and true
WithSimpleValueAnalog(20, false),
WithSimpleValueAnalog(21, true),
),
SimpleValues: NewSimpleValueRegistry(
WithDefaultSimpleValueAnalogs,
WithNoSimpleValueAnalog(23), // reject undefined
WithSimpleValueAnalog(200, some.Thing),
), |
@benluddy Thanks for updating this PR! I'll take a closer look this week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benluddy Thanks for updating this PR! Looks great!
I left a few comments, e.g. need to forbid mutable SimpleValue
analogs, provide an additional way to create SimpleValue
registry (to reduce user mistakes), etc.
nit: Not sure, but would names be more obvious if "Analog" was replaced by "Substitute" or "Sub"? E.g. "SimpeValueAnalog" -> "SimpleValueSubstitute"
decode.go
Outdated
// SimpleValueRegistry is an immutable mapping from CBOR simple value number (0...23 and 32...255) | ||
// to Go analog value. | ||
type SimpleValueRegistry struct { | ||
analogs [256]*interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
decode.go
Outdated
if sv >= 24 && sv <= 31 { | ||
return fmt.Errorf("cbor: cannot set analog for reserved simple value %d", sv) | ||
} | ||
r.analogs[sv] = &analog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably deny mutable analogs. If analog
is of type *int
, the integer value can be modified after SimpleValueRegistry
is created.
In addition to forbidding pointers, we could also forbid containers able to hold pointers. This allows us to avoid some code bloat (e.g. checking if any element/field or nested element/field is a pointer).
For simplicity, maybe we can require analogs to essentially be one of these:
nil
bool
string
- integers
- floats
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not confident that the library should forbid this. As you implied, it's not trivial to pin down without making the limitations very strict.
If limited to only a few built-in types, it becomes hard for users to distinguish between a simple value and a different value with an equivalent Go representation. If the limitation is relaxed to be based on kind rather than type, then users could define a named type with one of the supported underlying types, and use an analog of that type to distinguish the simple value.
However, I think that would eventually bite users who register analogs that they don't entirely control. If a perfect analog value is provided by a library, users may be forced to perform a second pass over their decoded object to convert from a user-defined sentinel value to the analog that they want to appear in the final object. Alternatively, a library-provided value may change during a dependency update in such a way that it no longer meets our restrictions on acceptable analog values.
I don't have a use case for this presently, but I lean toward leaving users responsible for recognizing that their *int
analog is the pointer itself and not the referenced value. Can you imagine this causing thorny issues down the line within this library? Map key equivalence is the main area that comes to my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If limited to only a few built-in types, it becomes hard for users to distinguish between a simple value and a different value with an equivalent Go representation. If the limitation is relaxed to be based on kind rather than type, then users could define a named type with one of the supported underlying types, and use an analog of that type to distinguish the simple value.
I agree. We can restrict based on kind rather than type.
However, I think that would eventually bite users who register analogs that they don't entirely control. If a perfect analog value is provided by a library, users may be forced to perform a second pass over their decoded object to convert from a user-defined sentinel value to the analog that they want to appear in the final object. Alternatively, a library-provided value may change during a dependency update in such a way that it no longer meets our restrictions on acceptable analog values.
Maybe we can also allow users to register Unmarshaler
to handle simple values, as supported for tags.
So user can:
- perform decoding operation via
UnmarshalCBOR
for registered simple value if any, or - decode simple value to registered analog
This would give user the flexibility to handle more complex analog (pointers and containers) and also let user assume the responsibility for handling edge cases related to that.
Thoughts?
I don't have a use case for this presently, but I lean toward leaving users responsible for recognizing that their
*int
analog is the pointer itself and not the referenced value. Can you imagine this causing thorny issues down the line within this library? Map key equivalence is the main area that comes to my mind.
My main concern is potential concurrency bugs introduced by users if we allow pointers and containers that can hold pointers. Time spent troubleshooting bugs in 3rd party apps or libraries takes time away from improving this codec.
Maybe we can forbid pointers and containers in this PR and open a separate PR to add support for registering Unmarshaler
for simple values, which would allow the user to implement support pointers, containers, etc.
decode.go
Outdated
return &UnmarshalTypeError{ | ||
CBORType: t.String(), | ||
GoType: v.Type().String(), | ||
errorMsg: "simple value " + strconv.FormatInt(int64(val), 10) + " is not recognized", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe return a new type of error (e.g. UnregisteredSimpleValueError
) instead of reusing UnmarshalTypeError
because:
UnmarshalTypeError
is returned when CBOR data can't be decoded to specified Go type- The simple value error only happens when given simple value is explicitly not registered by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed. WDYT about a more general UnsupportedValueError
that is returned when we recognize a valid input but are configured not to accept it? This sort of thing is mentioned in https://www.rfc-editor.org/rfc/rfc8949.html#section-5-3:
CBOR-based protocols MUST specify how their decoders handle invalid and other unexpected data. CBOR-based protocols MAY specify that they treat arbitrary valid data as unexpected. Encoders for CBOR-based protocols MUST produce only valid items, that is, the protocol cannot be designed to make use of invalid items. An encoder can be capable of encoding as many or as few types of values as is required by the protocol in which it is used; a decoder can be capable of understanding as many or as few types of values as is required by the protocols in which it is used.
I have a similar use case for rejecting inputs containing bignum tags, and I don't think applications would get much utility from having distinct error types for different inputs that the library supports but their application protocol does not.
2f65e3b
to
58cd838
Compare
Thanks for the thorough review @fxamacker, I think I've made changes to address all except one of the inline comments. I held off on changing #481 (comment) for the moment. If you feel strongly about it then I'll go ahead and implement the change you suggested (that decision is not critical to any use case I have).
For me "analog" is a closer fit because there is a 1:1 correspondence between a simple value and a unique Go value. "Substitute" might imply that there is a natural representation of any given simple value that the substitute is replacing, but only a few of the simple values (true, false, and maybe null) have obvious Go representations. I'll defer to your judgment on this as well, please let me know if you would prefer to see the terminology changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @benluddy for updating this PR! It looks great!
For me "analog" is a closer fit because there is a 1:1 correspondence between a simple value and a unique Go value. "Substitute" might imply that there is a natural representation of any given simple value that the substitute is replacing, but only a few of the simple values (true, false, and maybe null) have obvious Go representations. I'll defer to your judgment on this as well, please let me know if you would prefer to see the terminology changed.
Thanks for describing the reasons for using the term "analog", it makes sense, lets keep your preferred terminology! 👍
I held off on changing #481 (comment) for the moment. If you feel strongly about it then I'll go ahead and implement the change you suggested (that decision is not critical to any use case I have).
I prefer being more restrictive in built-in decoding functions (forbidding containers and pointers as analogs to SimpleValue
) and allowing users to implement support for containers, pointers, etc. by registering Unmarshaler
for simple values if they need it.
By forbidding containers and pointers as analogs for simple values:
- We avoid making it easier for users to introduce concurrency bugs. This helps me limit time spent troubleshooting concurrency bugs in 3rd-party apps and libraries when users suggest the bug is in this codec.
- We avoid premature code bloat from checking nested elements of containers, etc.
- We can open separate PR to allow users to register
Unmarshaler
for simple values so users can implement support for all kinds of types including containers and pointers.
Thoughts?
decode.go
Outdated
// Creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | ||
// functions in order against an empty registry. Any simple value without a registered analog will | ||
// produce an UnacceptableDataItemError if encountered in the input while unmarshaling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add function name to comment.
// Creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | |
// functions in order against an empty registry. Any simple value without a registered analog will | |
// produce an UnacceptableDataItemError if encountered in the input while unmarshaling. | |
// NewSimpleValueRegistry creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | |
// functions in order against an empty registry. Any simple value without a registered analog will | |
// produce an UnacceptableDataItemError if encountered in the input while unmarshaling. |
decode.go
Outdated
// Creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | ||
// functions in order against a registry that is pre-populated with the library defaults. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add function name to comment.
// Creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | |
// functions in order against a registry that is pre-populated with the library defaults. | |
// NewSimpleValueRegistryFromDefaults creates a new SimpleValueRegistry. The registry state is initialized by executing the provided | |
// functions in order against a registry that is pre-populated with the library defaults. |
decode.go
Outdated
if sv >= 24 && sv <= 31 { | ||
return fmt.Errorf("cbor: cannot set analog for reserved simple value %d", sv) | ||
} | ||
r.analogs[sv] = &analog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If limited to only a few built-in types, it becomes hard for users to distinguish between a simple value and a different value with an equivalent Go representation. If the limitation is relaxed to be based on kind rather than type, then users could define a named type with one of the supported underlying types, and use an analog of that type to distinguish the simple value.
I agree. We can restrict based on kind rather than type.
However, I think that would eventually bite users who register analogs that they don't entirely control. If a perfect analog value is provided by a library, users may be forced to perform a second pass over their decoded object to convert from a user-defined sentinel value to the analog that they want to appear in the final object. Alternatively, a library-provided value may change during a dependency update in such a way that it no longer meets our restrictions on acceptable analog values.
Maybe we can also allow users to register Unmarshaler
to handle simple values, as supported for tags.
So user can:
- perform decoding operation via
UnmarshalCBOR
for registered simple value if any, or - decode simple value to registered analog
This would give user the flexibility to handle more complex analog (pointers and containers) and also let user assume the responsibility for handling edge cases related to that.
Thoughts?
I don't have a use case for this presently, but I lean toward leaving users responsible for recognizing that their
*int
analog is the pointer itself and not the referenced value. Can you imagine this causing thorny issues down the line within this library? Map key equivalence is the main area that comes to my mind.
My main concern is potential concurrency bugs introduced by users if we allow pointers and containers that can hold pointers. Time spent troubleshooting bugs in 3rd party apps or libraries takes time away from improving this codec.
Maybe we can forbid pointers and containers in this PR and open a separate PR to add support for registering Unmarshaler
for simple values, which would allow the user to implement support pointers, containers, etc.
I like that direction. If support is eventually added for user-provided simple value unmarshalers, and we also implement limited support for certain simple value analog types today, there will ultimately be several ways to configure the same behaviors. My use cases don't require user-provided analogs at all, only the ability to reject simple values other than true, false, and null. Would you accept a more limited option for today that allows users to configure only which simple values their protocol rejects, without touching how accepted simple values are decoded? Given the discussion on this PR, I don't want to be too hasty to commit to a decision on how users provide their own analogs, but I think a simple value allowlist option and new error type would be uncontroversial. |
@benluddy Yes, that sounds good to me! 👍 |
58cd838
to
c263dcd
Compare
c263dcd
to
8639f7c
Compare
Thanks for the great discussion on this feature @fxamacker! I updated the PR so that users can reject arbitrary simple values but cannot change how any simple value unmarshals to a Go value. The option's API surface has roughly the same shape (only smaller) and should allow for the registration of custom simple value unmarshalers in the future. Please take another look when you get the chance. |
Applications can use the decode option SimpleValues to override the default unmarshal behavior on a per-simple-value basis. The option is open to future extension, and currently supports the choice between the default backwards-compatible behavior (true as true, false as false, null as nil, and undefined as nil, and cbor.SimpleValue(N) for each currently-unregistered but well-formed simple value number N) and outright rejection with the new error type UnacceptableDataItemError. Signed-off-by: Ben Luddy <[email protected]>
8639f7c
to
28078a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benluddy Nice! 👍 Thanks for updating this PR and great discussions!
Description
Proof of concept implementation of #477.
PR Was Proposed and Welcomed in Currently Open Issue
Checklist (for code PR only, ignore for docs PR)
Last line of each commit message should be in this format:
Signed-off-by: Firstname Lastname [email protected]
(see next section).
Certify the Developer's Certificate of Origin 1.1
the Developer Certificate of Origin 1.1.