JSON Case Sensitivity needs to be addressed #3666
Comments
That's not going to float ;)
Surely, this falls under the remit of the quoted identifier work? As part of that it's probably worth checking that if a user supplies a quoted column name that we pick the right one.
This is not really an option. We don't know what the data looks like when the user is providing us with the schema. What's more, the data can evolve and change over time. So maybe the first record we see only has Also, while I'm a fan of 'fail early and fail hard' for many things, I don't think throwing an error here is the right option. If an upstream system suddenly starts publishing a second case-variant of one of the fields you're using, you'll be mighty pissed if KSQL suddenly starts rejecting all the records. I don't think there is a perfect answer here. But I'd probably think about going with the approach that:
Does our recent work on quoted identifiers get us anywhere close to these two points?
Also, doesn't the same issue exist in Avro, or does Avro have stricter naming rules? Regardless, we should consider fixing this in a more generic way, as other formats may encounter the same issues.
I've got a test for this and will publish a PR that shows it!
The first point (if the user supplies a quoted column name, then only that case should match) is true except if you supply a quoted column name that is all uppercase. I'm not sure there is a good way to solve that problem efficiently.
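To make the all-uppercase caveat concrete, here is a rough Python model of the matching rule as described above. It is only a sketch, not KSQL's code: `resolve_field` is a hypothetical name, and it assumes unquoted identifiers reach the deserializer already upper-cased, so an all-uppercase column name could have been either quoted or unquoted.

```python
def resolve_field(record, column):
    """Hypothetical model of field matching; not KSQL's actual implementation."""
    if column != column.upper():
        # A name containing lowercase characters must have been quoted,
        # so only the exact case matches.
        return record.get(column)
    # All-uppercase: we cannot tell a quoted "FOO" from an unquoted foo,
    # so fall back to case-insensitive matching (the last match wins).
    matches = [value for key, value in record.items() if key.upper() == column]
    return matches[-1] if matches else None


record = {"FOO": 1, "foo": 2}
exact_lower = resolve_field(record, "foo")    # quoted "foo": exact match only
no_match = resolve_field(record, "Foo")       # quoted "Foo": no such key
ambiguous = resolve_field(record, "FOO")      # quoted "FOO" or unquoted foo
```

In this model a quoted "FOO" ends up with the value of `foo`, which is the case that is hard to solve efficiently.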
You might also be pissed if KSQL started picking that one up and didn't tell you 😉
I think you are right there. For some reason I assumed Avro wasn't case sensitive, but it looks like it is.
Currently, this is the behavior when deserializing JSON in KsqlJsonDeserializer:
With the schema foo VARCHAR and the value {"foo": "bar"}, it maps field foo to "bar".
With conflicting fields (e.g. the schema foo VARCHAR and the value {"foo": "bar", "fOO": "baz"}), the behavior is undefined: it will choose the latter of the two.
I think this behavior is probably OK. It's unfortunate that JSON allows case sensitive field names because we have two options:
I prefer the third because it makes the simple things simple and the complicated things possible, but it's backwards incompatible technically - I think it might be OK because we'd be changing a super buggy behavior. I'll create a separate PR for the test and change.
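For reference, the "latter of the two" behavior described above can be modeled in a few lines of Python. This is an illustrative sketch only, not the KsqlJsonDeserializer code: `lookup_case_insensitive` is a hypothetical helper that assumes field names are compared ignoring case and that a later match overwrites an earlier one.

```python
import json


def lookup_case_insensitive(record, column):
    """Return the value of the last key that equals `column` ignoring case."""
    match = None
    for key, value in record.items():
        if key.lower() == column.lower():
            match = value  # a later case-variant overwrites an earlier one
    return match


# json.loads keeps the key order of the document (dicts are insertion-ordered).
simple = json.loads('{"foo": "bar"}')
conflicting = json.loads('{"foo": "bar", "fOO": "baz"}')

simple_result = lookup_case_insensitive(simple, "FOO")
conflicting_result = lookup_case_insensitive(conflicting, "FOO")
```

With the conflicting record, which value is returned depends purely on the order of the keys in the JSON document, which is why the result is effectively undefined from the user's point of view.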
Originally posted by @agavra in #3588