-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add json_records, to_json_string, and json_concat UDFs #8632
Conversation
b5243c2
to
82d9fc4
Compare
1e50b6a
to
1a1a182
Compare
Since: 0.25.0 | ||
|
||
```sql | ||
json_records(json_string) -> Array<Struct<json_key:String, json_value:String>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may have already be covered earlier, but could this return a Map<Key,Key> instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussion about return types is in the KLIP - #8550 (comment). If you prefer, we can discuss it there with a wider audience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice; after I started reviewing, I realized that there was a KLIP for this. (And that I'm late to the party; thanks for the link!)
I'm not sure where the discussion should be between the two places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are definitely not too late! We can continue the discussion in the KLIP or here. In my view, a Struct
is a strictly better type than a Map<String, String>
in this specific case because the former provides more comprehensive information and leaves less room for errors (and therefore error checking.)
For example, consider a user program that receives Map<String, String>
from a function. A careful programmer has to check the size of the map and that it contains promised key-value pairs. On the other hand, the struct type is self-documenting - there will be only 2 keys of the given type. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I think that Map<String,String>
and Array<Struct<String,String>>
are equivalent types in some sense. If I have a map, I ought to be able to iterate over it like an Array, and if given an Array, one will invariably want to do something that looks like a map.get()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that your argument is about using Map<String,String>
as the return type rather than a replacement for the Struct
in the array, i.e. Array<Map<String, String>>
. Sorry about that!
I agree that Map<String, String>
has as much type information as Array<Struct<String,String>>
, so it could be used instead. I really don't have any strong arguments one way or another. As a kinda weak argument, its neighbour function, json_keys
, returns an array, so it is probably more consistent to return an array here as well assuming everything else is equal.
I don't have a strong opinion here, so if you think that Map<String, String
is a better return type, let's discuss it with a wider audience in this thread in the KLIP.
json_records(json_string) -> Array<Struct<json_key:String, json_value:String>> | ||
``` | ||
|
||
Given a string, parses it as a JSON object and returns a ksqlDB array of structs containing 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so this may be likely out-of-scope for this ticket, but I'm wondering if there are future functions which would produce JSON which should be inverse of this.
E.g., I think there's interest in struct->json functions once other issues are resolved. Have we thought through the consistency issues there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAST
is arguably the inverse of this. It does work for the primitive types. Containers are not yet supported, unfortunately:
ksql/ksqldb-execution/src/main/java/io/confluent/ksql/execution/codegen/helpers/CastEvaluator.java
Lines 88 to 96 in f7c1334
// STRING: | |
.put(key(STRING, BOOLEAN), nonNullSafeCode("SqlBooleans.parseBoolean(%s.trim())")) | |
.put(key(STRING, INTEGER), nonNullSafeCode("Integer.parseInt(%s.trim())")) | |
.put(key(STRING, BIGINT), nonNullSafeCode("Long.parseLong(%s.trim())")) | |
.put(key(STRING, DECIMAL), CastEvaluator::castToDecimal) | |
.put(key(STRING, DOUBLE), nonNullSafeCode("SqlDoubles.parseDouble(%s.trim())")) | |
.put(key(STRING, TIMESTAMP), nonNullSafeCode("SqlTimeTypes.parseTimestamp(%s.trim())")) | |
.put(key(STRING, TIME), nonNullSafeCode("SqlTimeTypes.parseTime(%s.trim())")) | |
.put(key(STRING, DATE), nonNullSafeCode("SqlTimeTypes.parseDate(%s.trim())")) |
Here is a test I tried to see how this will work out:
ksql> CREATE STREAM test (k STRING KEY, v INT) WITH (kafka_topic='test_topic', value_format='JSON');
ksql> CREATE STREAM test (k STRING KEY, v INT) WITH (kafka_topic='test_topic', value_format='JSON');
Message
----------------
Stream created
----------------
ksql> CREATE STREAM to_json AS SELECT k, to_json_STRING(v) AS JSON_v FROM test;
Message
---------------------------------------
Created query with ID CSAS_TO_JSON_17
---------------------------------------
ksql> insert into test (k, v) VALUES ('a', 1);
ksql> select * from to_json;
+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
|K |JSON_V |
+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
|a |[1,2,3] |
|a |1 |
NOTE: the [1,2,3]
is a leftover from a previous test that I didn't cut out to show how does an unsupported cast look.
ksql> select json_v, cast(json_v as int) as back_to_int from to_json;
+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
|JSON_V |BACK_TO_INT |
+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
|[1,2,3] |null |
|1 |1 |
ksqldb-engine/src/main/java/io/confluent/ksql/function/udf/json/IsJsonString.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/function/udf/json/JsonRecords.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/function/udf/json/ToJsonString.java
Show resolved
Hide resolved
ksqldb-engine/src/test/java/io/confluent/ksql/function/udf/json/ToJsonStringTest.java
Show resolved
Hide resolved
Ok, it looks like Is it worth verifying that? Or seeing if there are failures doing that? |
094382b
to
eeb1066
Compare
@jnh5y @JimGalasyn this PR is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
|
||
Given N strings, parse them as JSON values and return a string representing their concatenation. | ||
Concatenation rules are identical to PostgreSQL's [|| operator](https://www.postgresql.org/docs/14/functions-json.html): | ||
* If all strings deserialize into JSON objects, then return an object with a union of the input key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If all strings deserialize into JSON objects, then return an object with a union of the input key, | |
* If all strings deserialize into JSON objects, return an object with a union of the input key. |
Given N strings, parse them as JSON values and return a string representing their concatenation. | ||
Concatenation rules are identical to PostgreSQL's [|| operator](https://www.postgresql.org/docs/14/functions-json.html): | ||
* If all strings deserialize into JSON objects, then return an object with a union of the input key, | ||
taking values from the last object in the case of duplicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taking values from the last object in the case of duplicates. | |
If there are duplicate objects, take values from the last object. |
Concatenation rules are identical to PostgreSQL's [|| operator](https://www.postgresql.org/docs/14/functions-json.html): | ||
* If all strings deserialize into JSON objects, then return an object with a union of the input key, | ||
taking values from the last object in the case of duplicates. | ||
* If all strings deserialize into JSON arrays, then return the result of array concatenation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If all strings deserialize into JSON arrays, then return the result of array concatenation. | |
* If all strings deserialize into JSON arrays, return the result of array concatenation. |
* If all strings deserialize into JSON objects, then return an object with a union of the input key, | ||
taking values from the last object in the case of duplicates. | ||
* If all strings deserialize into JSON arrays, then return the result of array concatenation. | ||
* If at least one of the deserialized values is not an object, then convert non-array inputs to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If at least one of the deserialized values is not an object, then convert non-array inputs to a | |
* If at least one of the deserialized values is not an object, convert non-array inputs to a |
* If all strings deserialize into JSON arrays, then return the result of array concatenation. | ||
* If at least one of the deserialized values is not an object, then convert non-array inputs to a | ||
single-element array and return the result of array concatenation. | ||
* If at least one of the input strings is `NULL` or can't be deserialized as JSON, then return `NULL`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If at least one of the input strings is `NULL` or can't be deserialized as JSON, then return `NULL`. | |
* If at least one of the input strings is `NULL` or can't be deserialized as JSON, return `NULL`. |
single-element array and return the result of array concatenation. | ||
* If at least one of the input strings is `NULL` or can't be deserialized as JSON, then return `NULL`. | ||
|
||
Akin to PostgreSQL's `||` operator, this function merges only top-level object keys or arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Akin to PostgreSQL's `||` operator, this function merges only top-level object keys or arrays. | |
Similar to the PostgreSQL `||` operator, this function merges only top-level object keys or arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a few suggestions.
f67cf49
to
138c4ef
Compare
138c4ef
to
156ae20
Compare
Test failure is unrelated and should be fixed by #8664. |
Description
This PR introduces 2 new UDFs -
json_records
andto_json_string:
:json_records
returns an array of structs with JSON keys and values. See the corresponding section in KLIP-59 for more details.to_json_string
serializes given argument into a JSON string. See the corresponding section in KLIP-59 for more details.json_concat
concatenates given JSON strings. See the corresponding section in KLIP-59 for more details.This PR is rebased on top of #8600 to get
UdfJsonMapper#parseJson
- this is the reason why the change log containsis_json_string
and friends.Testing done
Added unit and functional tests.
Reviewer checklist