From 18d8be72c65955b1d72153d2d400d77bef69a6e1 Mon Sep 17 00:00:00 2001
From: Grzegorz Piwowarek
Date: Thu, 11 Jul 2024 12:15:40 +0200
Subject: [PATCH] Add ADR explaining why '::' was chosen as nested separator

---
 ...ble_colon_as_nested_structure_separator.md | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 adr/4_double_colon_as_nested_structure_separator.md

diff --git a/adr/4_double_colon_as_nested_structure_separator.md b/adr/4_double_colon_as_nested_structure_separator.md
new file mode 100644
index 000000000..0738fe191
--- /dev/null
+++ b/adr/4_double_colon_as_nested_structure_separator.md
@@ -0,0 +1,63 @@
+# Use Double Colon (`::`) as a Separator for Nested Fields Representation in ClickHouse
+
+## Context and Problem Statement
+
+When ingesting JSON documents, representing nested objects in ClickHouse requires special handling: nested objects are flattened into columns.
+A separator is needed to join parent and child field names into a column name. The most natural choice would be a dot (`.`), but ClickHouse treats it as a special character used for `Nested` columns, which can lead to parsing errors or unexpected behavior even when the name is properly escaped.
+
+For example, consider the following `CREATE TABLE`:
+```sql
+CREATE TABLE test (
+    `event.type` Array(String),
+    `event.name` Array(String)
+)
+```
+
+When ingesting JSON documents with the following structure:
+```json
+{
+  "event": {
+    "type": ["A", "B"],
+    "name": ["X"]
+  }
+}
+```
+
+ClickHouse will treat `event` as a `Nested` structure and expect its arrays to be of the same size, which the document above, with two `type` values and one `name` value, does not satisfy. Useful as this may sound, it makes handling empty or optional values cumbersome.
+
+A comment on a ClickHouse GitHub issue makes the same point:
+> Dot in column names has a special/reserved meaning. CH expects that columns with dot is Arrays.
+> Do not use dot.
+
+source: [link](https://github.com/ClickHouse/ClickHouse/issues/18765#issuecomment-754661913)
+
+
+### Considered Options
+
+1. **Using Dot (`.`) as a Separator**
+   - Pros:
+     - A natural and obvious solution.
+     - Familiar to users.
+   - Cons:
+     - Doesn't work: ClickHouse gives `.` a special meaning for `Nested` columns, even when the name is escaped.
+2. **Using Double Colon (`::`) as a Separator**
+   - Pros:
+     - Avoids conflict with ClickHouse's treatment of `.`.
+     - Clear and unambiguous representation of nested fields.
+     - Simplifies query generation and ensures correct parsing by ClickHouse.
+     - Rarely used in field names, reducing the risk of conflicts.
+   - Cons:
+     - Less common and may require initial adaptation by users.
+     - Requires extra effort to expose these fields to users with dots.
+
+### Decision Outcome and Drivers
+
+**Chosen Option:** Using Double Colon (`::`) as a Separator because:
+- **Avoids conflicts:** By using `::`, we avoid the special treatment of `.` in ClickHouse, ensuring that nested fields are parsed and interpreted correctly without additional processing.
+- **No better alternatives exist:** Any separator other than `.` has more or less the same drawbacks as `::`, and there is nothing we can do about that for now.
+
+We rejected `.` because it conflicts with ClickHouse's `Nested` column handling, as described above.
+
+## People
+
+- @pivovarit
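
The flattening that the ADR's Context section describes can be sketched as follows. This is a minimal illustration in Python, not the project's actual implementation: the names `SEPARATOR`, `flatten`, and `to_user_facing` are hypothetical, and the sketch assumes that only JSON objects are flattened while arrays and scalars are stored as-is.

```python
from typing import Any

SEPARATOR = "::"  # the separator chosen by this ADR


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into one level of column names (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        # Join the parent path and the child key with the separator.
        column = f"{prefix}{SEPARATOR}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects; empty objects yield no columns.
            flat.update(flatten(value, column))
        else:
            # Arrays and scalars are kept unchanged under the flattened name.
            flat[column] = value
    return flat


def to_user_facing(column: str) -> str:
    """Present a stored column name with dots, as users expect (hypothetical helper)."""
    return column.replace(SEPARATOR, ".")


doc = {"event": {"type": ["A", "B"], "name": ["X"]}}
print(flatten(doc))
# {'event::type': ['A', 'B'], 'event::name': ['X']}
```

With `::` in the column names, `event::type` and `event::name` are plain independent columns, so their arrays may differ in length. The reverse mapping in `to_user_facing` corresponds to the "extra effort" listed among the cons.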