Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: preserve the rest of a struct when one field has a processing error #7373

Merged
merged 5 commits into from
Apr 21, 2021

Conversation

jzaralim
Copy link
Contributor

Description

Fixes #5980. This also applies to items in arrays and keys/values in maps.

Testing done

Added QTT tests

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@jzaralim jzaralim requested a review from a team as a code owner April 12, 2021 23:43
@stevenpyzhang
Copy link
Member

I think this should be considered a breaking change. Users would expect to lose the entire struct when a field has an error to null, but now it's being kept.

If you ran some CSAS/CTAS on a server without this change, process some records that cause a field processing error, stopped server, "upgraded" server to one with this change, and then processed the same records that ran into field errors, would the query still output null, or would we get this new behavior?

@jzaralim could you check what the behavior is here?

@jzaralim
Copy link
Contributor Author

@stevenpyzhang Tested it out, the old records remained as null, and new records follow the new behavior. It doesn't change the old records, but it's still a change in behaviour so I'm not sure if this is a breaking change or not.

|mmmmmmm              |null                                                                     |
|new record           |{NUM=44, AA=null}                                                        |

@stevenpyzhang
Copy link
Member

stevenpyzhang commented Apr 16, 2021

Sorry, breaking change isn't the right term here (the query didn't break, it ran but with different behavior), I meant this change is backwards incompatible since the behavior changed. Query behavior change is something we should try to avoid as users may have some logic expecting this "bugged behavior". Ideally we keep the old behavior for now and trigger the new behavior with a config, see #5769 for an example of this. We'd eventually completely get rid of the old behavior and config when we reach some major release.

Copy link
Member

@stevenpyzhang stevenpyzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, can you verify one more time in the example above that queries issued on servers before this fix keep the old behavior of dropping the entire record even after upgrading to a server with the fix?

@jzaralim
Copy link
Contributor Author

Sure, here's an example:

-- on ksqlDB 6.2
SET 'auto.offset.reset'='earliest';
CREATE STREAM S (BDAY STRING) WITH (KAFKA_TOPIC='S', KEY_FORMAT='KAFKA', PARTITIONS=2, REPLICAS=1, VALUE_FORMAT='JSON');
CREATE STREAM OLD AS SELECT  BDAY, STRUCT(NUM:=44, B:=PARSE_TIMESTAMP(BDAY, 'yyyy-MM-dd')) FROM S;
INSERT INTO S VALUES ('abc');

-- current branch
SET 'auto.offset.reset'='earliest';
CREATE STREAM NEW AS SELECT  BDAY, STRUCT(NUM:=44, B:=PARSE_TIMESTAMP(BDAY, 'yyyy-MM-dd')) FROM S S;
INSERT INTO S VALUES ('def');

SELECT * FROM OLD EMIT CHANGES;
+-------------------------------------------------------+----------------------------------------------+
|BDAY                                                   |KSQL_COL_0                                    |
+-------------------------------------------------------+----------------------------------------------+
|abc                                                    |null                                          |
|def                                                    |null                                          |

SELECT * FROM NEW EMIT CHANGES;
+-------------------------------------------------------+----------------------------------------------+
|BDAY                                                   |KSQL_COL_0                                    |
+-------------------------------------------------------+----------------------------------------------+
|abc                                                    |{NUM=44, B=null}                              |
|def                                                    |{NUM=44, B=null}                              |

@jzaralim jzaralim merged commit 6d708db into confluentinc:master Apr 21, 2021
@jzaralim jzaralim deleted the 5980 branch April 21, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When processing error occurs on nested field in a STRUCT, only set that field to null instead of entire STRUCT
2 participants