The prior behavior of the connector was to send a single record's schema to BigQuery and let validation happen there; the only permitted operations were (and still are) adding new columns to a table, and relaxing existing columns from REQUIRED to NULLABLE. This meant that it was possible to relax required fields to nullable, but only if there was a corresponding upstream schema change.
The new behavior of the connector still catches this case, but also automatically relaxes REQUIRED fields in the existing table schema to NULLABLE if they're missing from the most recent upstream schema.
This is risky, since it means that a single misplaced record with a completely disjoint schema from the existing table schema can cause permanent modifications to be made to the BigQuery table schema. Granted, this would require allowNewBQFields and allowBQRequiredFieldRelaxation to both be set to true, but it's not unreasonable for people to want to enable both with the expectation that they would cause the connector to act in the same way as it would have with autoUpdateSchemas.
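To make the risk concrete, here is a minimal sketch of what unionizing a table schema with a stray record's schema could look like. It is not the connector's actual code: schemas are modeled as plain `{field_name: mode}` dicts, `unionize` is a hypothetical helper, and field types, nesting, and REPEATED mode are ignored.

```python
REQUIRED, NULLABLE = "REQUIRED", "NULLABLE"


def unionize(existing, incoming):
    """Union two schemas given as {field_name: mode} dicts (hypothetical model)."""
    merged = {}
    for name, mode in existing.items():
        # A field stays REQUIRED only if both schemas require it; a field
        # missing from the incoming schema is relaxed to NULLABLE.
        if mode == REQUIRED and incoming.get(name) == REQUIRED:
            merged[name] = REQUIRED
        else:
            merged[name] = NULLABLE
    for name in incoming:
        # Columns added to an existing BigQuery table cannot be REQUIRED,
        # so new fields are added as NULLABLE in this sketch.
        merged.setdefault(name, NULLABLE)
    return merged


table = {"id": REQUIRED, "email": REQUIRED}
stray = {"sensor": REQUIRED, "reading": NULLABLE}  # disjoint from the table

print(unionize(table, stray))
```

In this model, one record with a disjoint schema relaxes every REQUIRED column in the table and adds two unrelated columns, and neither change can be undone through a later schema update.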
I think we might still want to add a third config property, allowSchemaUnionization, that toggles the schema unionization behavior. If it's set to false and both allowNewBQFields and allowBQRequiredFieldRelaxation are set to true, then the prior behavior of the autoUpdateSchemas property should effectively be preserved for users who still want that.
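A rough sketch of how the three properties might interact is below. It is an illustration under assumptions, not the connector's implementation: `next_schema` and its keyword arguments are hypothetical stand-ins for allowNewBQFields, allowBQRequiredFieldRelaxation, and allowSchemaUnionization; schemas are `{field_name: mode}` dicts; and the validation that BigQuery itself would perform is approximated locally.

```python
REQUIRED, NULLABLE = "REQUIRED", "NULLABLE"


def next_schema(existing, incoming, *, allow_new_fields,
                allow_relaxation, allow_unionization):
    """Return the schema to send to BigQuery, or raise ValueError."""
    if allow_unionization:
        # Union: keep every existing column, relaxing those the incoming
        # schema lacks or marks NULLABLE, then add any new columns.
        candidate = dict(existing)
        for name in existing:
            if incoming.get(name) != REQUIRED:
                candidate[name] = NULLABLE
        for name in incoming:
            candidate.setdefault(name, NULLABLE)
    else:
        # Approximation of the prior behavior: propose the record's schema as-is.
        candidate = dict(incoming)

    for name, mode in existing.items():
        if name not in candidate:
            raise ValueError(f"column {name!r} would be dropped")
        if mode == NULLABLE and candidate[name] == REQUIRED:
            raise ValueError(f"column {name!r} cannot become REQUIRED")
        if mode == REQUIRED and candidate[name] == NULLABLE and not allow_relaxation:
            raise ValueError(f"relaxing column {name!r} is not allowed")
    if not allow_new_fields and any(name not in existing for name in candidate):
        raise ValueError("adding new columns is not allowed")
    return candidate


table = {"id": REQUIRED, "email": REQUIRED}
upstream_change = {"id": REQUIRED, "email": NULLABLE, "phone": NULLABLE}
flags = dict(allow_new_fields=True, allow_relaxation=True)

# A genuine upstream change is accepted with unionization switched off.
print(next_schema(table, upstream_change, allow_unionization=False, **flags))
```

With unionization off, a stray record such as `{"sensor": NULLABLE}` is rejected in this sketch because it would drop existing columns; with unionization on, the same record would relax `id` and `email` and add `sensor`.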
* GH-27: Add allowSchemaUnionization config property
Still needed: unit and possibly integration tests for the logic in the
SchemaManager class
* GH-27: Tweak schema change validation logic
* GH-27: Fix schema update bugs, add unit tests
(Copied from wepay#291 (comment))