🎉Updated normalization to handle new datatypes #19721
Conversation
Affected Connector Report

NOTE: See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (16)

| Connector | Version | Changelog | Publish |
|---|---|---|---|
| destination-bigquery | 1.2.9 | ✅ | ✅ |
| destination-bigquery-denormalized | 1.2.10 | ✅ | ✅ |
| destination-clickhouse | 0.2.1 | ✅ | ✅ |
| destination-clickhouse-strict-encrypt | 0.2.1 | 🔵 (ignored) | 🔵 (ignored) |
| destination-jdbc | 0.3.14 | 🔵 (ignored) | 🔵 (ignored) |
| destination-mssql | 0.1.22 | ✅ | ✅ |
| destination-mssql-strict-encrypt | 0.1.22 | 🔵 (ignored) | 🔵 (ignored) |
| destination-mysql | 0.1.20 | ✅ | ✅ |
| destination-mysql-strict-encrypt | ❌ 0.1.21 (mismatch: 0.1.20) | 🔵 (ignored) | 🔵 (ignored) |
| destination-oracle | 0.1.19 | ✅ | ✅ |
| destination-oracle-strict-encrypt | 0.1.19 | 🔵 (ignored) | 🔵 (ignored) |
| destination-postgres | 0.3.26 | ✅ | ✅ |
| destination-postgres-strict-encrypt | 0.3.26 | 🔵 (ignored) | 🔵 (ignored) |
| destination-redshift | 0.3.53 | ✅ | ✅ |
| destination-snowflake | 0.4.41 | ✅ | ✅ |
| destination-tidb | 0.1.0 | ✅ | ✅ |
👀 Other Modules (1)
- base-normalization
Actionable Items

(click to expand)

| Category | Status | Actionable Item |
|---|---|---|
| Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
| Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
| Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
| Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
@etsybaev Do we have any tests for the newly added binary data?
```diff
@@ -8,22 +8,22 @@
   "$schema": "http://json-schema.org/draft-07/schema#",
   "properties": {
     "id": {
-      "type": ["null", "integer"]
+      "$ref": "WellKnownTypes.json#/definitions/Integer"
```
Don't we need to provide `"null"` anymore?
Why did we switch from the "native" JSON integer to this new Airbyte well-known "Integer" in all normalization tests?
Are we abandoning native JSON integers completely?
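For context, a minimal sketch (not this PR's actual helper; names and the `is_integer` function are illustrative assumptions) contrasting the two schema styles being discussed, and how code could recognize both:

```python
# Old-style JSON Schema: nullability is expressed inside the "type" array.
old_schema = {"type": ["null", "integer"]}

# New-style: the field references a well-known type definition instead.
# (Whether "null" still needs to be declared is the open question in this thread.)
new_schema = {"$ref": "WellKnownTypes.json#/definitions/Integer"}

def is_integer(definition: dict) -> bool:
    """Hypothetical helper that accepts both schema styles, for illustration only."""
    if "$ref" in definition:
        return definition["$ref"].endswith("/Integer")
    declared = definition.get("type", [])
    types = declared if isinstance(declared, list) else [declared]
    return "integer" in types

print(is_integer(old_schema), is_integer(new_schema))  # True True
```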
- need to make sure normalization tests are passing before publishing and merging
- I think we should keep tests for old format until it is completely deprecated
```python
return property_type == "string" or "string" in property_type
if data_type.ONE_OF_VAR_NAME in property_type and data_type.ONE_OF_VAR_NAME in property_type:
    plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
    return plain_datatypes_list == data_type.STRING_TYPE or data_type.STRING_TYPE in plain_datatypes_list
```
The integration test seems to be complaining about this line
```
BUILD FAILED in 26s
[python] .venv/bin/python -m mypy normalization --config-file /actions-runner/_work/airbyte/airbyte/pyproject.toml
normalization/transform_catalog/utils.py: note: In function "is_string":
normalization/transform_catalog/utils.py:32:16: error: Non-overlapping equality
check (left operand type: "List[Any]", right operand type: "str")
[comparison-overlap]
    return plain_datatypes_list == data_type.STRING_TYPE or data_t...
           ^
```
Hi @grishick
I believe the integration tests on CI fail because normalization generates thousands of files, and I haven't committed the autogenerated files yet to make the review process easier. Obviously, I will have to make them pass before merging :)
Just to make sure: do you want me to make it work for both the old and new protocols for a while? I can do that, but it would take a lot of additional time. The original ticket says: "This updated version of normalization will never see old-style catalogs, because it will be released after the platform protocol versioning migrations, and normalization is released as part of platform (i.e. it's impossible for someone to upgrade normalization without also upgrading platform)."
So I didn't spend time on keeping support for the old style as well.
I think the test is complaining that the `==` will never return true, because the left side (`plain_datatypes_list`) is a `List` but the right side (`data_type.STRING_TYPE`) is a `String`.
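To illustrate the mypy complaint in isolation (a sketch, not the PR's actual code; the constant value is an assumption): a `List[str]` can never compare equal to a `str`, so the left half of the `or` is dead code, and the membership test is the part doing the work.

```python
STRING_TYPE = "WellKnownTypes.json#/definitions/String"
plain_datatypes_list = [STRING_TYPE]

# A list is never == a string, so this comparison is always False,
# which is exactly what mypy's comparison-overlap check flags.
always_false = plain_datatypes_list == STRING_TYPE

# The membership check alone expresses the intended logic.
is_string = STRING_TYPE in plain_datatypes_list

print(always_false, is_string)  # False True
```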
@grishick is the reasoning for supporting both versions just as a fallback in case something weird happens? I don't hate that (e.g. for manual testing without needing to recompile repeatedly).
@etsybaev what's the obstacle to supporting both? I think the minimal lift is to change the `is_xyz_type()` methods to look more like:

```python
def is_datetime_without_timezone(definition):
    if "oneOf" in definition:
        ...
    elif "$ref" in definition:
        ...
    else:
        ...  # handle old-style schema
```

though maybe we should just refactor this into more of a `get_type(definition) -> string` style, and have stream_processor.py do something like:

```python
type = get_type(definition)
if type == "object":
    ...
elif type == "string":
    ...
```

That feels more correct anyway (i.e. figure out which airbyte type we're looking at, and then handle it appropriately). And then `get_type` can have all the complicated logic around `$ref`/`oneOf`/`type`.
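A runnable sketch of what such a `get_type`-style resolver could look like, purely for illustration (the ref-string parsing, the widening rule, and the fallback behavior are all assumptions, not the PR's implementation):

```python
def get_type(definition: dict) -> str:
    """Resolve a property definition to a single coarse type name (sketch)."""
    if "oneOf" in definition:
        options = [get_type(option) for option in definition["oneOf"]]
        # assumed widening rule: mixed unions collapse to string
        return "string" if "string" in options or len(set(options)) > 1 else options[0]
    if "$ref" in definition:
        # e.g. "WellKnownTypes.json#/definitions/Integer" -> "integer"
        return definition["$ref"].rsplit("/", 1)[-1].lower()
    declared = definition.get("type", "object")
    types = [t for t in (declared if isinstance(declared, list) else [declared]) if t != "null"]
    return types[0] if types else "object"

print(get_type({"$ref": "WellKnownTypes.json#/definitions/Integer"}))  # integer
print(get_type({"type": ["null", "string"]}))  # string
```

Callers would then branch once on the returned name instead of re-probing `$ref`/`oneOf`/`type` in every `is_xyz_type()` helper.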
Yes, my reasoning for supporting both versions was just as a fallback in case something weird happens. However, I think that's not necessary, so I am OK with not supporting the older version. If somehow an older platform ends up loading new normalization image it'll fail, but that's not a reasonable scenario given how we manage normalization version right now
it'll become more of a possibility, since normalization version is moving into destination_definitions.yaml. I don't think there's any safeguard against downgrading normalization to a pre-protocol-v1 version other than "PR reviewer is careful"
though maybe there should be such a safeguard
The problem here is that there are also lots of other places where the old "type" was used, not only in the cast_property_type method. To find all the cases I had to do dozens of runs in debug mode for different connectors and check what actually comes in here. Some of the methods also appear to be called in very unpredictable ways. So adding backward compatibility will take me some time (I can't estimate it yet), since I won't know whether the received "type" is an object or should still be treated as a simple one. I will also need to dive deeper into the tests and SQL, since the new goal will be not only to make them pass but also to extend them to support versioning.
By the way, locally all tests pass; maybe something isn't passing on CI because I haven't committed the autogenerated files, to make the review process easier.
Note from grooming session: we don't need to keep support for the old protocol version in normalization. The only breaking scenario is old platform + new normalization, which is not possible unless someone deliberately updates hard-coded normalization version w/o updating the platform.
took a first pass at this. Definitely a complicated PR! I think the general approach makes sense; I just had some questions about specific implementation bits.
Also had a comment for a potential refactor - totally reasonable to not want to do that since this is already a big change, but I think it would simplify the logic overall.
```python
one_of_property = one_of_definition[data_type.ONE_OF_VAR_NAME]
plain_datatypes_list = list()
for prop in one_of_property:
    plain_datatypes_list.append(prop[data_type.REF_TYPE_VAR_NAME])
```
what does this do if the oneOf has a non-ref entry? e.g.

```json
{
  "oneOf": [
    {"$ref": "....String"},
    {"type": "object", "properties": {...}}
  ]
}
```
@edgao In this case it will take the String type via the already-existing logic. If string is in the list, the other types are ignored. As far as I can see from the normalization logic, it always takes the wider type for the SQL column, i.e. string.
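The widening behavior described here can be pictured as picking the broadest SQL-compatible type from a `oneOf`'s options; this is a sketch under assumed ordering, not the PR's actual code:

```python
# Assumed wideness ordering; string is widest since any value renders as text.
WIDENESS = ["boolean", "integer", "number", "string"]

def widest_type(candidates: list) -> str:
    """Pick the widest type present, widening to string for unknown entries."""
    known = [t for t in candidates if t in WIDENESS]
    if len(known) < len(candidates):
        return "string"  # unknown/mixed entries (e.g. objects) widen all the way
    return max(known, key=WIDENESS.index)

print(widest_type(["integer", "string"]))  # string
print(widest_type(["boolean", "integer"]))  # integer
```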
```python
def is_binary_datatype(property_type) -> bool:
    if data_type.ONE_OF_VAR_NAME in property_type and data_type.ONE_OF_VAR_NAME in property_type:
        plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
```
the `data_type.ONE_OF_VAR_NAME in property_type` condition got duplicated, and I think there's a way to avoid relying on the `get_plain_list_from_one_of_array` method entirely:

```python
if data_type.ONE_OF_VAR_NAME in property_type:
    return any(option["$ref"] == "...BinaryData" for option in property_type["oneOf"])
```

(copied the syntax from https://stackoverflow.com/a/22971758)
This makes the assumption that `{"oneOf": "...BinaryData"}` will never happen. I think that's a valid assumption, because `oneOf` should always be a list of schemas.
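A self-contained version of this suggestion (the constant names mirror the PR's, but their values here are assumptions; `.get` is used so non-ref options inside the `oneOf` are tolerated rather than raising `KeyError`):

```python
ONE_OF_VAR_NAME = "oneOf"
REF_TYPE_VAR_NAME = "$ref"
BINARY_DATA_TYPE = "WellKnownTypes.json#/definitions/BinaryData"

def is_binary_datatype(property_type: dict) -> bool:
    if ONE_OF_VAR_NAME in property_type:
        # oneOf is always a list of schemas, so scan each option's $ref;
        # options without a $ref (e.g. inline objects) simply don't match.
        return any(
            option.get(REF_TYPE_VAR_NAME) == BINARY_DATA_TYPE
            for option in property_type[ONE_OF_VAR_NAME]
        )
    return property_type.get(REF_TYPE_VAR_NAME) == BINARY_DATA_TYPE

print(is_binary_datatype({"$ref": BINARY_DATA_TYPE}))  # True
print(is_binary_datatype({"oneOf": [{"$ref": BINARY_DATA_TYPE}, {"type": "object"}]}))  # True
```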
Updated, thanks
```python
plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
return plain_datatypes_list == data_type.BINARY_DATA_TYPE or data_type.BINARY_DATA_TYPE in plain_datatypes_list
else:
    return property_type == data_type.BINARY_DATA_TYPE or data_type.BINARY_DATA_TYPE in property_type
```
not sure I understand this - isn't `property_type` a dict (i.e. `Map`)? When would it ever be equal to `data_type.BINARY_DATA_TYPE` (`String`) or contain `data_type.BINARY_DATA_TYPE` as a key?
i.e. should this look more like `return property_type["$ref"] == data_type.BINARY_DATA_TYPE`?
(all I want for Christmas is static typing 😢)
This was just a generic check for the object. Depending on where this is called from, it may be a single object, a dict, or nested inside a oneOf. But if we assume that a binary datatype cannot be part of a oneOf (as per your comment above), then yes, this can be simplified.
@edgao This is what I mean by saying that some of these methods may be called from different places, sometimes passing a simple object and sometimes a dict. This was also mind-blowing for me after strict types in Java :)
Here is an example where it's passed as just a string. That is why some checks may look redundant at first glance.
hmmmmmmmmmmmmmmmm
how hard would it be to make all of these methods always expect a dict? It feels like this is causing us a lot of confusion :/ and this seems like a good opportunity to clean up the code a little bit
or is that what you did in the most recent set of commits?
```python
return property_type == data_type.STRING_TYPE or data_type.STRING_TYPE in property_type


def is_binary_datatype(property_type) -> bool:
```
you can add a type hint for `property_type`, assuming I've understood correctly:

```diff
-def is_binary_datatype(property_type) -> bool:
+def is_binary_datatype(property_type: dict) -> bool:
```
```python
@@ -115,4 +168,9 @@ def is_simple_property(definition: dict) -> bool:

def is_combining_node(properties: dict) -> Set[str]:
    return set(properties).intersection({"anyOf", "oneOf", "allOf"})
    # this case appears when we have an analog of the old protocol like id: {type: [number, string]} and it's handled separately
```
technically there could be cases where this isn't true (more examples in https://github.com/airbytehq/airbyte/pull/19909/files?diff=unified&w=1#diff-6829fbf8db73883ede3e47d869e6f5756a882c2d94d74c13e0b2128b07666216R1038-R1050):

```json
{
  "oneOf": [
    {"$ref": ...},
    {"type": "object", "properties": {"foo": {...}}},
    {"type": "object", "properties": {"bar": {...}}}
  ]
}
```

but it doesn't really matter - this method doesn't actually control anything right now (it only gets called in stream_processor.py using `if is_combinig_node: pass`)
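For clarity, `is_combining_node` returns a (possibly empty) set rather than a bool, which is what makes a truthiness-only call site work; a minimal standalone sketch of the function as quoted above:

```python
from typing import Set

def is_combining_node(properties: dict) -> Set[str]:
    # Non-empty (truthy) iff the schema uses a combining keyword.
    return set(properties).intersection({"anyOf", "oneOf", "allOf"})

print(is_combining_node({"oneOf": [], "title": "x"}))  # {'oneOf'}
print(is_combining_node({"type": "object"}))  # set()
```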
```python
json_extract = jinja_call(f"json_extract('{table_alias}', {json_column_name}, {json_path}, {normalized_json_path})")
elif data_type.REF_TYPE_VAR_NAME in definition and (is_date(definition) or is_time(definition) or is_datetime(definition)):
```
why does this branch need to be added?
These were there before my changes. I just adapted them to handle the new ref case.
```python
if self.destination_type.value == DestinationType.POSTGRES.value:
    # sql_type = "bytea"
    sql_type = jinja_call("type_binary()")
    return f"cast(decode({column_name}, 'base64') as {sql_type}) as {column_name}"
```
maybe worth defining a `decode_base64` macro, to move the destination-specific logic out of this file. Then if we make the `type_binary()` macro return `dbt_utils.type_string()` for destinations without a binary type, we don't need so much destination-specific logic here. I.e. we could do

```python
decode_call = jinja_call(f"decode_base64({column_name})")
return f"cast({decode_call} as {sql_type}) as {column_name}"
```

for all destination types
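To picture the proposed split in runnable form: the Python side emits a single destination-agnostic expression, and all DB-specific SQL lives behind the `decode_base64`/`type_binary` macro names from the comment above. This is a sketch; the `jinja_call` placeholder and the exact emitted text are assumptions, not normalization's real output.

```python
def jinja_call(expression: str) -> str:
    # Stand-in for normalization's jinja_call helper: wraps a dbt macro call.
    return "{{ " + expression + " }}"

def cast_binary_column(column_name: str) -> str:
    """Destination-agnostic cast: DB specifics live inside the two macros."""
    sql_type = jinja_call("type_binary()")
    decode_call = jinja_call(f"decode_base64({column_name})")
    return f"cast({decode_call} as {sql_type}) as {column_name}"

print(cast_binary_column("payload"))
# cast({{ decode_base64(payload) }} as {{ type_binary() }}) as payload
```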
I also thought about macros, but each DB uses its own approach and nothing can be reused. Do we still want to move it and end up with more code? Also, macros sometimes stop treating variables as variables and instead treat them as literal values in complicated queries. Thanks
/test connector=bases/base-normalization

/test connector=bases/base-normalization
Build Failed. Test summary info:

/test connector=bases/base-normalization
Build Failed. Test summary info:

/test connector=bases/base-normalization
Build Failed. Test summary info:

/test connector=bases/base-normalization
Build Passed. Test summary info:
LGTM, although I am confused about the discussion around …
Added some more comments around code structure - feel free to say I'm increasing the scope too much :)
Can you commit the generated files for `airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/postgres`? That should be a representative sample of what's actually changing.
```python
return (
    plain_datatypes_list == data_type.TIMESTAMP_WITHOUT_TIMEZONE_TYPE
    or data_type.TIMESTAMP_WITHOUT_TIMEZONE_TYPE in plain_datatypes_list
if data_type.ONE_OF_VAR_NAME in definition:
```
should all the `is_whatever_type` methods have this oneOf handling?
maybe worth doing something like this, to reduce the amount of copypasta (I didn't test this at all, hopefully it's clear enough; also maybe it would be more readable using `lambda` syntax in `get_reftype_function`, idk):

```python
def is_type_included(definition: dict, is_type: Callable[[dict], bool]) -> bool:
    if data_type.ONE_OF_VAR_NAME in definition:
        return any(is_type(option) for option in definition[data_type.ONE_OF_VAR_NAME])
    else:
        return is_type(definition)


def get_reftype_function(type: str) -> Callable[[dict], bool]:
    def is_reftype(definition: dict) -> bool:
        return data_type.REF_TYPE_VAR_NAME in definition and type == definition[data_type.REF_TYPE_VAR_NAME]

    return is_reftype


def is_binary_datatype(definition: dict) -> bool:
    return is_type_included(definition, get_reftype_function(data_type.BINARY_DATA_TYPE))


def is_datetime_without_timezone(definition: dict) -> bool:
    return is_type_included(definition, get_reftype_function(data_type.TIMESTAMP_WITHOUT_TIMEZONE_TYPE))


# etc.
```
I've seen these methods also being called in other parts of the code, so even a simple change may cause unexpected side effects, and I've seen that many times during my work on this PR. And since we may have a oneOf containing more than just ref types, I'm really afraid to touch it :)
I think this code change is transparent though? I.e. it keeps the same method signature and just changes the implementation to be consistent across methods. So other than the methods that take both `definition` and `property_type`, it shouldn't matter?
mostly just want to double-check whether the `oneOf` handling is intentionally excluded from some methods.
@edgao Ohh...got your point about Callable. Updated. Also updated methods to get rid of unused property_types. Thanks
/test connector=bases/base-normalization
Build Passed. Test summary info:
kube acceptance tests are failing b/c flaky; helm acceptance tests are passing. merging this in and if the kube tests continue failing we can deal with it in the superbranch.
* Add Airbyte Protocol V1 support.
* Fix VersionedAirbyteStreamFactoryTest
* Remove AirbyteMessageMigrationV0 example
* Add Protocol Version constants
* 🎉 Updated normalization to handle new datatypes (#19721)
  * Updated normalization simple stream processing to handle new datatypes
  * Updated normalization nested stream processing to handle new datatypes (two commits)
  * Updated normalization drop_scd_catalog processing to handle new datatypes
  * Updated normalization ephemeral test processing to handle new datatypes
  * fixed more tests for normalization (four commits)
  * fixed more issues (five commits, incl. clickhouse)
  * added binary type processing for some DBs
  * cleared commented code and moved some hardcodes to processing as a macro
  * fixed codestyle and cleared commented code
  * minor refactor (three commits)
  * fixed bool cast error
  * fixed dict->str cast error
  * fixed is_combining_node cast py check
  * removed commented code (two commits)
  * committed autogenerated normalization_test_output files (three commits)
  * refactored utils.py
  * updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions
  * fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE
  * updated stream_processor to handle string type first as a wider type
  * fixed arrays normalization by updating the is_simple_property method per the new approach
  * format
  * Co-authored-by: Edward Gao <[email protected]>
* Update airbyte protocol migration (#20745)
  * Extract MigrationContainer from AirbyteMessageMigrator
  * Add ConfiguredAirbyteCatalogMigrations
  * Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations
  * Enable ConfiguredAirbyteCatalog migration
  * Fix tests; remove extra this.; add missing docs; fix typo
  * Co-authored-by: Edward Gao <[email protected]>
* Data types update: Implement protocol message migrations (#19240)
  * Extract MigrationContainer from AirbyteMessageMigrator; add ConfiguredAirbyteCatalogMigrations; add ConfiguredAirbyteCatalog to AirbyteMessageMigrations; enable ConfiguredAirbyteCatalog migration
  * set up scaffolding; more scaffolding and a basic unit test; minimal green code; add failing test for other primitive types; correct version number
  * handle basic primitive type decls; add implicit cases; add recursive schema; support not; fix indentation; handle all nested schema cases; handle boolean schemas; verify empty schema handling
  * cleanup; extract map; code organization; extract method; reformat; more tests; minor fix to type array handling; corrected test
  * switch to v1; add support for multityped fields; missed test case; nested test class
  * basic record upgrade; implement record upgrades; slight refactor; comments + clarifications; extract constants; (partly) correct model classes; add de/ser; fix json reference; update docs
  * switch to v1 models; fix compile+test; add base64 handling; use vnull
* Data types update: Implement protocol message downgrade path (#19909)
  * rough skeleton for passing catalog into migration; basic test; more scaffolding; basic implementation
  * add primitives test; add other tests (nested fields currently failing); add formats
  * implement oneOf handling; better tests; comments + organization; basic test case; downgrade objects (ish); basic array implementation; handle numeric failure; test for new type; handle array items; empty schema handling; first pass at oneOf handling; add more tests + handling; add empty oneOf test case
  * format + reorganize; fix name; also downgrade binary data; only import vnull; move migrations into v1 package; extract schema mutation code; extract schema migration to a new class; extract record downgrade logic for future use
  * fix build after rebase; rename private method for consistency; also implement configuredcatalog migrations; quick-and-dirty tests; slight cleanup; fix tests; pmd; null check on message objects; maybe fix acceptance tests; extract constants
  * fix cdc acceptance tests; revert to master source-postgres; remove log messages; revert other misc hacks; integers are valid cursors; remove unrelated change; fix build; [MUST REVERT] use dev normalization; capture kube logs; delete dup from merging; add final everywhere; revert test changes
  * Co-authored-by: Jimmy Ma <[email protected]>
* On-the-fly migrations of persisted catalogs (#21757)
  * On-the-fly catalog migration for normalization activity, job persistence, standard sync persistence, and airbyte catalogs
  * Refactor code to share JsonSchema traversal; add V0 data type search function; PMD and format
  * Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests; null-proofing CatalogMigrationV1Helper; more null checks
  * Add data type v1 support to the FE
  * Change AC test check to check exited ps (#21672): some docker compose changes no longer show exited processes, which broke our test; this change should fix master (tested in a runner that failed)
  * Move wellknown types mapping to the utility function; use protocolv1 normalization
  * Co-authored-by: Topher Lubaway <[email protected]>, Edward Gao <[email protected]>
* Update protocol support range (#21996)
* bump normalization version to 0.3.0
* Add version check on normalization (#22048): add normalization min version check; add visible-for-testing
* Co-authored-by: Edward Gao <[email protected]>, Eugene <[email protected]>, Topher Lubaway <[email protected]>
* Revert "Normalization: handle non-object top-level schemas; treat binary data as string (#22165)" (reverts commit 8276d03)
* Revert "Normalization: check for ref type existence (#22161)" (reverts commit dbe56d6)
* Revert "🎉Updated normalization to handle new datatypes (#19721)" (reverts commit c1d7736)
* revert dest definitions; also dockerfile; re-add to changelog; add comment in dockerfile
What
Airbyte introduces a new schema with centralized "wellKnownDatatypes" (https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/protocol-models/src/main/resources/airbyte_protocol/well_known_types.yaml).
Note: this cannot be merged until #19240 is merged
How
Recommended reading order
🚨 User Impact 🚨
Breaking changes! This must be merged only after the schema-upgrade logic lands, so that connectors are able to work with the new datatypes.
Pre-merge Checklist
Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter
- Secrets annotated with `airbyte_secret`
- `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`
- `README.md`
- `bootstrap.md`. See description and examples
- `docs/integrations/<source or destination>/<name>.md` including changelog. See changelog example
- `docs/integrations/README.md`
- `airbyte-integrations/builds.md`

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing
- `/publish` command described here

Updating a connector

Community member or Airbyter
- Secrets annotated with `airbyte_secret`
- `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`
- `README.md`
- `bootstrap.md`. See description and examples
- `docs/integrations/<source or destination>/<name>.md` including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing
- `/publish` command described here

Connector Generator
- Connector templates (with `-scaffold` in their name) have been updated with the latest scaffold by running `./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates`, then checking in your changes
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.