Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉Updated normalization to handle new datatypes #19721

Conversation

etsybaev
Copy link
Contributor

@etsybaev etsybaev commented Nov 22, 2022

What

Airbyte introduces a new schema with centralized "wellKnownDatatypes" (https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/protocol-models/src/main/resources/airbyte_protocol/well_known_types.yaml).

Note: this cannot be merged until #19240 is merged

How

  1. Updated normalization to handle new datatypes
  2. Added processing for newly introduced Binary datatype to save it as binary object. Only Oracle and Redshift handle it, but save as a base64 string

Recommended reading order

  1. All python files
  2. All .json and .txt files (updated schemas and test messages for unit and integration tests)
  3. dbt-project-template/macros/cross_db_utils/datatypes.sql
  4. other ".sql" and ".yaml" files

🚨 User Impact 🚨

Breaking changes! Need to merge only after logic for schema upgrade once connectors would be able to work with new datatypes

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 22, 2022

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (16)

Connector Version Changelog Publish
destination-bigquery 1.2.9
destination-bigquery-denormalized 1.2.10
destination-clickhouse 0.2.1
destination-clickhouse-strict-encrypt 0.2.1 🔵
(ignored)
🔵
(ignored)
destination-jdbc 0.3.14 🔵
(ignored)
🔵
(ignored)
destination-mssql 0.1.22
destination-mssql-strict-encrypt 0.1.22 🔵
(ignored)
🔵
(ignored)
destination-mysql 0.1.20
destination-mysql-strict-encrypt 0.1.21
(mismatch: 0.1.20)
🔵
(ignored)
🔵
(ignored)
destination-oracle 0.1.19
destination-oracle-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
destination-postgres 0.3.26
destination-postgres-strict-encrypt 0.3.26 🔵
(ignored)
🔵
(ignored)
destination-redshift 0.3.53
destination-snowflake 0.4.41
destination-tidb 0.1.0
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 4, 2022

Locally all integration and unit tests have passed
Selection_130

@etsybaev etsybaev marked this pull request as ready for review December 5, 2022 12:29
@etsybaev etsybaev requested a review from a team as a code owner December 5, 2022 12:29
@etsybaev etsybaev linked an issue Dec 5, 2022 that may be closed by this pull request
@etsybaev etsybaev changed the title Updated normalization simple stream processing to handle new datatypes Updated normalization to handle new datatypes Dec 5, 2022
@etsybaev etsybaev changed the title Updated normalization to handle new datatypes 🎉Updated normalization to handle new datatypes Dec 5, 2022
@grubberr
Copy link
Contributor

grubberr commented Dec 5, 2022

@etsybaev Do we have some tests of newly added binary data?

@@ -8,22 +8,22 @@
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"id": {
"type": ["null", "integer"]
"$ref" : "WellKnownTypes.json#/definitions/Integer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not we need to provide "null" anymore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we switched from "native" integer to this new airbyte well known "Integer" in all normalization tests?
Do we abandon using native json integer(s) completely?

Copy link
Contributor Author

@etsybaev etsybaev Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grubberr the answer is yes for both questions. Please check out this ticket's description and useful links from its description #17642
BTW, I've changed logic in stream_processor.py and utils.py also, not only tests

Copy link
Contributor

@grishick grishick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. need to make sure normalization tests are passing before publishing and merging
  2. I think we should keep tests for old format until it is completely deprecated

return property_type == "string" or "string" in property_type
if data_type.ONE_OF_VAR_NAME in property_type and data_type.ONE_OF_VAR_NAME in property_type:
plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
return plain_datatypes_list == data_type.STRING_TYPE or data_type.STRING_TYPE in plain_datatypes_list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test seems to be complaining about this line

BUILD FAILED in 26s
[python] .venv/bin/python -m mypy normalization --config-file /actions-runner/_work/airbyte/airbyte/pyproject.toml
	 normalization/transform_catalog/utils.py: note: In function "is_string":
	 normalization/transform_catalog/utils.py:32:16: error: Non-overlapping equality
	 check (left operand type: "List[Any]", right operand type: "str") 
	 [comparison-overlap]
	             return plain_datatypes_list == data_type.STRING_TYPE or data_t...
	                    ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @grishick
I believe that integration tests on CI fail because It generates thousands of files, but I haven't committed autogenerated files yet to make the review process easier. Obviously, I will have to make them pass before merging :)

Just to make sure, do you want me to make it works for both old and new protocols for a while? I may do that, but it would take me lots of additional time. The original ticket says: "This updated version of normalization will never see old-style catalogs, because it will be released after the platform protocol versioning migrations, and normalization is released as part of platform (i.e. it's impossible for someone to upgrade normalization without also upgrading platform).".

So I didn't spend time for keep supporting old-style also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test is complaining that the == will never return true, because the left side (plain_datatypes_list) is a List but the right side (data_type.STRING_TYPE) is a String

@grishick is the reasoning for supporting both versions just as a fallback in case something weird happens? I don't hate that (e.g. for manual testing without needing to recompile repeatedly)

@etsybaev what's the obstacle to supporting both? I think the minimal lift is to change the is_xyz_type() methods to look more like

def is_datetime_without_timezone(definition):
  if "oneOf" in definition:
    ...
  else if "$ref" in definition:
    ...
  else:
    # handle old-style schema

though maybe we should just refactor this into more of a get_type(definition) -> string style, and have stream_processor.py do something like

type = get_type(definition)
if type == "object"
else if type == "string"
...

That feels more correct anyway (i.e. figure out which airbyte type we're looking at, and then handle it appropriately). And then get_type can have all the complicated logic around $ref/oneOf/type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my reasoning for supporting both versions was just as a fallback in case something weird happens. However, I think that's not necessary, so I am OK with not supporting the older version. If somehow an older platform ends up loading new normalization image it'll fail, but that's not a reasonable scenario given how we manage normalization version right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'll become more of a possibility, since normalization version is moving into destination_definitions.yaml. I don't think there's any safeguard against downgrading normalization to a pre-protocol-v1 version other than "PR reviewer is careful"

though maybe there should be such a safeguard

Copy link
Contributor Author

@etsybaev etsybaev Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that there are also lots of other places where the old "type" were used, not only in cast_property_type method. To find all cases I had to do dozens run in debug mode for different connectors and check what actually comes here. Some of the methods also appear to be called in very unpredictable ways. So adding backward compatibility will take me some time (can't estimate it yet), as for now I will not know if the received "type" is an object or still should be treated as simple object. Also will need to dive deeper in tests and SQLs. As new goal will be not only make them pass, but also to extend to support versioning.
By the way, locally all tests pass, maybe something is not passed on Ci as I haven't committed autogenerated files to make the review process easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note from grooming session: we don't need to keep support for the old protocol version in normalization. The only breaking scenario is old platform + new normalization, which is not possible unless someone deliberately updates hard-coded normalization version w/o updating the platform.

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a first pass at this. Definitely a complicated PR! I think the general approach makes sense; I just had some questions about specific implementation bits.

Also had a comment for a potential refactor - totally reasonable to not want to do that since this is already a big change, but I think it would simplify the logic overall.

return property_type == "string" or "string" in property_type
if data_type.ONE_OF_VAR_NAME in property_type and data_type.ONE_OF_VAR_NAME in property_type:
plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
return plain_datatypes_list == data_type.STRING_TYPE or data_type.STRING_TYPE in plain_datatypes_list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test is complaining that the == will never return true, because the left side (plain_datatypes_list) is a List but the right side (data_type.STRING_TYPE) is a String

@grishick is the reasoning for supporting both versions just as a fallback in case something weird happens? I don't hate that (e.g. for manual testing without needing to recompile repeatedly)

@etsybaev what's the obstacle to supporting both? I think the minimal lift is to change the is_xyz_type() methods to look more like

def is_datetime_without_timezone(definition):
  if "oneOf" in definition:
    ...
  else if "$ref" in definition:
    ...
  else:
    # handle old-style schema

though maybe we should just refactor this into more of a get_type(definition) -> string style, and have stream_processor.py do something like

type = get_type(definition)
if type == "object"
else if type == "string"
...

That feels more correct anyway (i.e. figure out which airbyte type we're looking at, and then handle it appropriately). And then get_type can have all the complicated logic around $ref/oneOf/type

one_of_property = one_of_definition[data_type.ONE_OF_VAR_NAME]
plain_datatypes_list = list()
for prop in one_of_property:
plain_datatypes_list.append(prop[data_type.REF_TYPE_VAR_NAME])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do if the oneOf has a non-ref entry? e.g.

{
  "oneOf": [
    {"$ref": "....String"},
    {"type": "object", "properties": {...}}
  ]
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao In this case, it will take a String type by already existing logic. If the string is in the list, then other types would be ignored. As far as I see from Normalization logic - it always takes a wider type for SQL column-> string


def is_binary_datatype(property_type) -> bool:
if data_type.ONE_OF_VAR_NAME in property_type and data_type.ONE_OF_VAR_NAME in property_type:
plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the data_type.ONE_OF_VAR_NAME in property_type condition got duplicated, and I think there's a way to avoid relying on the get_plain_list_from_one_of_array method entirely:

if data_type.ONE_OF_VAR_NAME in property_type:
  return any(option["$ref"] == "...BinaryData" for option in property_type["oneOf"])

(copied the syntax from https://stackoverflow.com/a/22971758)

This makes the assumption that {"oneOf": "...BinaryData"} will never happen. I think that's a valid assumption, because oneOf should always be a list of schemas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

plain_datatypes_list = get_plain_list_from_one_of_array(property_type)
return plain_datatypes_list == data_type.BINARY_DATA_TYPE or data_type.BINARY_DATA_TYPE in plain_datatypes_list
else:
return property_type == data_type.BINARY_DATA_TYPE or data_type.BINARY_DATA_TYPE in property_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand this - isn't property_type a dict (i.e. Map)? When would it ever be equal to data_type.BINARY_DATA_TYPE (String) or contain data_type.BINARY_DATA_TYPE as a key?

i.e. should this look more like return property_type["$ref"] == data_type.BINARY_DATA_TYPE ?

(all I want for Christmas is static typing 😢)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a generic verification to check for object. As depends on the place where this is called, It may be a single object, it may be a dict, or it may be inside of oneOf. But if we have an assumption that binary datatype could not be a part of oneOf (as per your comment above) - then yes, this may be simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao This is what I mean by saying that some of the methods may be called from different places and sometimes passing a simple object and sometimes a "dict". This was also mindblowing for me after strict types in java :)

Here is an example where it's passed like just a string. That is why some checks may look redundant at the first glance

Selection_139

Copy link
Contributor

@edgao edgao Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmmmmmmmmmmmmmm

how hard would it be to make all of these methods always expect a dict? It feels like this is causing us a lot of confusion :/ and this seems like a good opportunity to clean up the code a little bit

or is that what you did in the most recent set of commits?

return property_type == data_type.STRING_TYPE or data_type.STRING_TYPE in property_type


def is_binary_datatype(property_type) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add a type hint for property_type, assuming I've understood correctly

Suggested change
def is_binary_datatype(property_type) -> bool:
def is_binary_datatype(property_type: dict) -> bool:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes this may a simple string. Sometimes dict, sometimes string :(
Selection_139

@@ -115,4 +168,9 @@ def is_simple_property(definition: dict) -> bool:


def is_combining_node(properties: dict) -> Set[str]:
return set(properties).intersection({"anyOf", "oneOf", "allOf"})
# this case appears when we have analog of old protocol like id: {type:[number, string]} and it's handled separately
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically there could be cases where this isn't true: (more examples in https://github.com/airbytehq/airbyte/pull/19909/files?diff=unified&w=1#diff-6829fbf8db73883ede3e47d869e6f5756a882c2d94d74c13e0b2128b07666216R1038-R1050 )

{
  "oneOf": [
    {"$ref": ...},
    {"type": "object", "properties": {"foo": {...}}},
    {"type": "object", "properties": {"bar": {...}}}
  ]
}

but it doesn't really matter - this method doesn't actually control anything right now (it only gets called in stream_processor.py using if is_combinig_node: pass)

json_extract = jinja_call(f"json_extract('{table_alias}', {json_column_name}, {json_path}, {normalized_json_path})")
elif data_type.REF_TYPE_VAR_NAME in definition and (is_date(definition) or is_time(definition) or is_datetime(definition)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this branch need to be added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were there before my changes. I just adopted it to handle new ref case

if self.destination_type.value == DestinationType.POSTGRES.value:
# sql_type = "bytea"
sql_type = jinja_call("type_binary()")
return f"cast(decode({column_name}, 'base64') as {sql_type}) as {column_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth defining a decode_base64 macro, to move the destination-specific logic out of this file. Then if we make the type_binary() macro return dbt_utils.type_string() for destinations without a binary type, then we don't need so much destination-specific logic here.

I.e. we could do

decode_call = jinja_call(f"decode_base64({column_name})")
return f"cast({decode_call} as {sql_type}) as {column_name}"

for all destination types

Copy link
Contributor Author

@etsybaev etsybaev Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about macros, but all DBs use their own approaches and nothing can be re-used. Do we still want to move it and get more code? Also, macroses sometimes stopped considering variables as variables, but like values in complicated queries. Thanks

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 9, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3660201432
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3660201432
🐛

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3660201432
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3660201432
🐛

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 10, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3664033102
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3664033102
🐛 https://gradle.com/s/3nf4jqmmlr4km

Build Failed

Test summary info:

Could not find result summary

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 10, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3665702106
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3665702106
🐛 https://gradle.com/s/duubza3lzibb6

Build Failed

Test summary info:

Could not find result summary

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 11, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3669367158
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3669367158
🐛 https://gradle.com/s/r57r4bwwltd6u

Build Failed

Test summary info:

Could not find result summary

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 12, 2022

Integration tests passed locally for command ./gradlew clean --no-daemon --scan :airbyte-integrations:bases:base-normalization:integrationTest
Selection_144

@etsybaev
Copy link
Contributor Author

etsybaev commented Dec 12, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3671475158
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3671475158
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/utils.py                             72     29    60%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 628    431    31%
-------------------------------------------------------------------------------------
TOTAL                                                              1509    675    55%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       140      5    96%   87, 93, 238, 242-243
	 source_acceptance_test/conftest.py                     208     92    56%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-280, 288-301, 306-312, 319-330, 337-353
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              398    111    72%   53, 58, 87-95, 100-107, 111-112, 116-117, 299, 337-354, 363-371, 375-380, 386, 419-424, 462-469, 512-514, 517, 582-590, 602-605, 610, 666-667, 673, 676, 712-722, 735-760
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1599    332    79%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/utils.py                             72     29    60%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 628    431    31%
-------------------------------------------------------------------------------------
TOTAL                                                              1509    675    55%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/utils.py                             72      2    97%
normalization/transform_catalog/destination_name_transformer.py     166      5    97%
normalization/transform_catalog/stream_processor.py                 628     40    94%
normalization/transform_catalog/catalog_processor.py                147     12    92%
normalization/transform_catalog/dbt_macro.py                         22      3    86%
normalization/transform_catalog/table_name_registry.py              174     51    71%
normalization/transform_catalog/transform.py                         61     22    64%
normalization/transform_config/transform.py                         189     77    59%
-------------------------------------------------------------------------------------
TOTAL                                                              1509    212    86%

Build Passed

Test summary info:

	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.CLICKHOUSE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:147: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:135: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:135: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 �[32m================= �[32m�[1m47 passed�[0m, �[33m15 skipped�[0m�[32m in 4152.16s (1:09:12)�[0m�[32m ==================�[0m
	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.CLICKHOUSE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:147: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:135: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:135: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:144: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 �[32m================= �[32m�[1m47 passed�[0m, �[33m15 skipped�[0m�[32m in 4310.13s (1:11:50)�[0m�[32m ==================�[0m

@grishick
Copy link
Contributor

LGTM, although I am confused about the discussion around property_type

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more comments around code structure - feel free to say I'm increasing the scope too much :)

Can you commit the generated files for airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/postgres? That should be a representative sample for what's actually changing.

return (
plain_datatypes_list == data_type.TIMESTAMP_WITHOUT_TIMEZONE_TYPE
or data_type.TIMESTAMP_WITHOUT_TIMEZONE_TYPE in plain_datatypes_list
if data_type.ONE_OF_VAR_NAME in definition:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should all the is_whatever_type methods have this oneof handling?

maybe worth doing something like this, to reduce the amount of copypasta (I didn't test this at all, hopefully it's clear enough. Also maybe would be more readable using lambda syntax in get_reftype_function, idk)

def is_type_included(definition: dict, is_type: Callable[[dict], bool]) -> bool:
    if data_type.ONE_OF_VAR_NAME in definition:
        return bool(
            any(
                is_type(option)
                for option in definition[data_type.ONE_OF_VAR_NAME]
            )
        )
    else:
        return is_type(definition)

def get_reftype_function(type: string) -> Callable[[dict], bool]:
    def is_reftype(definition: dict) -> bool:
        return data_type.REF_TYPE_VAR_NAME in definition and type == definition[data_type.REF_TYPE_VAR_NAME]
    return is_reftype

def is_binary_datatype(definition: dict) -> bool:
    return is_type_included(definition, get_reftype_function(data_type.BINARY_DATA_TYPE))

def is_datetime_with_timezone(definition: dict) -> bool:
    return is_type_included(definition, get_reftype_function(data_type. TIMESTAMP_WITHOUT_TIMEZONE_TYPE))

# etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen these methods are also called in other parts of code. So even a simple change may cause unexpected side effects and I've seen it many times during my work on this PR. So since we may have one not only in ref types I really afraid to touch it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code change is transparent though? I.e. it keeps the same method signature, just changes the implementation to be consistent across methods. So other than the methods that take both definition and property_type, it shouldn't matter?

mostly just want to doublecheck whether the oneOf handling is intentionally excluded from some methods.

Copy link
Contributor Author

@etsybaev etsybaev Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao Ohh...got your point about Callable. Updated. Also updated methods to get rid of unused property_types. Thanks

@edgao
Copy link
Contributor

edgao commented Jan 6, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3857522392
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/3857522392
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_catalog/utils.py                             57     14    75%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 628    431    31%
-------------------------------------------------------------------------------------
TOTAL                                                              1494    660    56%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   56-63, 68-81, 244
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_catalog/utils.py                             57     14    75%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 628    431    31%
-------------------------------------------------------------------------------------
TOTAL                                                              1494    660    56%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/utils.py                             57      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      5    97%
normalization/transform_catalog/stream_processor.py                 628     40    94%
normalization/transform_catalog/catalog_processor.py                147     12    92%
normalization/transform_catalog/dbt_macro.py                         22      3    86%
normalization/transform_catalog/table_name_registry.py              174     51    71%
normalization/transform_catalog/transform.py                         61     22    64%
normalization/transform_config/transform.py                         189     77    59%
-------------------------------------------------------------------------------------
TOTAL                                                              1494    210    86%

Build Passed

Test summary info:

	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.CLICKHOUSE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:148: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:136: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:136: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 �[32m================= �[32m�[1m47 passed�[0m, �[33m15 skipped�[0m�[32m in 4238.10s (1:10:38)�[0m�[32m ==================�[0m
	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.CLICKHOUSE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:148: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:136: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:136: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 �[32m================= �[32m�[1m47 passed�[0m, �[33m15 skipped�[0m�[32m in 4112.76s (1:08:32)�[0m�[32m ==================�[0m

@edgao edgao temporarily deployed to more-secrets January 6, 2023 18:32 — with GitHub Actions Inactive
@edgao edgao temporarily deployed to more-secrets January 6, 2023 18:32 — with GitHub Actions Inactive
@edgao edgao temporarily deployed to more-secrets January 6, 2023 19:37 — with GitHub Actions Inactive
@edgao edgao temporarily deployed to more-secrets January 6, 2023 19:37 — with GitHub Actions Inactive
@edgao
Copy link
Contributor

edgao commented Jan 6, 2023

kube acceptance tests are failing b/c flaky; helm acceptance tests are passing. merging this in and if the kube tests continue failing we can deal with it in the superbranch.

@edgao edgao marked this pull request as ready for review January 6, 2023 21:26
@edgao edgao merged commit c1d7736 into gosusnp/platform-use-protocol-v1-the-quick-way Jan 6, 2023
@edgao edgao deleted the etsybaev/17642-updated-normalization-to-new-datatypes branch January 6, 2023 21:26
gosusnp added a commit that referenced this pull request Jan 30, 2023
* Add Airbyte Protocol V1 support.

* Fix VersionedAirbyteStreamFactoryTest

* Remove AirbyteMessageMigrationV0 example

* Add Protocol Version constants

* 🎉Updated normalization to handle new datatypes (#19721)

* Updated normalization simple stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization drop_scd_catalog processing to handle new datatypes

* Updated normalization ephemeral test processing to handle new datatypes

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more issues

* fixed more issues (clickhouse)

* fixed more issues

* fixed more issues

* fixed more issues

* added binary type processing for some DBs

* cleared commented code and moved some hardcodes to processing as macro

* fixed codestyle and cleared commented code

* minor refactor

* minor refactor

* minor refactor

* fixed bool cast error

* fixed dict->str cast error

* fixed is_combining_node cast py check

* removed commented code

* removed commented code

* committed autogenerated normalization_test_output files

* committed autogenerated normalization_test_output files (new files)

* refactored utils.py

* Updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions

* committed autogenerated normalization_test_output files (new files)

* fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE

* updated stream_processor to handle string type first as a wider type

* fixed arrays normalization by updating is_simple_property method as per new approaches

* format

Co-authored-by: Edward Gao <[email protected]>

* Update airbyte protocol migration (#20745)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* Fix tests

* Remove extra this.

* Add missing docs

* Typo

Co-authored-by: Edward Gao <[email protected]>

* Data types update: Implement protocol message migrations (#19240)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* set up scaffolding

* [wip] more scaffolding, basic unit test

* minimal green code

* [wip] add failing test for other primitive types

* correct version number

* handle basic primitive type decls

* add implicit cases

* add recursive schema

* formatting

* comment

* support not

* fix indentation

* handle all nested schema cases

* handle boolean schemas

* verify empty schema handling

* cleanup

* extract map

* code organization

* extract method

* reformat

* [wip] more tests, minor fix type array handling

* corrected test

* cleanup

* reformat

* switch to v1

* add support for multityped fields

* missed test case

* nested test class

* basic record upgrade

* implement record upgrades

* slight refactor

* comments+clarificationso

* extract constants

* (partly) correct model classes

* add de/ser

* formatting

* extract constants

* fix json reference

* update docs

* switch to v1 models

* fix compile+test

* add base64 handling

* use vnull

* Data types update: Implement protocol message downgrade path (#19909)

* rough skeleton for passing catalog into migration

* basic test

* more scaffolding

* basic implementation

* add primitives test

* add in other tests (nested fields currently failing)

* add formats

* impleent oneOf handling

* formatting

* oneOf handling

* better tests

* comments + organization

* progress

* basic test case

* downgrade objects, ish

* basic array implementation

* handle numeric failure

* test for new type

* handle array items

* empty schema handling

* first pass at oneof handling

* add more tests+handling

* more tests

* comments

* add empty oneof test case

* format + reorganize

* more reorganize

* fix name

* also downgrade binary data

* only import vnull

* move migrations into v1 package

* extract schema mutation code

* comment

* extract schema migration to new class

* extract record downgrade logic for future use

* format

* fix build after rebase

* rename private method for consistency

* also implement configuredcatalog migrations >.>

* quick and dirty tests

* slight cleanup

* fix tests

* pmd

* pmd test

* null check on message objects

* maybe fix acceptance tests?

* fix name

* extract constants

* more fixes

* tmp

* meh

* fix cdc acc tests

* revert to master source-postgres

* remove log messages

* revert other misc hacks

* integers are valid cursors

* remove unrelated change

* fix build

* fix build more?

* [MUST REVERT] use dev normalization

* capture kube logs

* also here?

* no debug logs?

* delete dup from merging

* add final everywhere

* revert test changes

Co-authored-by: Jimmy Ma <[email protected]>

* On-the-fly migrations of persisted catalogs (#21757)

* On the fly catalog migration for normalization activity

* On the fly catalog migration for job persistence

* On the fly migration for standard sync persistence

* On the fly migration for airbyte catalogs

* Refactor code to share JsonSchema traversal

* Add V0 Data type search function

* PMD and Format

* Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests

* Null-proofing CatalogMigrationV1Helper

* More null checks

* Fix test

* Format

* Add data type v1 support to the FE

* Changes AC test check to check exited ps (#21672)

some docker compose changes no longer show exited
processes.  this broke out test

this change should fix master

tested in a runner that failed

* Move wellknown types mapping to the utility function

* use protocolv1 normalization

---------

Co-authored-by: Topher Lubaway <[email protected]>
Co-authored-by: Edward Gao <[email protected]>

* Update protocol support range (#21996)

* bump normalization version to 0.3.0

* Add version check on normalization (#22048)

* Add normalization min version check

* Add visible for testing

---------

Co-authored-by: Edward Gao <[email protected]>
Co-authored-by: Eugene <[email protected]>
Co-authored-by: Topher Lubaway <[email protected]>
edgao added a commit that referenced this pull request Feb 2, 2023
edgao added a commit that referenced this pull request Feb 6, 2023
* Revert "Normalization: handle non-object top-level schemas; treat binary data as string (#22165)"

This reverts commit 8276d03.

* Revert "Normalization: check for ref type existence (#22161)"

This reverts commit dbe56d6.

* Revert "🎉Updated normalization to handle new datatypes (#19721)"

This reverts commit c1d7736.

* revert dest definitions

* also dockerfile

* re-add to changelog

* add comment in dockerfile
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants