🐛 Normalization: Fix sync from HubSpot to MySQL fails with "Row size too large" on create table #10485

Merged
merged 3 commits into master from htrueman/hubspot-mysql-sync-fix
Feb 22, 2022

Conversation

htrueman
Contributor

@htrueman htrueman commented Feb 20, 2022

What

Closes #7994.
Changes default string casting from varchar(512) to text.

How

  • Confirmed that the issue can indeed be solved by using text fields.
  • Previously we cast string fields with cast(field as char), which led to varchar(512) columns.
  • A varchar(512) column may use as much as 512 * 4 = 2048 bytes in utf8mb4 encoding.
  • This means 32 string fields are enough to exceed the 65,535-byte row size limit that InnoDB create table hits (for instance, HubSpot marketing_emails has 38 such fields).
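
The arithmetic above can be checked with a short sketch. It only multiplies the figures quoted in this description (512 characters, 4 bytes per character, a 65,535-byte limit); the function names are hypothetical and per-column length prefixes are ignored, so treat it as a rough upper-bound estimate rather than MySQL's exact accounting.

```python
# Worst-case row-size estimate for varchar columns under utf8mb4.
# All names here are illustrative and not part of the normalization code.
MAX_ROW_BYTES = 65_535       # row size limit quoted in the PR description
BYTES_PER_CHAR_UTF8MB4 = 4   # utf8mb4 uses at most 4 bytes per character


def worst_case_row_bytes(num_columns: int, varchar_length: int = 512) -> int:
    """Upper bound on character bytes for num_columns varchar columns."""
    return num_columns * varchar_length * BYTES_PER_CHAR_UTF8MB4


def exceeds_row_limit(num_columns: int, varchar_length: int = 512) -> bool:
    """True if the worst-case estimate is larger than the row size limit."""
    return worst_case_row_bytes(num_columns, varchar_length) > MAX_ROW_BYTES


print(worst_case_row_bytes(1))   # 2048
print(exceeds_row_limit(31))     # False: 31 * 2048 = 63488
print(exceeds_row_limit(32))     # True: 32 * 2048 = 65536
print(exceeds_row_limit(38))     # True: 38 * 2048 = 77824
```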

So as a fix, I've updated the MySQL create table query to cast string fields as cast(field as char(1024)),
which produces a text column in the created table (cast(field as text) is not a valid statement, see https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast).

Experimentally, I found that lengths below 1024 are converted to varchar types, while larger lengths may produce mediumtext or longtext, which are too large.
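
As a rough illustration of the change, the sketch below builds the cast expression per destination. It is not the actual stream_processor.py code: the enum subset, the function names, and the column name `subject` are assumptions made for the example. Only the `(1024)` suffix for MySQL comes from this PR.

```python
from enum import Enum


class DestinationType(Enum):
    # Hypothetical subset of destinations, for illustration only.
    MYSQL = "mysql"
    POSTGRES = "postgres"


# Length used in this PR so that the created MySQL column ends up as text.
MYSQL_CHAR_LENGTH = 1024


def string_cast_type(destination_type: DestinationType, sql_type: str = "char") -> str:
    """Return the type name to place inside cast(... as <type>) for string fields."""
    if destination_type == DestinationType.MYSQL:
        # Append an explicit length for MySQL only.
        return f"{sql_type}({MYSQL_CHAR_LENGTH})"
    # Other destinations keep their type name unchanged.
    return sql_type


def cast_expression(column: str, destination_type: DestinationType, sql_type: str = "char") -> str:
    """Build the SQL cast expression for one string column."""
    return f"cast({column} as {string_cast_type(destination_type, sql_type)})"


print(cast_expression("subject", DestinationType.MYSQL))
# cast(subject as char(1024))
```

Keeping the length in one constant makes it easy to adjust if a different value turns out to map to the desired column type.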

Recommended reading order

  1. stream_processor.py
  2. the rest

🚨 User Impact 🚨

This changes varchar fields in MySQL tables to text.

Bump docker version.
Update basic-normalization.md docs.
@github-actions github-actions bot added area/documentation Improvements or additions to documentation area/platform issues related to the platform area/worker Related to worker normalization labels Feb 20, 2022
@htrueman htrueman temporarily deployed to more-secrets February 20, 2022 20:04 Inactive
@htrueman
Contributor Author

@sergei-solonitcyn tested backward compatibility. Everything works fine.

@htrueman htrueman temporarily deployed to more-secrets February 21, 2022 20:16 Inactive
@htrueman
Contributor Author

It's also possible to hit another error:
Row size too large (> 8126). Changing some columns to TEXT or BLOB may help. In current row format, BLOB prefix of 0 bytes is stored inline.
This doesn't seem to be a real issue, though, and may be fixed by changing the innodb_page_size value in the MySQL config.

@htrueman
Contributor Author

htrueman commented Feb 22, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1880833859
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1880833859
Python tests coverage:

Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
base_python/__init__.py                                                                                                            13      0   100%
base_python/catalog_helpers.py                                                                                                     10      6    40%
base_python/cdk/__init__.py                                                                                                         0      0   100%
base_python/cdk/abstract_source.py                                                                                                 89     64    28%
base_python/cdk/streams/__init__.py                                                                                                 0      0   100%
base_python/cdk/streams/auth/__init__.py                                                                                            0      0   100%
base_python/cdk/streams/auth/core.py                                                                                                8      1    88%
base_python/cdk/streams/auth/oauth.py                                                                                              37     26    30%
base_python/cdk/streams/auth/token.py                                                                                               9      4    56%
base_python/cdk/streams/core.py                                                                                                    63     32    49%
base_python/cdk/streams/exceptions.py                                                                                              10      2    80%
base_python/cdk/streams/http.py                                                                                                    67     33    51%
base_python/cdk/streams/rate_limiting.py                                                                                           30     14    53%
base_python/cdk/utils/__init__.py                                                                                                   0      0   100%
base_python/cdk/utils/casing.py                                                                                                     4      0   100%
base_python/cdk/utils/event_timing.py                                                                                              47      3    94%
base_python/client.py                                                                                                              56     33    41%
base_python/entrypoint.py                                                                                                          70     56    20%
base_python/integration.py                                                                                                         52     25    52%
base_python/logger.py                                                                                                              33     15    55%
base_python/schema_helpers.py                                                                                                      56     41    27%
base_python/source.py                                                                                                              51     34    33%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                             832    389    53%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
normalization/__init__.py                                                                                                           4      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/transform_catalog/catalog_processor.py                                                                              143     77    46%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/stream_processor.py                                                                               524    337    36%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/transform.py                                                                                       45     26    42%
normalization/transform_catalog/utils.py                                                                                           33      7    79%
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_config/transform.py                                                                                       150     36    76%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1385    525    62%
Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/tests/test_incremental.py        69     38    45%
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124     36    71%
normalization/__init__.py                                                                                                           4      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/transform_catalog/catalog_processor.py                                                                              143     77    46%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/stream_processor.py                                                                               524    337    36%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/transform.py                                                                                       45     26    42%
normalization/transform_catalog/utils.py                                                                                           33      7    79%
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_config/transform.py                                                                                       150     36    76%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1385    561    59%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
normalization/__init__.py                                                                                                           4      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/transform_catalog/catalog_processor.py                                                                              143     12    92%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      5    97%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/stream_processor.py                                                                               524     39    93%
normalization/transform_catalog/table_name_registry.py                                                                            174     51    71%
normalization/transform_catalog/transform.py                                                                                       45     30    33%
normalization/transform_catalog/utils.py                                                                                           33      0   100%
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_config/transform.py                                                                                       150     46    69%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1385    183    87%

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 22, 2022 09:58 Inactive
@htrueman htrueman changed the title Fix sync from HubSpot to MySQL fails with "Row size too large" on create table 🐛 Normalization: Fix sync from HubSpot to MySQL fails with "Row size too large" on create table Feb 22, 2022
@htrueman
Contributor Author

htrueman commented Feb 22, 2022

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1881177716
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1881177716

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 22, 2022 11:11 Inactive
@htrueman htrueman merged commit 5464b1c into master Feb 22, 2022
@htrueman htrueman deleted the htrueman/hubspot-mysql-sync-fix branch February 22, 2022 12:22
@ChristopheDuong ChristopheDuong mentioned this pull request Mar 3, 2022
@ChristopheDuong
Contributor

Since the generated SQL files from normalization-mysql are being changed in this PR, the integration test outputs should also be included in the PR: they will reflect the actual change in terms of the final native SQL queries.

See #10837

Comment on lines +543 to +545
elif self.destination_type == DestinationType.MYSQL:
    # Cast to `text` datatype. See https://github.com/airbytehq/airbyte/issues/7994
    sql_type = f"{sql_type}(1024)"

The proper way to change a datatype for a given destination is in dbt macros, not directly in the Python code:

see

Could this be causing this error?

Failure Origin: normalization, Message: Normalization failed during the dbt run. This may indicate a problem with the data itself.

1292 (22007): Truncated incorrect CHAR(1024) value:

Labels
area/documentation Improvements or additions to documentation area/platform issues related to the platform area/worker Related to worker normalization
Development

Successfully merging this pull request may close these issues.

Sync from HubSpot to MySQL fails - needs a LOB
5 participants