Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correctly handle unbound varchar case instead of defaulting to varchar(256) #194

Merged
merged 2 commits into from
Feb 10, 2023

Conversation

kalvinnchau
Copy link
Contributor

@kalvinnchau kalvinnchau commented Dec 9, 2022

Overview

This is related to #35, the linked issue listed two separate issues, and one was already resolved.

one is with dbt core: the varying(256) is hardcoded in dbt core and doesn't use the overriden TrinoColumn class methods. see [CT-78] [Bug] Adapters should be able to override string and numeric type of columns dbt-labs/dbt-core#4603 (PR submitted)

Now that the dbt-core PR has been merged, we can use the TrinoColumn class to handle the hard-coded 256, the super class implementation still checks if char_size is None and sets it to 256 if is it `None.

https://github.com/dbt-labs/dbt-core/blob/0544b085439b3a635b8ce56adbf56d8e7c7e6839/core/dbt/adapters/base/column.py#L86-L94

In dbt_utils.union_relations it uses the col.data_type property to get the type
https://github.com/dbt-labs/dbt-utils/blob/84cf0f2971f5bf539b06c42373a5e80b59ba7047/macros/sql/union.sql#L102-L111

Given two tables:

CREATE TABLE test.test_union_1 (
    str VARCHAR,
    str_10 VARCHAR(10)
);

CREATE TABLE test.test_union_2 (
    str VARCHAR, 
    str_10 VARCHAR(10),
    str_20 VARCHAR(20)
);

The dbt_utils.union_relations macro currently produces:

{% set relations = trino__get_relations_by_pattern('test', 'test_union%') %}

{{ dbt_utils.union_relations(relations = filtered_relations) }}
-->
(
  SELECT
    CAST('default.test.test_union_1' AS VARCHAR)
      AS _dbt_source_relation,
    CAST(str AS VARCHAR(256)) AS str,        # we didn't set any max
    CAST(str_10 AS VARCHAR(10)) AS str_10,
    CAST(NULL AS VARCHAR(20)) AS str_20
  FROM
    default.test.test_union_1
)
UNION ALL
(
  SELECT
    CAST('default.test.test_union_2' AS VARCHAR)
      AS _dbt_source_relation,
    CAST(str AS VARCHAR(256)) AS str,
    CAST(str_10 AS VARCHAR(10)) AS str_10,
    CAST(str_20 AS VARCHAR(20)) AS str_20
  FROM
    default.test.test_union_2
);

This change allows the dbt_utils.union_relations to produce the expected output of:

(
  SELECT
    CAST('default.test.test_union_1' AS VARCHAR)
      AS _dbt_source_relation,
    CAST(str AS VARCHAR) AS str,
    CAST(str_10 AS VARCHAR(10)) AS str_10,
    CAST(NULL AS VARCHAR(20)) AS str_20
  FROM
    default.test.test_union_1
)
UNION ALL
(
  SELECT
    CAST('default.test.test_union_2' AS VARCHAR)
      AS _dbt_source_relation,
    CAST(str AS VARCHAR) AS str,
    CAST(str_10 AS VARCHAR(10)) AS str_10,
    CAST(str_20 AS VARCHAR(20)) AS str_20
  FROM
    default.test.test_union_2
);

Let me know if there's some place to easily add in tests for this sort of case and I can do that.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • README.md updated and added information about my change
  • I have run changie new to create a changelog entry

Copy link
Contributor

@hovaesco hovaesco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR. It looks good to me.

Let me know if there's some place to easily add in tests for this sort of case and I can do that.

@mdesmet @damian3031 do we test union_relations macro?

@mdesmet
Copy link
Member

mdesmet commented Dec 15, 2022

Reading the comment here makes me think there is another bug, it could also be that other dbms always return a bounded varchar.

# char_size should never be None. Handle it reasonably just in case
https://github.com/dbt-labs/dbt-core/blob/0544b085439b3a635b8ce56adbf56d8e7c7e6839/core/dbt/adapters/base/column.py#L91

@mdesmet
Copy link
Member

mdesmet commented Dec 15, 2022

the integration tests for union_relations are here:

https://github.com/dbt-labs/dbt-utils/blob/f4c4ae6ef1e70d04a2f96bbc0fa7b5de7539d4c9/integration_tests/models/sql/schema.yml#L151-L165

We don't test this particular case however where all columns are unbounded and then converted into VARCHAR(256).

@mdesmet
Copy link
Member

mdesmet commented Dec 15, 2022

In case of a UNION with unbounded VARCHAR and bounded VARCHAR I think following code will also error as we can't compare None with int

https://github.com/dbt-labs/dbt-core/blob/0544b085439b3a635b8ce56adbf56d8e7c7e6839/core/dbt/adapters/base/column.py#L96-L102

@kalvinnchau
Copy link
Contributor Author

@mdesmet

Reading the comment here makes me think there is another bug, it could also be that other dbms always return a bounded varchar.

# char_size should never be None. Handle it reasonably just in case https://github.com/dbt-labs/dbt-core/blob/0544b085439b3a635b8ce56adbf56d8e7c7e6839/core/dbt/adapters/base/column.py#L91

In case of a UNION with unbounded VARCHAR and bounded VARCHAR I think following code will also error as we can't compare None with int

https://github.com/dbt-labs/dbt-core/blob/0544b085439b3a635b8ce56adbf56d8e7c7e6839/core/dbt/adapters/base/column.py#L96-L102

Since we haven't overidden Column.string_size, looks like any unbound VARCHAR will be treated as VARCHAR(256), and in the can_expand_to it would downcast when the comparing a unbound VARCHAR and any VARCHAR(X) where X > 256 and select the VARCHAR(X) even when the other side is unbound.

We could override the Column.string_size method to return the MAX_LENGTH of a VARCHAR in Trino when given an unbound VARCHAR, defined as 2147483646 in https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/type/VarcharType.java#L48 (Though I think the max length varchar depends on the backend?)

We can override just the Column.can_expand_to to handle some of this, but the comment in column.py indicates we should set a valid char_size and I'm not sure where else this might be used, so I'm not liking this option so much.

Let me know your thoughts on setting an unbound VARCHAR's char_size to the MAX_LENGTH variable.


the integration tests for union_relations are here:

https://github.com/dbt-labs/dbt-utils/blob/f4c4ae6ef1e70d04a2f96bbc0fa7b5de7539d4c9/integration_tests/models/sql/schema.yml#L151-L165

We don't test this particular case however where all columns are unbounded and then converted into VARCHAR(256).

This repo doesn't require dbt-utils at the moment, is that something you'd want to add to include those tests for this?

@mdesmet
Copy link
Member

mdesmet commented Dec 23, 2022

@kalvinnchau: I checked the other connectors and seems it is safe to set it to the max VARCHAR length of Trino. Let's do that.

Let's not add dbt-utils to dbt-trino. Maybe we can simply add some integration tests where we assert that an unbounded VARCHAR returns the max VARCHAR length?

set the string_size of an unbound varchar to max length of a varchar
add some unit tests to cover the bound and unbound varchar parsing
@kalvinnchau
Copy link
Contributor Author

@mdesmet I've overridden the string_size() method that returns the max VARCHAR length in the case where None is defined, and defaults to the normal super class method otherwise.

Also added a TestTrinoColumn that tests against the bound and unbound varchars, asserting that in the unbound case, the char_size is None, but string_size() == TRINO_VARCHAR_MAX_LENGTH

@mdesmet mdesmet merged commit a7908e4 into starburstdata:master Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants