make hive macros return string type vs bytes #8598

Merged
merged 1 commit into from
Jun 16, 2020

Acehaidrey
Contributor

With the current implementation, the hive macros encode the result of the metastore calls. In Python 2 this still returns a string type, but in Python 3 the encoding forces the value to be a bytes type. See the example below:

ahaidrey-078HTD6:incubator-airflow ahaidrey$ python3
Python 3.7.0 (default, Oct  2 2018, 09:20:07)
[Clang 10.0.0 (clang-1000.11.45.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'apple'
>>> b = a.encode('utf-8')
>>> type(b)
<class 'bytes'>

ahaidrey-078HTD6:~ ahaidrey$ python
Python 2.7.16 (default, Dec  3 2019, 02:03:47)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'apple'
>>> b = a.encode('utf-8')
>>> type(b)
<type 'str'>

The problem is that the macros then return a bytes value, which cannot be templated as a string and breaks the queries it is used in. As a result, every template has to be written like this:

val = "{{ macros.hive.max_partition(table='mytable', schema='myschema', field='myfield', filter_map={'key1': 'val1'}).decode('utf-8') }}"

Requiring users to always decode the value is not the intent of this method; it should return a value that can be used as is.
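
A minimal sketch of why a bytes value breaks a templated query under Python 3. It uses plain `str.format` rather than Airflow's template rendering to stay dependency-free (an assumption made for illustration), and the partition value and query are hypothetical:

```python
# Hypothetical partition value, as it would look after .encode('utf-8').
encoded = "2020-01-01".encode("utf-8")

# Hypothetical query with a placeholder for the partition value.
query = "SELECT * FROM mytable WHERE ds = '{}'"

# On Python 3, formatting bytes into a str uses its repr,
# so the b'...' wrapper leaks into the SQL text.
broken = query.format(encoded)

# Decoding first (or never encoding) gives the intended query.
fixed = query.format(encoded.decode("utf-8"))

print(broken)
print(fixed)
```

On Python 3 the first query contains the literal `b'2020-01-01'`, which is not valid in the SQL, while the second contains the plain date.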

This PR fixes that. We may be able to just remove the encoding altogether, but that could make things backwards incompatible.


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists (not existing)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@jhtimmins
Contributor

It looks like (at least some of) the tests are failing because other tests that call _get_max_partition_from_part_specs() expect to get a bytes value back, whereas now it's returning a string. See line 296 of test_hive.py.

@Acehaidrey
Contributor Author

Acehaidrey commented Apr 28, 2020

The reason is that consumers of this return value should get a real string back, not bytes. I wanted to make this clear.

@jhtimmins
Contributor

@Acehaidrey Not sure I have the context to say.

Could you explain what you mean by "incompatible return"?

@Acehaidrey
Contributor Author

Hi @jhtimmins ,

Sorry for the delayed response. I cleaned up my message after realizing it didn't make sense. There is no incompatible return. I have fixed the tests, so please take a look. This method is meant for passing string values into templated SQL scripts, so returning a string type is what is expected.
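
For illustration, a simplified sketch of the behaviour described here: pick the maximum partition value and return it as a plain string, with no encoding step. The function name `max_partition_value` and the sample specs are hypothetical and are not the actual Airflow implementation:

```python
def max_partition_value(part_specs, partition_key, filter_map=None):
    """Return the largest value of partition_key among matching specs, as str.

    part_specs is a list of dicts mapping partition names to values;
    filter_map optionally restricts which specs are considered.
    """
    if not part_specs:
        return None
    candidates = [
        spec[partition_key]
        for spec in part_specs
        if filter_map is None
        or all(spec.get(key) == value for key, value in filter_map.items())
    ]
    if not candidates:
        return None
    # Return the value as is (str); no .encode('utf-8') call.
    return max(candidates)


# Hypothetical partition specs for demonstration.
specs = [
    {"ds": "2015-01-01", "key1": "val1"},
    {"ds": "2015-01-02", "key1": "val1"},
    {"ds": "2015-01-03", "key1": "val2"},
]
result = max_partition_value(specs, "ds", {"key1": "val1"})
print(type(result), result)
```

A test for such a function would compare against a str literal such as `'2015-01-02'` rather than a bytes literal such as `b'2015-01-02'`.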

@Acehaidrey Acehaidrey changed the title make hive macros py3 compatible with decoded string return type Make hive macros py3 compatible with decoded string return type May 4, 2020
@Acehaidrey
Contributor Author

@jhtimmins if you have a chance to revisit this

@Acehaidrey
Contributor Author

@jhtimmins sorry for pestering but would love to get these in and close out

@Acehaidrey
Contributor Author

@ashb mind taking a look at this one too? sorry for all the tags

@Acehaidrey Acehaidrey changed the title Make hive macros py3 compatible with decoded string return type make hive macros return string type vs bytes May 22, 2020
@Acehaidrey
Contributor Author

@ashb mind taking a look now? did the updates

Contributor Author

@Acehaidrey Acehaidrey left a comment

@turbaszek mind taking a look at this too?

@turbaszek turbaszek requested a review from ashb May 31, 2020 13:58
Summary: make hive macros py3 compatible with decoded string

Reviewers: #big-data-platform

Tags: #big-data-platform

Differential Revision: https://phabricator.pinadmin.com/D548643
@Acehaidrey
Contributor Author

@turbaszek I just rebased instead of git pull etc. @ashb mind please taking a look one last time to close this out once and for all?

Member

@ashb ashb left a comment

This looks okay for master. Any idea what the change should look like for 1.10, where we still have to support py2+3?

@Acehaidrey
Contributor Author

Hey @ashb for 1.10 it actually can remain the same. So from a quick test I ran locally even if py2 it returns the value as a string type. Not sure why they encoded but maybe older versions of py2 had some caveat?

Also thanks for reviewing!

@Acehaidrey
Contributor Author

hey @ashb mind taking a look at the latest comment.

@ashb
Member

ashb commented Jun 5, 2020

> Hey @ashb for 1.10 it actually can remain the same. So from a quick test I ran locally even if py2 it returns the value as a string type. Not sure why they encoded but maybe older versions of py2 had some caveat?
>
> Also thanks for reviewing!

What about py3 on 1.10.x?

@Acehaidrey
Contributor Author

@ashb Good question. This change was actually made on branch v1.10-stable, so it works as is there. No concern on that front either.

@Acehaidrey
Contributor Author

@ashb wanted to know if there are any more concerns or if we can merge this

@Acehaidrey
Contributor Author

@ashb sorry to keep pinging -

@Acehaidrey
Contributor Author

@ashb @turbaszek any chance we can get this in

@ashb
Member

ashb commented Jun 15, 2020

I'll look first thing tomorrow morning. It should be good!

(Sorry, we've had some issues with our ci that need attention)

@ashb ashb merged commit c78e2a5 into apache:master Jun 16, 2020
@ashb ashb added this to the Airflow 1.10.11 milestone Jun 16, 2020
@Acehaidrey
Contributor Author

thanks team!

kaxil pushed a commit that referenced this pull request Jun 22, 2020
Co-authored-by: Ace Haidrey <[email protected]>

(cherry-picked from c78e2a5)
kaxil pushed a commit to kaxil/airflow that referenced this pull request Jun 27, 2020
potiuk pushed a commit that referenced this pull request Jun 29, 2020
Co-authored-by: Ace Haidrey <[email protected]>

(cherry-picked from c78e2a5)
kaxil pushed a commit that referenced this pull request Jul 1, 2020
Co-authored-by: Ace Haidrey <[email protected]>

(cherry-picked from c78e2a5)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
Co-authored-by: Ace Haidrey <[email protected]>

(cherry-picked from c78e2a5)
4 participants