Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5751] add get_uri method to Connection #6426

Merged
merged 1 commit into from
Dec 11, 2019

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Oct 25, 2019

  • Add a convenience method get_uri on Connection object to generate the URI for a connection.

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Building URI for use in connection env var can be a nuisance. This is a convenience method that will do it for you, given a connection object.

So if you have created one in UI, you could do BaseHook.get_connection(conn_id).get_uri().

Or build using init params on Connection object and call get_uri().

I think it could also be nice if each hook had a get_uri method that would take all relevant params and produce a correctly encoded URI. If that were implemented, it could use this function for that purpose.

I added tests to verify that generated URIs, when parsed again, produce the same connection object. For this I used the same URIs we were already testing. And as part of this there was an efficiency in refactoring the existing from_uri test.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@dstandish dstandish force-pushed the feature/add-connection.get_uri-method branch 2 times, most recently from b815b65 to 9c6e779 Compare October 25, 2019 02:22
@codecov-io
Copy link

codecov-io commented Oct 25, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0f21e9b). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #6426   +/-   ##
=========================================
  Coverage          ?   84.09%           
=========================================
  Files             ?      672           
  Lines             ?    38176           
  Branches          ?        0           
=========================================
  Hits              ?    32105           
  Misses            ?     6071           
  Partials          ?        0
Impacted Files Coverage Δ
airflow/models/connection.py 68.96% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f21e9b...6d458a0. Read the comment docs.


user_block = ''
if self.login is not None:
user_block += quote(self.login, safe='')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer not to build URLs myself. What do you think about using the https://docs.python.org/2/library/urlparse.html#urlparse.urlunparse function?

Copy link
Contributor Author

@dstandish dstandish Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so one reason is that urlunparse did not seem that helpful to me.

I noticed we use urlunparse in CLI connection add (to build a STDOUT message).

But even there, we construct user / password / host / port with format string (because it does not handle that for you). And it does not handle quoting for you either.

It seems more useful if you are starting with something that has been parsed with urlparse.

In any case we have to be very careful to align our quoting with the handling in parse_from_uri.

It does not handle urlencoding of the extra field either. So it did not seem of much use.

One thing I was on the fence about was whether to bother with minifying the formatting. By that i mean if user / password is omitted, should we omit the :@.

For example we could do this:

    def get_uri(self) -> str:
        uri = '{conn_type}://{login}:{password}@{host}:{port}/{schema}?{query}'.format(
            conn_type=self.conn_type.replace('_', '-'),
            login=quote(self.login or '', safe=''),
            password=quote(self.password or ''),
            host=quote(self.host or '', safe=''),
            port=self.port or '',
            schema=quote(self.schema, safe=''),
            query=urlencode(self.extra_dejson),
        )
        return uri

Much more elegant this way. However, if we do this, login will be ''. When login is omitted in URI, the login attribute it is parsed as None. As a result, 3 of the tests fail (the last 3). So, the more verbose approach seems to be a more faithful inverse of the the parse_from_uri function.

WDYT?

@dstandish dstandish force-pushed the feature/add-connection.get_uri-method branch from 9c6e779 to c3f8ef4 Compare October 25, 2019 08:57
if self.port:
host_block += ':{}'.format(self.port)

if self.schema:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think schema doesn't depend on host.

For instance AIRFLOW__CORE__SQLALCHEMY_CONN=postgresql:///airflow

Should the schema includce the leading / already? i.e. do we end up with a // here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok @ashb I have updated to support (with tests) for schema only, login only, password only, and port only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok lastly (one can hope) i have added another set of tests verifying that when we create connection from init attributes (rather than uri) that everything still works consistently -- URIs generated still parse to an equivalent connection object.

@dstandish dstandish force-pushed the feature/add-connection.get_uri-method branch 3 times, most recently from 4840adb to 07de406 Compare October 25, 2019 22:06
@mik-laj
Copy link
Member

mik-laj commented Oct 27, 2019

I would like to point out that some hooks have methods for creating URI.
For example:
https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py#L69-L78

@dstandish
Copy link
Contributor Author

dstandish commented Oct 28, 2019

Interesting. Yeah I think it's good for hooks to have this. I'd like it if every hook had a static method that took all valid params and produce correct URI. With a conn.get_uri() method, this would be more convenient -- hook get_uri methods could just create the conn object and call get_uri.

E.g. for GCP, such a method could look like this:

    @staticmethod
    def get_uri(
        project=None,
        key_path=None,
        keyfile_dict=None,
        scope=None,
        num_retries=None,
    ):
        extra_dict = dict(
            project=project,
            key_path=key_path,
            keyfile_dict=keyfile_dict,
            scope=scope,
            num_retries=num_retries,
        )
        extra = json.dumps(
            {"extra__google_cloud_platform__" + k: v for k, v in extra_dict.items() if v is not None}
        )
        conn = Connection(conn_type='google-cloud-platform', extra=extra)
        return conn.get_uri()

Stepping back, do you think this PR is a worthwhile addition (i.e. adding get_uri to Connection)?

It's a small thing, but I think it's good to have a standardized method of generating a uri that is guaranteed to parse correctly.

So a user could build their connection object with init params, and then generate a URI that will work. No fussing with urllib. No digging through to see how things are parsed.

@dstandish
Copy link
Contributor Author

@mik-laj @ashb any other concerns with this one?

@dstandish
Copy link
Contributor Author

@ashb @mik-laj friendly nag

this can help people produce URI reliably

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstandish Could you rebase this on to latest master please? Sorry for leaving this.

If it still passes ther under pytest (it almost certainly should) then we can merge

* Add a convenience method `get_uri` on `Connection` object to generate the URI for a connection.
@dstandish dstandish force-pushed the feature/add-connection.get_uri-method branch from 07de406 to 6d458a0 Compare December 10, 2019 22:01
@dstandish
Copy link
Contributor Author

rebased; tests pass; thanks :)

@ashb ashb merged commit 53422a8 into apache:master Dec 11, 2019
@dstandish dstandish deleted the feature/add-connection.get_uri-method branch December 14, 2019 05:36
kaxil pushed a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 19, 2019
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
Add a convenience method `get_uri` on `Connection` object to generate the URI for a connection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants