
Add druid ingestion connection test #33796

Closed

Conversation

jaegwonseo
Contributor

@jaegwonseo jaegwonseo commented Aug 27, 2023

expose druid connection and implement test

closes: #33795

@potiuk
Member

potiuk commented Aug 27, 2023

I think a better approach would be to keep one of the connections (DBApi?) as "druid". It will at least make existing DBApi connections work.

@potiuk
Member

potiuk commented Aug 27, 2023

Also - this one will need a changelog entry describing what changed and how users should modify their connections.

@jaegwonseo
Contributor Author

I think a better approach would be to keep one of the connections (DBApi?) as "druid". It will at least make existing DBApi connections work.

Also - this one will need a changelog entry describing what changed and how users should modify their connections.

@potiuk
In order to execute queries within a distributed system, Druid utilizes broker nodes, and for data ingestion, it leverages ingest nodes. Since each node has distinct roles and invocation addresses, maintaining the current setup with these two separate nodes would be preferable.
And I do not know the exact criteria for releasing new versions of the Airflow provider, so I assumed 3.6.0 for now and added the changelog entry.
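For context on why two connection types are proposed: the broker answers queries while the Overlord/ingest node accepts ingestion tasks, so each role has its own endpoint. A minimal sketch of the two base URLs (the helper names here are hypothetical, not from the PR; the ingest path matches the request used in this PR's test, and `/druid/v2/sql` is Druid's SQL query endpoint):

```python
# Sketch only: helper names are hypothetical, not part of the PR.
# Druid brokers serve queries; the Overlord/ingest endpoint accepts
# ingestion tasks, so each role needs its own connection details.

def broker_sql_url(host: str, port: int) -> str:
    # Broker SQL API, the kind of endpoint a DbApiHook-style query uses.
    return f"http://{host}:{port}/druid/v2/sql"

def ingest_tasks_url(host: str, port: int) -> str:
    # Overlord task API used for ingestion (same path as the PR's test request).
    return f"http://{host}:{port}/druid/indexer/v1/tasks"

print(broker_sql_url("broker.example.com", 8082))
print(ingest_tasks_url("overlord.example.com", 8081))
```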

@potiuk
Member

potiuk commented Aug 28, 2023

In order to execute queries within a distributed system, Druid utilizes broker nodes, and for data ingestion, it leverages ingest nodes. Since each node has distinct roles and invocation addresses, maintaining the current setup with these two separate nodes would be preferable.

I see. OK, then it should be a breaking change - but we let the release manager decide on that. The entry in the changelog should not contain the version (see suggestion).

Contributor

@eladkal eladkal left a comment


Left some comments.

We also need to have a connections.rst in the Druid provider.
See an example in another provider:
https://github.com/apache/airflow/blob/main/docs/apache-airflow-providers-apache-hdfs/connections.rst

Related to #28790

airflow/providers/apache/druid/CHANGELOG.rst (two review threads, resolved)
Comment on lines +38 to +39
.. note::
The connection type of Druid has been separated into two(druid_broker, druid_ingest)
Contributor


Please elaborate on this one. We should explain to users how to mitigate the issue (leave instructions how to migrate to this new version)

Contributor Author

@jaegwonseo jaegwonseo Sep 2, 2023


@eladkal
I'm not a native English speaker, so I'm not sure whether the following wording is accurate. Please review it:

The connection type of Druid has been separated into two (Druid Broker, Druid Ingest). Please perform one of the two methods listed below.
  1. Update the Druid connection type to Druid Broker or Druid Ingest in the UI (Admin/Connections)
  2. Run the command airflow db migrate

But in the second case, airflow db migrate doesn't support updating the connection type, so we would need to implement that.
My suggestion is to document the first option only, and to add a check of the Druid provider version to the db migrate process. How about handling that in a separate issue?
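Since there is no automated migration, affected users would update the stored conn_type themselves, either per connection in the UI or in bulk against the metadata database. A hedged sketch of the bulk route, using an in-memory SQLite stand-in for the real metadata DB (the table and column names mirror Airflow's `connection` table; against a real deployment you would back up first and run the UPDATEs on your configured database):

```python
import sqlite3

# Stand-in for the Airflow metadata database, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE connection (conn_id TEXT, conn_type TEXT)")
db.execute("INSERT INTO connection VALUES ('druid_broker_default', 'druid')")
db.execute("INSERT INTO connection VALUES ('druid_ingest_default', 'druid')")

# The split means the old 'druid' type must become one of the new types
# (druid_broker or druid_ingest); which one depends on how each
# connection is used - queries go to the broker, ingestion to the ingest node.
db.execute(
    "UPDATE connection SET conn_type = 'druid_broker' "
    "WHERE conn_id = 'druid_broker_default' AND conn_type = 'druid'"
)
db.execute(
    "UPDATE connection SET conn_type = 'druid_ingest' "
    "WHERE conn_id = 'druid_ingest_default' AND conn_type = 'druid'"
)

rows = dict(db.execute("SELECT conn_id, conn_type FROM connection"))
print(rows)
```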

Contributor


Nope, we don't have connection migrations, and with the current release model (that doesn't mean it is bad) I don't think it is even possible to implement them. All migrations happen only on upgrading Airflow Core, not providers.

Just some examples of the past breaking changes and their descriptions:

  1. Remove AWS S3 Connection (spoiler alert, it does impact some users)
    https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/changelog.html#id66

  2. Removed deprecated fields in Slack Provider (still waiting for users to start complaining)
    https://airflow.apache.org/docs/apache-airflow-providers-slack/stable/changelog.html#breaking-changes

  3. Deprecated parameter removed from Google Provider
    https://airflow.apache.org/docs/apache-airflow-providers-google/stable/changelog.html#breaking-changes

That is the main point of these breaking-change notes: when a user opens an issue/discussion on GitHub or writes in Slack, it is much easier to send them a link to the breaking changes than to try to figure out how to solve their issue.

Contributor Author


@eladkal @Taragolis
Taragolis, thanks for adding the example links.
In this PR the only point of concern is the change of connection type, so I think this note is enough:

In this version of the provider, the Apache Druid connection (conn_type="druid") has been separated into two (Druid Broker, Druid Ingest).
Update connections with conn_type="druid" to druid_broker or druid_ingest.

Contributor

@eladkal eladkal Oct 22, 2023


The goal of the note is to explain what actions need to be taken in order to migrate.
The note you listed says the conn is split in two, so:

  1. How do I choose which one? (Or/and what is the motivation for the split?)
  2. If I want my code to work exactly as before, what steps do I need to take?

These are the two questions the note must answer.

@eladkal
Contributor

eladkal commented Sep 1, 2023

It's almost ready :)

host = conn.host
port = conn.port
# ref : https://druid.apache.org/docs/latest/operations/api-reference/#tasks
response = requests.get(f"http://{host}:{port}/druid/indexer/v1/tasks")
Member


One of the main reasons for a connection and its test_connection method is storing the credentials securely and testing them via this method, so IMHO this method should test whether the credentials are valid or not. You can try something like:

Suggested change
- response = requests.get(f"http://{host}:{port}/druid/indexer/v1/tasks")
+ response = requests.get(
+     f"http://{host}:{port}/druid/indexer/v1/tasks",
+     auth=self.get_auth(),
+ )

Contributor Author


@hussein-awala
The Druid broker hook overrides DbApiHook, but its test_connection is not implemented to use authentication information:
https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.py#L550

So, I think adding authentication information to the Druid hooks (broker, ingest) should be handled in a separate issue.
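For illustration of the reviewer's suggestion, a hedged sketch of how the ingest connection test could incorporate stored credentials. `build_test_request` is a hypothetical helper, not part of the PR; a real implementation would pass its result to `requests.get(url, auth=auth)`:

```python
from typing import Optional, Tuple

def build_test_request(
    host: str, port: int, login: Optional[str], password: Optional[str]
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return the (url, auth) pair for a Druid ingest connection test.

    Hypothetical helper sketching the review suggestion: when credentials
    are stored on the connection, pass HTTP basic auth along with the
    request so the test actually validates them.
    """
    url = f"http://{host}:{port}/druid/indexer/v1/tasks"
    auth = (login, password) if login else None
    return url, auth

# With credentials the auth tuple is populated; without them it stays None.
print(build_test_request("overlord", 8081, "admin", "secret"))
print(build_test_request("overlord", 8081, None, None))
```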

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added stale Stale PRs per the .github/workflows/stale.yml policy file and removed stale Stale PRs per the .github/workflows/stale.yml policy file labels Oct 19, 2023

github-actions bot commented Dec 7, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 7, 2023
@github-actions github-actions bot closed this Dec 12, 2023
Labels
area:providers provider:apache-druid stale Stale PRs per the .github/workflows/stale.yml policy file
Development

Successfully merging this pull request may close these issues.

the airflow druid_ingest_default connection test always fails.
5 participants