Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6786] Add KafkaConsumerHook, KafkaProduerHook and KafkaSensor #7407

Closed
wants to merge 162 commits into from
Closed

[AIRFLOW-6786] Add KafkaConsumerHook, KafkaProduerHook and KafkaSensor #7407

wants to merge 162 commits into from

Conversation

dferguson992
Copy link

@dferguson992 dferguson992 commented Feb 12, 2020

Dear Airflow Maintainers,

Please accept the following PR that

Add the KafkaProducerHook.
Add the KafkaConsumerHook.
Add the KafkaSensor which listens to messages with a specific topic.
Related Issue:
#1311

Issue link: AIRFLOW-6786

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.
  • For document-only changes commit message can start with [AIRFLOW-XXXX].

Reminder to contributors:

You must add an Apache License header to all new files
Please squash your commits when possible and follow the 7 rules of good Git commits
I am new to the community, I am not sure the files are at the right place or missing anything.

The sensor could be used as the first node of a dag where the second node can be a TriggerDagRunOperator. The messages are polled in a batch and the dag runs are dynamically generated.

Thanks!

Note, as per denied PR #1415, it is important to mention these integrations are not suitable for low-latency/high-throughput/streaming. For reference, #1415 (comment).

Co-authored-by: Dan Ferguson [email protected]
Co-authored-by: YuanfΞi Zhu

@boring-cyborg
Copy link

boring-cyborg bot commented Feb 12, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://apache-airflow-slack.herokuapp.com/

@ashb
Copy link
Member

ashb commented Feb 12, 2020

@dferguson992 This PR pre-dates the projects use of Jira so you'll need to sign up for an account at issues.apache.org/jira/ and create one, please.

@dferguson992 dferguson992 changed the title Add KafkaConsumerHook, KafkaProduerHook and KafkaSensor [AIRFLOW-6786] Add KafkaConsumerHook, KafkaProduerHook and KafkaSensor Feb 12, 2020
@ashb
Copy link
Member

ashb commented Feb 12, 2020

@dferguson992 Good work!. I'll take a look in detail tomorrow. (there will likely be a few rounds of review as this code was quite old)

@dferguson992
Copy link
Author

@ashb thanks! i'll be on vacation, back on Tuesday of next week. Happy to make any adjustments or additions to the code-base once I return. The code is quite old you're right, but it should be pretty forward compatible with vanilla Kafka.

I'm not sure if this project shies away from branding, but it may be worthwhile to consider adding support for Confluent Kafka specifically.

@ashb
Copy link
Member

ashb commented Feb 12, 2020

The first thing I notice is that we now need to rename this:

  • airflow/contrib/hooks/kafka_consumer_hook.py

->

  • airflow/providers/apache/kafka/hooks/kafka_consumer.py

etc. A bit more detail is in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths, shout if that's not clear.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the "Co-authored-by" in the messages needs email address (we can use github ones) to make it attribute correclty.

airflow/contrib/hooks/kafka_consumer_hook.py Outdated Show resolved Hide resolved
airflow/contrib/hooks/kafka_consumer_hook.py Outdated Show resolved Hide resolved
airflow/contrib/hooks/kafka_producer_hook.py Outdated Show resolved Hide resolved
airflow/contrib/hooks/kafka_producer_hook.py Outdated Show resolved Hide resolved
airflow/contrib/hooks/kafka_producer_hook.py Outdated Show resolved Hide resolved
airflow/contrib/sensors/kafka_sensor.py Outdated Show resolved Hide resolved
airflow/contrib/sensors/kafka_sensor.py Outdated Show resolved Hide resolved
@dferguson992
Copy link
Author

@ashb I've made the recommended changes except for the github email address. How does one acquire someone else's GH email?

@ashb
Copy link
Member

ashb commented Feb 21, 2020

@dferguson992 I've force pushed this branch to update the format/add the emails.

I couldn't see where you got "Hanan Shteingart" from as an author- which PR was that?

@dferguson992
Copy link
Author

hey @ashb I'm not sure what else to do for this PR. Is there anything outstanding before it can be merged? I fixed as many of the travis CI issues as I could not sure what else to do for this

@ashb
Copy link
Member

ashb commented Mar 13, 2020

@dferguson992 I've force pushed this branch to update the format/add the emails.

I couldn't see where you got "Hanan Shteingart" from as an author- which PR was that?

@dferguson992 Can you answer this Q?

@dferguson992
Copy link
Author

@ashb I got that co-author by mistake. on the original PR, that user was the last to comment on it. but i mistakenly took that user as the PR author, not an interested third-party as I should have. I've removed that user from the PR notes and the corresponding issue.

@dferguson992
Copy link
Author

Hey @ashb hope all is well. I'm not sure what to do to get the Travis CI checks to pass, but the code is functional and includes your suggestions. Can we move forward with the PR?

@dmmakita
Copy link

Hi, @dferguson992, I'm having similar needs related to airflow-kafka. May I try to help you with this issue #7407?

@dferguson992
Copy link
Author

Hi, @dferguson992, I'm having similar needs related to airflow-kafka. May I try to help you with this issue #7407?

That would be awesome! But I'm not sure what else needs doing. If you know more on this, please do go for it.

@dmmakita
Copy link

Thank you for your response. I will try to do something.

@gumartinm
Copy link

Looking forward to seeing this change in Airflow.

@kaxil
Copy link
Member

kaxil commented Apr 28, 2020

Can you please rebase to the latest Master?

@dferguson992
Copy link
Author

@ashb can you please review? I've made the necessary changes to this PR and there seems to be growing demand for it. At this time I see there needs to a review by you and some modifications to the travis ci build which is failing for an unknown reason.

@kaxil
Copy link
Member

kaxil commented Apr 28, 2020

@ashb can you please review? I've made the necessary changes to this PR and there seems to be growing demand for it. At this time I see there needs to a review by you and some modifications to the travis ci build which is failing for an unknown reason.

Can you rebase the PR please to the latest master, we have moved to Github Actions for CI for most of our tests.

@dferguson992
Copy link
Author

@ashb can you please review? I've made the necessary changes to this PR and there seems to be growing demand for it. At this time I see there needs to a review by you and some modifications to the travis ci build which is failing for an unknown reason.

Can you rebase the PR please to the latest master, we have moved to Github Actions for CI for most of our tests.

I don't have that as an option. It says merging is blocked. Should I create a new PR?

@kaxil
Copy link
Member

kaxil commented Apr 28, 2020

@ashb can you please review? I've made the necessary changes to this PR and there seems to be growing demand for it. At this time I see there needs to a review by you and some modifications to the travis ci build which is failing for an unknown reason.

Can you rebase the PR please to the latest master, we have moved to Github Actions for CI for most of our tests.

I don't have that as an option. It says merging is blocked. Should I create a new PR?

git fetch upstream master
git rebase upstream/master

and force push your branch

@dferguson992
Copy link
Author

@ashb can you please review? I've made the necessary changes to this PR and there seems to be growing demand for it. At this time I see there needs to a review by you and some modifications to the travis ci build which is failing for an unknown reason.

Can you rebase the PR please to the latest master, we have moved to Github Actions for CI for most of our tests.

I don't have that as an option. It says merging is blocked. Should I create a new PR?

git fetch upstream master
git rebase upstream/master

and force push your branch

Okay so I've done that and the pre-checks have all gone. The only issue now is there's an old request for changes from ashb from Feb12th in a stuck state, and I can't seem to remove ashb as a reviewer from this PR

@dferguson992
Copy link
Author

Okay, my latest error in this build process is this:

/opt/airflow/docs/_api/airflow/providers/apache/kafka/hooks/kafka_consumer_hook/index.rst:37: WARNING: Unexpected indentation.
/opt/airflow/docs/_api/airflow/providers/apache/kafka/sensors/kafka_sensor/index.rst:35: WARNING: Field list ends without a blank line; unexpected unindent.
looking for now-outdated files... none found
pickling environment... done
checking consistency... /opt/airflow/docs/_api/airflow/providers/apache/kafka/hooks/index.rst: WARNING: document isn't included in any toctree
/opt/airflow/docs/_api/airflow/providers/apache/kafka/sensors/index.rst: WARNING: document isn't included in any toctree
done

Does anyone know how to resolve an issue like this? @ashb? @kaxil?

@serkef
Copy link
Contributor

serkef commented May 6, 2020

Okay, my latest error in this build process is this:

/opt/airflow/docs/_api/airflow/providers/apache/kafka/hooks/kafka_consumer_hook/index.rst:37: WARNING: Unexpected indentation.
/opt/airflow/docs/_api/airflow/providers/apache/kafka/sensors/kafka_sensor/index.rst:35: WARNING: Field list ends without a blank line; unexpected unindent.
looking for now-outdated files... none found
pickling environment... done
checking consistency... /opt/airflow/docs/_api/airflow/providers/apache/kafka/hooks/index.rst: WARNING: document isn't included in any toctree
/opt/airflow/docs/_api/airflow/providers/apache/kafka/sensors/index.rst: WARNING: document isn't included in any toctree
done

Does anyone know how to resolve an issue like this? @ashb? @kaxil?

@dferguson992 for the last 2 warnings, you need to add the new path in sphinx. Check the known issues in contributing.md
for the first 2 warnings, I'll add comments in the code where I think this fails. You test locally by building the docs.

@turbaszek
Copy link
Member

Not sure if this CI error about cyclic imports is related... @potiuk

@knackjason
Copy link

Did this ever get merged in? I'd be interested in having officially supported Kafka hooks.

@potiuk
Copy link
Member

potiuk commented Aug 16, 2021

Does not look like

@potiuk
Copy link
Member

potiuk commented Aug 16, 2021

But there is this PR in progress: #12388

@knackjason
Copy link

But there is this PR in progress: #12388

Ah, I didn't realize there was a newer PR. Thanks, @potiuk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.