ClickHouseBranchSQLOperator: AirflowException, Invalid arguments #87
Comments
Changing the order of base classes in inheritance solved my issue:

    from airflow.providers.common.sql.operators import sql
    from airflow_clickhouse_plugin.operators.clickhouse_dbapi import ClickHouseBaseDbApiOperator

    class ClickHouseBranchSQLOperator(
        sql.BranchSQLOperator,
        ClickHouseBaseDbApiOperator,
    ):
        pass

Hi @cra, and thank you for reporting this! TL;DR: the behaviour looks strange, and before proceeding with the change I suggest understanding the issue first. I was able to reproduce it. This is what
Since the first 3 classes do not define
I have also created a small code snippet to reproduce this class hierarchy:

    # Airflow base operator
    class BaseOperator(object):
        def __init__(self, **kwargs):
            print(f'BaseOperator.__init__ called: {kwargs=}')
            if kwargs:
                raise AssertionError(f'unprocessed {kwargs=}')

    # Common SQL operators
    class BaseSQLOperator(BaseOperator):
        def __init__(self, **kwargs):
            print(f'BaseSQLOperator.__init__ called: {kwargs=}')
            super().__init__(**kwargs)

        def get_db_hook(self):
            print('BaseSQLOperator.get_db_hook called')

    class BranchSQLOperator(BaseSQLOperator):
        def __init__(self, follow_task_ids_if_true, **kwargs):
            print(f'BranchSQLOperator.__init__ called: {kwargs=}')
            super().__init__(**kwargs)

    # ClickHouse base operators
    class ClickHouseDbApiHookMixin(object):
        def _get_clickhouse_db_api_hook(self):
            print('ClickHouseDbApiHookMixin._get_clickhouse_db_api_hook called')

    class ClickHouseBaseDbApiOperator(ClickHouseDbApiHookMixin, BaseSQLOperator):
        def get_db_hook(self):
            print('ClickHouseBaseDbApiOperator.get_db_hook called')
            return self._get_clickhouse_db_api_hook()

    # The target class
    class ClickHouseBranchSQLOperator(ClickHouseBaseDbApiOperator, BranchSQLOperator):
        pass

    print(ClickHouseBranchSQLOperator.__mro__)
    print('\ncalling ClickHouseBranchSQLOperator()')
    operator = ClickHouseBranchSQLOperator(follow_task_ids_if_true=['task_1'])
    print('\ncalling get_db_hook()')
    operator.get_db_hook()

And it works perfectly fine, here is the output:
Thank you for sharing your solution. Before we proceed with the code change, I would like to understand the behaviour, because it might not be the plugin's issue. Do you have any clue about what happens in Airflow and why its behaviour differs from the regular Python MRO shown in the code snippet above? Maybe the snippet misses something significant that distinguishes it from Airflow's implementation (some metaclasses maybe, though I have checked them and spotted no crucial difference). When we change the class definition to
Which means that |
As a quicker option, you may also proceed with a PR. Ideally, please start a PR with tests only. They should fail for the reported case. Once we confirm the tests fail, you may proceed with the code change fixing it by |
Here is another solution:

     class ClickHouseDbApiHookMixin(object):
         # these attributes are defined in both BaseSQLOperator and SqlSensor
         conn_id: str
         hook_params: t.Optional[dict]
    +    def __init__(self, **kwargs) -> None:
    +        super().__init__(**kwargs)

Apparently just adding
I think I know what's going on. You're right, something is funky, so the first thing I checked is the metaclass
And here is the line that breaks everything:
If you don't have
So when you run your class, the method that actually gets called is
I opened an issue in airflow: apache/airflow#41085 |
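The explanation above can be reproduced without Airflow. The sketch below is a simplified stand-in, not Airflow's actual code: `ReassignInitMeta` is a hypothetical metaclass that only copies the resolved `__init__` into each new class (the real metaclass discussed in this thread does more than that), and `PlainMixin`, `MixinWithInit`, `build`, and `try_init` are illustrative names that do not exist in the plugin. The operator classes mirror the simplified hierarchy posted earlier in the thread.

```python
class ReassignInitMeta(type):
    """Hypothetical stand-in: re-assigns __init__ on every class it creates."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        new_cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        # Copies whichever __init__ the MRO resolves to into new_cls's own namespace.
        new_cls.__init__ = new_cls.__init__
        return new_cls


calls = []  # records which __init__ implementations actually run


class BaseOperator(metaclass=ReassignInitMeta):
    def __init__(self, **kwargs):
        calls.append("BaseOperator")
        if kwargs:
            raise AssertionError(f"unprocessed kwargs: {kwargs}")


class BaseSQLOperator(BaseOperator):
    def __init__(self, **kwargs):
        calls.append("BaseSQLOperator")
        super().__init__(**kwargs)


class BranchSQLOperator(BaseSQLOperator):
    def __init__(self, follow_task_ids_if_true, **kwargs):
        calls.append("BranchSQLOperator")
        super().__init__(**kwargs)


class PlainMixin:
    """Mixin without __init__, like the mixin before the proposed change."""


class MixinWithInit:
    """Mixin with a cooperative __init__, like the proposed change."""

    def __init__(self, **kwargs):
        calls.append("MixinWithInit")
        super().__init__(**kwargs)


def build(mixin, swap_bases=False):
    """Build the target class on top of the given mixin (illustrative helper)."""

    class DbApiBase(mixin, BaseSQLOperator):
        pass

    bases = (BranchSQLOperator, DbApiBase) if swap_bases else (DbApiBase, BranchSQLOperator)
    return ReassignInitMeta("Target", bases, {})


def try_init(target_cls):
    """Instantiate the class and report the outcome plus the __init__ call order."""
    calls.clear()
    try:
        target_cls(follow_task_ids_if_true=["task_1"])
    except AssertionError:
        return "failed", list(calls)
    return "ok", list(calls)


print(try_init(build(PlainMixin)))
print(try_init(build(MixinWithInit)))
print(try_init(build(PlainMixin, swap_bases=True)))
```

In this sketch the first case fails because `DbApiBase` receives its own copy of `BaseSQLOperator.__init__`, whose `super()` is bound to `BaseSQLOperator`, so `BranchSQLOperator.__init__` is skipped and `follow_task_ids_if_true` reaches `BaseOperator` unprocessed. Giving the mixin its own `__init__`, or swapping the base class order, restores the full chain.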
Hi @grihabor thank you for your proactive participation! And the PR to Airflow in particular 🔥 For my better understanding, please confirm:
But doesn't it mean that because
Or does
Just trying to gather some keywords so I know which concepts I need to refresh 😅 |
I agree that a quick fix would be adding the |
Correct.
The thing is, when you create an instance of the
Nope. It calls the next class in the MRO after BaseSQLOperator, which is BaseOperator. Here is a small example:

    class InitMeta(type):
        def __new__(cls, name, bases, namespace, **kwargs):
            new_cls = super().__new__(cls, name, bases, namespace, **kwargs)
            new_cls.__init__ = new_cls.__init__
            return new_cls

    class A(metaclass=InitMeta):
        def __init__(self):
            print("A", super())
            super().__init__()

    class B(A):
        pass

    class C(A):
        def __init__(self):
            print("C", super())
            super().__init__()

    class D(B, C):
        pass

    D()

The output is
So in the class A the super call is equivalent to
Yep. |
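To make the effect of the `InitMeta` example easier to check, here is a variant that records calls instead of printing them and runs the same A/B/C/D hierarchy twice: once with plain `type` and once with `InitMeta`. The helper `run_hierarchy` and the call log are illustrative additions, not part of the original example.

```python
class InitMeta(type):
    def __new__(cls, name, bases, namespace, **kwargs):
        new_cls = super().__new__(cls, name, bases, namespace, **kwargs)
        # Stores the inherited __init__ directly in the new class's namespace.
        new_cls.__init__ = new_cls.__init__
        return new_cls


def run_hierarchy(metaclass):
    """Build the A/B/C/D diamond with the given metaclass and instantiate D.

    Returns the list of __init__ implementations that ran, and whether B
    ended up with an __init__ entry of its own.
    """
    calls = []

    class A(metaclass=metaclass):
        def __init__(self):
            calls.append("A")
            super().__init__()

    class B(A):
        pass

    class C(A):
        def __init__(self):
            calls.append("C")
            super().__init__()

    class D(B, C):
        pass

    D()
    return calls, "__init__" in vars(B)


print(run_hierarchy(type))      # regular MRO lookup
print(run_hierarchy(InitMeta))  # __init__ copied into every class
```

With plain `type`, `B` has no `__init__` of its own, so the lookup on `D` continues to `C` and both `C.__init__` and `A.__init__` run. With `InitMeta`, `B` carries a copy of `A.__init__`, which is found first; its `super()` is bound to `A`, so the chain continues after `A` in the MRO and `C.__init__` never runs.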
Could you add a warning or mark supported versions in the table then? |
Mentioned in README ✅ |
Hello!
I've been using the regular ClickHouseOperator for my DAGs for a while now, and I noticed that you support DB API 2.0, so I tried using ClickHouseBranchSQLOperator and ran into an issue.
It seems to match the way the BranchSQLOperator is used but I get ImportError when trying to use this task:
Could you please provide an example of how this operator is supposed to be used?
I'm probably missing something silly, but cannot figure out what