Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

celery SQLAlchemy errors in celery tasks that use sqlalchemy pools #10530

Closed
3 tasks done
JasonD28 opened this issue Aug 6, 2020 · 10 comments · Fixed by #10819
Closed
3 tasks done

celery SQLAlchemy errors in celery tasks that use sqlalchemy pools #10530

JasonD28 opened this issue Aug 6, 2020 · 10 comments · Fixed by #10819
Assignees
Labels
assigned:preset Assigned to the Preset team !deprecated-label:bug Deprecated label - Use #bug instead

Comments

@JasonD28
Copy link
Contributor

JasonD28 commented Aug 6, 2020

While running multiple alert.run_query celery tasks, some tasks fail with a sqlalchemy.exc.NoSuchColumnError error here https://github.com/apache/incubator-superset/blob/ece91928a9339190163c0bc72b96e51217a90d1e/superset/tasks/schedules.py#L544 on the first few iterations of the celery beat schedule after its initialization. These errors eventually stop occurring if the celery worker and the beat schedule are left to run for a few iterations without stopping. Tests have shown that the errors stop occurring on average after around 2 iterations of the beat schedule.

Expected results

Alerting celery tasks run without error

Actual results

Traceback (most recent call last):
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/pip/celery/celery-cpython-38/lib/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/superset-cpython-38/lib/superset/app.py", line 115, in __call__
    return task_base.__call__(self, *args, **kwargs)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/pip/celery/celery-cpython-38/lib/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/superset-cpython-38/lib/superset/tasks/schedules.py", line 562, in schedule_alert_query
    if run_alert_query(
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/superset-cpython-38/lib/superset/tasks/schedules.py", line 672, in run_alert_query
    deliver_alert(alert_id)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/thirdparty/superset_python3/superset-cpython-38/lib/superset/tasks/schedules.py", line 579, in deliver_alert
    alert = scoped_session.query(Alert).get(alert_id)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/query.py", line 1021, in get
    return self._get_impl(ident, loading.load_on_pk_identity)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/query.py", line 1138, in _get_impl
    return db_load_fn(self, primary_key_identity)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 287, in load_on_pk_identity
    return q.one()
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/query.py", line 3458, in one
    ret = self.one_or_none()
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/query.py", line 3427, in one_or_none
    ret = list(self)
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 101, in instances
    cursor.close()
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 81, in instances
    rows = [proc(row) for row in fetch]
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 81, in <listcomp>
    rows = [proc(row) for row in fetch]
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 525, in _instance
    tuple([row[column] for column in pk_cols]),
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/orm/loading.py", line 525, in <listcomp>
    tuple([row[column] for column in pk_cols]),
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/engine/result.py", line 681, in _key_fallback
    util.raise_(
  File "/srv/yaps/mounts/superset/superset_build/bbe899d629f9d61e5d2a8ffdccbcc57724a557ea-e230e418e5960bc45d735cb21d806661/celery.runfiles/__main__/pip/SQLAlchemy/SQLAlchemy-cpython-38/lib/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column 'alerts.id'"

How to reproduce the bug

  1. install geckodriver
  2. set up config.py to enable alerting
ENABLE_ALERTS = True
FEATURE_FLAGS = {"THUMBNAILS": True}

THUMBNAIL_SELENIUM_USER = "Admin"
celery_alerting = {'alerts.schedule_check': {
        'task': 'alerts.schedule_check',
        'schedule': crontab(minute='*/1'),
    }}
CeleryConfig.CELERYBEAT_SCHEDULE.update(celery_alerting)
CELERY_CONFIG = CeleryConfig
  1. Go to Manage -> Alerts
  2. Create a few alerts (~6) that will run at the same time
  3. Start a celery worker and celery beat
  4. See error in celery worker

Environment

  • superset version: 0.37
  • python version: 3.7.7

Checklist

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@JasonD28 JasonD28 added the !deprecated-label:bug Deprecated label - Use #bug instead label Aug 6, 2020
@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.97. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@bkyryliuk
Copy link
Member

It seems that celery is not happy to deal with the sqlalchemy pool & that's related to the process forking. The recommendation I found was to have 1 session per worker to avoid issues like this. Here is some read:

cc @willbarrett @villebro

@bkyryliuk bkyryliuk self-assigned this Sep 9, 2020
@bkyryliuk bkyryliuk changed the title periodic SQLAlchemy errors with SQL-based alerting tasks celery SQLAlchemy errors with SQL-based alerting tasks Sep 10, 2020
@bkyryliuk bkyryliuk changed the title celery SQLAlchemy errors with SQL-based alerting tasks celery SQLAlchemy errors in celery tasks that use sqlalchemy pools Oct 15, 2020
@bkyryliuk
Copy link
Member

@nytai , @dpgaspar looks like this issue persists in the thumbnail caching tasks as well - just have tried it on staging
It's possible to follow #10819 approach to fix that.

@bkyryliuk bkyryliuk reopened this Oct 15, 2020
@stale
Copy link

stale bot commented Dec 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Dec 25, 2020
@cdmikechen
Copy link
Contributor

@bkyryliuk If I use

FEATURE_FLAGS = {
    "THUMBNAILS": True
}

superset will report error:

[2021-01-21 13:30:47,554: INFO/ForkPoolWorker-1] Processing url for thumbnail: 85e35616c1758c065a4fc7445f084f9a
Init selenium driver
[2021-01-21 13:30:47,555: INFO/ForkPoolWorker-1] Init selenium driver
Failed at generating thumbnail Instance <User at 0x112e39250> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)
[2021-01-21 13:30:55,431: ERROR/ForkPoolWorker-1] Failed at generating thumbnail Instance <User at 0x112e39250> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)

Is this problem related to this PR #10819 ?

@stale stale bot removed the inactive Inactive for >= 30 days label Jan 21, 2021
@korpa
Copy link

korpa commented Jan 23, 2021

@bkyryliuk If I use

FEATURE_FLAGS = {
    "THUMBNAILS": True
}

superset will report error:

[2021-01-21 13:30:47,554: INFO/ForkPoolWorker-1] Processing url for thumbnail: 85e35616c1758c065a4fc7445f084f9a
Init selenium driver
[2021-01-21 13:30:47,555: INFO/ForkPoolWorker-1] Init selenium driver
Failed at generating thumbnail Instance <User at 0x112e39250> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)
[2021-01-21 13:30:55,431: ERROR/ForkPoolWorker-1] Failed at generating thumbnail Instance <User at 0x112e39250> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)

Is this problem related to this PR #10819 ?

I have the same issue with thumbnails. Would be nice to know, if this is a bug or if I have a wrong thumbnail configuration.

@cdmikechen
Copy link
Contributor

cdmikechen commented Jan 24, 2021

@korpa
Copy link

korpa commented Jan 24, 2021

@cdmikechen Thank you very much, will test this later

@robdiciuccio
Copy link
Member

I'm working on an app-level fix for this based on https://www.yangster.ca/post/not-the-same-pre-fork-worker-model/. This should also resolve #12766

@robdiciuccio
Copy link
Member

#13350 contains a potential fix for this issue. Can you please test and confirm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assigned:preset Assigned to the Preset team !deprecated-label:bug Deprecated label - Use #bug instead
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants