Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(celery): Reset DB connection pools for forked worker processes #13350

Merged
merged 5 commits into from
Feb 26, 2021

Conversation

robdiciuccio
Copy link
Member

@robdiciuccio robdiciuccio commented Feb 25, 2021

SUMMARY

Adds a listener for the worker_process_init Celery signal that disposes of and resets the SQLAlchemy connection pool being passed to the forked process.

Resolves the intermittent sqlalchemy.exc.NoSuchColumnError reported in #10530 and #12766 and the following error reported in #9860:

sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq

This fix is primarily related to the default prefork Celery execution pool, but was also tested with the following pool invocations:

celery worker --app=superset.tasks.celery_app:app -Ofair -l INFO
celery worker --app=superset.tasks.celery_app:app --pool=threads -c 12 -l INFO
celery worker --app=superset.tasks.celery_app:app --pool=gevent -c 12 -l INFO

This configuration was tested with async queries enabled to place load on the celery workers, in both standalone and Docker-based workflows.

This PR also includes a fix for a client-side race condition in loading charts asynchronously (fixes #12913).

References:
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://www.yangster.ca/post/not-the-same-pre-fork-worker-model/

TEST PLAN

Asynchronous tasks should run without sqlalchemy.exc.NoSuchColumnError when celery is run in prefork mode. See #10530 and #12766 for reproducability.

ADDITIONAL INFORMATION

Copy link
Member

@craig-rueda craig-rueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov
Copy link

codecov bot commented Feb 26, 2021

Codecov Report

Merging #13350 (a28c406) into master (7055c05) will decrease coverage by 0.01%.
The diff coverage is 25.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13350      +/-   ##
==========================================
- Coverage   77.12%   77.10%   -0.02%     
==========================================
  Files         881      881              
  Lines       45502    45507       +5     
  Branches     5447     5449       +2     
==========================================
- Hits        35093    35090       -3     
- Misses      10286    10293       +7     
- Partials      123      124       +1     
Flag Coverage Δ
cypress 58.06% <0.00%> (+0.02%) ⬆️
hive 79.95% <0.00%> (-0.02%) ⬇️
javascript 62.55% <37.50%> (+0.11%) ⬆️
mysql 80.28% <0.00%> (-0.02%) ⬇️
postgres 80.32% <0.00%> (-0.02%) ⬇️
presto ?
python 80.71% <0.00%> (-0.14%) ⬇️
sqlite 79.94% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset-frontend/src/dashboard/index.jsx 63.63% <0.00%> (-6.37%) ⬇️
superset-frontend/src/explore/index.jsx 63.63% <0.00%> (-6.37%) ⬇️
superset-frontend/src/profile/App.tsx 0.00% <0.00%> (ø)
superset/tasks/celery_app.py 0.00% <0.00%> (ø)
...rset-frontend/src/components/TableLoader/index.tsx 100.00% <100.00%> (ø)
superset-frontend/src/middleware/asyncEvent.ts 87.34% <100.00%> (ø)
superset/db_engine_specs/presto.py 82.47% <0.00%> (-5.56%) ⬇️
superset/models/core.py 88.55% <0.00%> (-0.28%) ⬇️
...et-frontend/src/components/TableView/TableView.tsx 92.85% <0.00%> (+3.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7055c05...7ca4d05. Read the comment docs.

Copy link
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! but does this happen even when using null pool on the workers? or do you think there's still uses of QueuePool create by flask-sqlalchemy at the worker level?

@robdiciuccio
Copy link
Member Author

@dpgaspar the app.db instance doesn't appear to use the null pool with the metadata database. I know we're using that elsewhere to get around this forking issue, which I'm planning on reviewing after this fix is merged.

@dpgaspar
Copy link
Member

@dpgaspar the app.db instance doesn't appear to use the null pool with the metadata database. I know we're using that elsewhere to get around this forking issue, which I'm planning on reviewing after this fix is merged.

workers when accessing the metadata db should use: https://github.com/apache/superset/blob/master/superset/utils/celery.py#L33

Let's talk about this?

@robdiciuccio
Copy link
Member Author

Talked with @dpgaspar and we agree this is a good first step, and more DB connection pool investigation is required. Merging.

@robdiciuccio robdiciuccio merged commit b4ca39c into apache:master Feb 26, 2021
@robdiciuccio robdiciuccio deleted the rd/celery-db-pool branch February 26, 2021 16:05
betodealmeida pushed a commit to betodealmeida/incubator-superset that referenced this pull request Mar 19, 2021
…pache#13350)

* Reset sqlalchemy connection pool on celery process fork

* Fix race condition with async chart loading state

* pylint: ignore

* prettier
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
…pache#13350)

* Reset sqlalchemy connection pool on celery process fork

* Fix race condition with async chart loading state

* pylint: ignore

* prettier
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io size/S v1.1 🚢 1.2.0
Projects
None yet
4 participants