
Worker can't do impersonation while enable global async #13378

Closed
3 tasks done
juneauwang opened this issue Mar 1, 2021 · 6 comments · Fixed by #13878
Labels: assigned:preset (Assigned to the Preset team), #bug (Bug report), global:async-query (Related to Async Queries feature)

Comments

juneauwang commented Mar 1, 2021

We enabled impersonation on Presto, Hive, Impala, and other data sources, and enabled global async queries in Superset 1.0.0. However, the Celery worker cannot get effective_username, so queries run as the system user.

Expected results

Dashboard queries should run as the actual logged-in Superset user when the data source has impersonation enabled.

Actual results

Dashboard queries run as the system user, regardless of whether impersonation is enabled on the data source.

Screenshots

object user_name is None
[2021-03-01 07:04:27,745: INFO/ForkPoolWorker-127] object user_name is None
effective_username is None
[2021-03-01 07:04:27,745: INFO/ForkPoolWorker-127] effective_username is None
df username is None
[2021-03-01 07:04:27,746: INFO/ForkPoolWorker-127] df username is None
[2021-03-01 07:04:27,746: INFO/ForkPoolWorker-127] username is svc_acc_bdp_superset
[2021-03-01 07:04:27,746: INFO/ForkPoolWorker-127] SELECT "db_name" AS "db_name",
       "event_name" AS "event_name",
       sum(nb_events) AS "count"
FROM xxx
WHERE "event_date" >= '2021-02-01 00:00:00.000000'
  AND "event_date" < '2021-03-01 00:00:00.000000'
GROUP BY "db_name",
         "event_name"
LIMIT 500

I added logging statements at lines 289 and 301 of superset/models/core.py to debug this. svc_acc_bdp_superset is the Linux user that runs the Celery workers.

How to reproduce the bug

  1. Go to 'Dashboard'
  2. Click on any dashboard whose data source has impersonation enabled.
  3. Check the worker log to see which user runs the query.
  4. See error

Environment


Superset 1.0.0
Python 3.8.2
Flask 1.1.2
PyHive 0.6.2
celery 4.4.7

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

stacktraces:

Traceback (most recent call last):
  File "/srv/python3/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1321, in query
    df = self.database.get_df(sql, self.schema, mutator)
  File "/srv/python3/lib/python3.8/site-packages/superset/models/core.py", line 392, in get_df
    data = self.db_engine_spec.fetch_data(cursor)
  File "/srv/python3/lib/python3.8/site-packages/superset/db_engine_specs/base.py", line 321, in fetch_data
    return cursor.fetchall()
  File "/srv/python3/lib/python3.8/site-packages/pyhive/common.py", line 136, in fetchall
    return list(iter(self.fetchone, None))
  File "/srv/python3/lib/python3.8/site-packages/pyhive/common.py", line 105, in fetchone
    self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
  File "/srv/python3/lib/python3.8/site-packages/pyhive/common.py", line 45, in _fetch_while
    self._fetch_more()
  File "/srv/python3/lib/python3.8/site-packages/pyhive/presto.py", line 264, in _fetch_more
    self._process_response(self._requests_session.get(self._nextUri, **self._requests_kwargs, verify=False ,auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL)))
  File "/srv/python3/lib/python3.8/site-packages/pyhive/presto.py", line 303, in _process_response
    raise DatabaseError(response_json['error'])

Worker systemd unit file:

[Unit]
Description=Superset workers
After=network-online.target
Wants=network-online.target

[Service]
Environment=PATH=/srv/python3/bin:/usr/bin:/bin:$PATH
Group=superset
User=svc_acc_bdp_superset
Type=simple
ExecStart=/bin/sh -c 'superset worker -w 200 >> /var/log/superset/worker.log 2>&1'
LimitNOFILE=65535
LimitNOFILESoft=65535
[Install]
WantedBy=multi-user.target

juneauwang added the #bug (Bug report) label Mar 1, 2021
@juneauwang
Author

I noticed that the username is supposed to come from get_username() in superset/utils/core.py (line 1308), which tries to read the username from the Flask context. The Celery worker has no logged-in user in that context, so this function returns None.
Is there any way to get the current session's username, or the logged-in username?
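To illustrate the behavior described above, here is a minimal Python sketch, assuming get_username() simply reads g.user.username and returns None when that fails. The get_username() below is a hypothetical stand-in, not Superset's actual code, and SimpleNamespace objects stand in for Flask's g in the web server and in the Celery worker.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_username(g: Any) -> Optional[str]:
    """Hypothetical stand-in: read the logged-in username from a Flask-style
    ``g`` object, returning None when no user is attached."""
    try:
        return g.user.username
    except AttributeError:
        # Raised when g has no 'user' attribute (or g.user is None),
        # which is the situation inside a Celery worker.
        return None


# Web server: request handling has attached the logged-in user to g.
web_g = SimpleNamespace(user=SimpleNamespace(username="wpwang"))

# Celery worker: no HTTP request was dispatched, so nothing set g.user.
worker_g = SimpleNamespace()

print(get_username(web_g))     # wpwang
print(get_username(worker_g))  # None
```

With impersonation enabled, a None username would leave the connection using the account that runs the process, which is consistent with the "username is svc_acc_bdp_superset" line in the worker log.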

@juneauwang
Author

Tested on the web server, where it is possible to get the username:

flask username is wpwang
INFO:superset.models.core:flask username is wpwang
INFO:pyhive.presto:SELECT 1
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): xxxxx:7778
DEBUG:urllib3.connectionpool:https://xxxxx:7778 "POST /v1/statement HTTP/1.1" 401 12

Could it be that the Celery worker isn't registered in the Flask context?

matejmurin01 commented Mar 1, 2021

The same happened to us with GLOBAL_ASYNC_QUERIES turned on. We run Impala with impersonation, and the problem seems to be that when get_sqla_engine() in models/core.py is called, self.impersonate_user is True but the user_name argument is None. The code then tries to extract the username via get_effective_username() in models/core.py, which fails because g has no user attribute (AttributeError: '_AppCtxGlobals' object has no attribute 'user').
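One possible direction, sketched under stated assumptions: capture the username in the web server when the async job is enqueued, send it with the task payload, and pass it explicitly as user_name in the worker so the g.user fallback is never reached. Database, enqueue_query() and run_query_task() are hypothetical simplified models, not Superset's API, and this is not necessarily how the eventual fix (#13878) works.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Optional


@dataclass
class Database:
    """Hypothetical, simplified model of a database connection setting."""

    impersonate_user: bool

    def get_effective_username(self, g: Any, user_name: Optional[str] = None) -> Optional[str]:
        if not self.impersonate_user:
            return None  # impersonation off: the service account is used
        if user_name:
            return user_name  # an explicitly passed username takes priority
        # Fallback that mirrors the failure described above: outside a web
        # request, g has no 'user' attribute, so this raises AttributeError.
        return g.user.username


def enqueue_query(web_g: Any, sql: str) -> Dict[str, Any]:
    """Runs in the web server: capture the identity while g.user exists."""
    return {"sql": sql, "user_name": web_g.user.username}


def run_query_task(payload: Dict[str, Any], db: Database, worker_g: Any) -> Optional[str]:
    """Runs in the worker: pass the captured username explicitly."""
    return db.get_effective_username(worker_g, user_name=payload.get("user_name"))


web_g = SimpleNamespace(user=SimpleNamespace(username="wpwang"))
worker_g = SimpleNamespace()  # no user attached in the worker
db = Database(impersonate_user=True)

payload = enqueue_query(web_g, "SELECT 1")
print(run_query_task(payload, db, worker_g))  # wpwang

try:
    db.get_effective_username(worker_g)  # no username propagated
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

The design choice here is that the worker never relies on request-scoped state: everything it needs to impersonate the user travels with the job itself.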

junlincc added the global:async-query (Related to Async Queries feature) label Mar 2, 2021
robdiciuccio self-assigned this Mar 3, 2021
robdiciuccio added the assigned:preset (Assigned to the Preset team) label Mar 3, 2021
@juneauwang
Author

Hi @robdiciuccio @gorcurek, I might have a workaround for this. Inspired by https://stackoverflow.com/questions/21138025/attributeerror-appctxglobals-object-has-no-attribute-user-in-flask and the error reported by gorcurek, I changed models/core.py as follows:

    # added near line 56 of superset/models/core.py
    from flask_login.utils import current_user

    # added near line 278
    @app.before_request
    def before_request():
        g.user = current_user

I will test more next Monday. I'd appreciate it if you could test it too and share your opinions. Thanks!

@juneauwang
Author

I tested it, but it does not fix this issue. I will try something else or wait for an official fix.
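A likely reason the before_request workaround has no effect: Flask runs before_request hooks only while dispatching an HTTP request, and a Celery task never goes through request dispatch, so g.user is never set in the worker. In addition, flask_login's current_user depends on the logged-in session of an HTTP request, which a worker does not have. The toy MiniApp below is a hypothetical model of that dispatch behavior, not Flask itself.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


class MiniApp:
    """Hypothetical toy model of a Flask-like app, not Flask itself:
    before_request hooks run only when an HTTP request is dispatched."""

    def __init__(self) -> None:
        self._before_request_hooks: List[Callable[[Any], None]] = []

    def before_request(self, fn: Callable[[Any], None]) -> Callable[[Any], None]:
        # Register a hook, like the @app.before_request decorator.
        self._before_request_hooks.append(fn)
        return fn

    def handle_request(self, view: Callable[[Any], Any]) -> Any:
        g = SimpleNamespace()  # fresh per-request globals
        for hook in self._before_request_hooks:
            hook(g)  # hooks run because a request is being dispatched
        return view(g)

    def run_task(self, task: Callable[[Any], Any]) -> Any:
        g = SimpleNamespace()  # worker has app globals but no request dispatch
        return task(g)  # so no before_request hook ever runs here


app = MiniApp()


@app.before_request
def set_user(g: Any) -> None:
    g.user = "wpwang"  # stands in for g.user = current_user


def current_username(g: Any) -> Any:
    return getattr(g, "user", None)


print(app.handle_request(current_username))  # wpwang
print(app.run_task(current_username))        # None
```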

@robdiciuccio
Member

@benjreinhart is currently working on a fix for this.


4 participants