Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TableauRefreshWorkbookOperator fails when using personal access token (Tableau authentication method) #16669

Closed
pmalafosse opened this issue Jun 25, 2021 · 7 comments · Fixed by #16916
Assignees

Comments

@pmalafosse
Copy link
Contributor

pmalafosse commented Jun 25, 2021

Apache Airflow version: 2.0.1

What happened:

The operator fails at the last step, after successfully refreshing the workbook with this error:

tableauserverclient.server.endpoint.exceptions.ServerResponseError: 

        401002: Unauthorized Access
                Invalid authentication credentials were provided.

What you expected to happen:
It should not fail, like when we use the username/password authentication method (instead of personal_access_token)

Tableau server does not allow concurrent connections when using personal_access_token tableau/server-client-python#717
The solution would be redesigning completely the operator to only call the hook once.
My quick fix was to edit this in TableauHook:

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        pass

How to reproduce it:

Run this operator TableauRefreshWorkbookOperator using Tableau personal_access_token authentication (token_name, personal_access_token).

@pmalafosse pmalafosse added the kind:bug This is a clearly a bug label Jun 25, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 25, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@pmalafosse pmalafosse changed the title TableauRefreshWorkbookOperator fails when using personal access token TableauRefreshWorkbookOperator fails when using personal access token (Tableau authentication method) Jun 25, 2021
@samgans
Copy link
Contributor

samgans commented Jul 5, 2021

Hey! I can make an investigation on this issue and prepare the first PR! Are you ok with that?

@pmalafosse
Copy link
Contributor Author

yes, sure @samgans ! Thanks

@samgans
Copy link
Contributor

samgans commented Jul 7, 2021

Hello again!

I've made a basic investigation on it and found that we actually have a bit weird situation. The thing is that if you are authenticating your Tableau connection with a personal access token, each time you will try to open a parallel connection, the previous one will be invalidated. You can read more about this in the thread: tableau/server-client-python#717.

Speaking of our issue and how it is related to the Tableau personal token: in the "blocking" task, we are using the TableauJobStatusSensor that opens another connection, therefore, invalidating the previous one causing the wrapping TableauHook to fail (you can check more in the execute method of TableauRefreshWorkbookOperator in  tableau_refresh_workbook.py).

So, we have two options here:

  1. As the blocking process uses an explicit sensor to wait till the job is finished (and the sensor opens another connection), we can make the Sensor configurable to use the existing connection (which is TableauHook). So, we can pass it as a parameter and forget about the symptom.

BUT! It seems like if the user will add one or more Tableau task which will use the same token and run in parallel, we can get some hard-to-debug errors due to parallel invalidation of tokens. Therefore, it seems to me that we shouldn't use this approach and take a look at the next one.

  1. Just deprecate the authentication by the personal token and indicate this clearly in the documentation. This will remove all kinds of bugs related to the invalidation of access tokens and will not deceive the users about the operability of the execution of the parallel tasks with a personal token.

What do you think?

@pmalafosse
Copy link
Contributor Author

Great, thanks a lot for investigating this @samgans !

On my side I created a new 'airflow' user in Tableau with 'Site Administrator Explorer' role and will use the username/password authentication then.
I think it makes sense to deprecate the personal token authentication for the TableauRefreshWorkbookOperator and adding some comments about this issue in the code as it was not even working properly now.

Could this be a possible 3rd solution though?
Opening and closing the connection each time we want to call Tableau server and adding some retry strategy to make sure we cover the cases when several authentications are made at the same time.

@samgans
Copy link
Contributor

samgans commented Jul 8, 2021

@pmalafosse,

The deprecation will not be related directly to TableauRefreshWorkbookOperator but will be done rather for TableauHook, where the method of authentication using the personal token is implemented.

Speaking of the third solution, I've been thinking about some kind of strategy of reusing the existing connection by TableauHook (e.g. saving it in the global variable which will be available for all instances), but here is my big doubt:

I know exactly that if we will reuse a connection from a global variable, all will be good if we will have multiple related tasks in one DAG. But if we will have some other Tableau tasks with the same personal token in other DAGs? It seems to me that all the DAG definitions are executed at different processes so we can't guarantee that there will be access to one global variable between all the DAG definitions.

We also can't check if there are multiple authentication attempts using the Tableau server client, so it seems to me that deprecation is the best option. Also, if you have some ideas on how the third solution can be implemented, I'm open to trying this out.

@pmalafosse
Copy link
Contributor Author

thanks @samgans !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants