Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry to connect to hive metastore with new delegation token. #10561

Conversation

Sunwoo-Shin
Copy link
Contributor

Fixes #10534

@cla-bot
Copy link

cla-bot bot commented Jan 12, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

@Sunwoo-Shin
Copy link
Contributor Author

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

CLA had sent to [email protected]
( Sent Email: [email protected] )

throws TException
{
if (refresh) {
delegationTokenCache.refresh(username);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refresh might be async (under what conditions?). it should be simpler to invalidate, but this would be vulnerable to #10512 / google/guava#1881.

cc @kokosing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not use refresh. It makes it hard to reason about the current state. Notice that we don't know if below getUnchecked would return new or old value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the link below, it was confirmed that refresh function works as async. Therefore, it has been modified from refresh function to put function.

}

ThriftMetastoreClient client = createMetastoreClient();
setMetastoreUserOrClose(client, username);
return client;
}

private ThriftMetastoreClient createMetastoreClientWithAuthentication(String username, boolean refresh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract:

 private String getDelegationToken(String username)
    {
        String delegationToken;
        try {
            delegationToken = delegationTokenCache.getUnchecked(username);
        }
        catch (UncheckedExecutionException e) {
            throwIfInstanceOf(e.getCause(), TrinoException.class);
            throw e;
        }
        return delegationToken;
    }

and inline this back:

        if (authenticationEnabled) {
            try {
                return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
            }
            catch (TException e) {
                log.debug(e, "It may be a delegation token permission failure, so try again with a new delegation token");
                delegationTokenCache.refresh(username);
                return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
            }
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified in that way. please check again.

@cla-bot
Copy link

cla-bot bot commented Jan 12, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

@ebyhr
Copy link
Member

ebyhr commented Apr 18, 2022

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Apr 18, 2022
@cla-bot
Copy link

cla-bot bot commented Apr 18, 2022

The cla-bot has been summoned, and re-checked this pull request!

@mosabua
Copy link
Member

mosabua commented Nov 3, 2022

👋 @Sunwoo-Shin - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that. As a first step you would need to rebase.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@Sunwoo-Shin Sunwoo-Shin force-pushed the feature/retry-connection-with-new-delegation-token branch from f90f01d to 24ad7e0 Compare November 4, 2022 04:23
@Sunwoo-Shin Sunwoo-Shin force-pushed the feature/retry-connection-with-new-delegation-token branch 2 times, most recently from 7bc1d7f to eeb767c Compare November 4, 2022 04:27
@Sunwoo-Shin
Copy link
Contributor Author

@mosabua
Thanks for noticing. I was missing this PR.
now rebase is complete.
It was confirmed that the logic was changed from the ThriftHiveMetastore class to the TokenFetchingMetastoreClientFactory class, and I modified it accordingly.

cc. @findepi

return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
}
catch (TException e) {
log.debug(e, "It may be a delegation token permission failure, so try again with a new delegation token");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need this log here. Trino will not be run with debug log level enabled anyway and the log message does not provide any actionable information..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for giving feedbacks :)
I also agree with what you said. this log has been removed.

delegationToken = delegationTokenCache.getUnchecked(username);
return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
}
catch (TException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be look deeper into exception to determine if regenerating delegatin token can really be helpful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exception is occurring as below.

Trino Error (402 version)

스크린샷 2022-11-08 오후 12 29 12

Hive Metastore Error (3.1.2 version)

스크린샷 2022-11-08 오후 12 32 48

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sth like that

    @Override
    public ThriftMetastoreClient createMetastoreClientFor(Optional<ConnectorIdentity> identity)
            throws TException
    {
        if (!impersonationEnabled) {
            return createMetastoreClient();
        }

        String username = identity.map(ConnectorIdentity::getUser)
                .orElseThrow(() -> new IllegalStateException("End-user name should exist when metastore impersonation is enabled"));

        try {
            return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
        }
        catch (TException e) {
            if (isDelegationTokenExpiration(e)) {
                // try once again with new token
                String delegationToken = loadDelegationToken(username);
                delegationTokenCache.put(username, delegationToken);
                return clientProvider.createMetastoreClient(Optional.of(delegationToken));
            }
            throw e;
        }
    }

    private boolean isDelegationTokenExpiration(Throwable exception)
    {
        if (exception instanceof SecretManager.InvalidToken) {
            return true;
        }
        if (exception.getCause() == null || exception.getCause() == exception) {
            return false;
        }
        return (isDelegationTokenExpiration(exception.getCause()));
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the message passed to the thrift response cannot be identified as SecretManager.InvalidToken because it is transmitted only with the following string information.

스크린샷 2022-11-09 오전 1 50 43

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - I misread your previous message. The "cause" does not make it through the "TTransportException". :/

Copy link
Member

@losipiuk losipiuk Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - let's go with that. Just please add a comment in catch clause why are you reloading the token.

Also please add a code which ensures that we are not refreshing the token "too often", I think ensuring at most 1 refresh per minute will be fine. We do not want to start hammering metastore which returns errors, because it is overloaded with twice as many requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just please add a comment in catch clause why are you reloading the token.

Added a comment to that.

Also please add a code which ensures that we are not refreshing the token "too often", I think ensuring at most 1 refresh per minute will be fine. We do not want to start hammering metastore which returns errors, because it is overloaded with twice as many requests.

I also positively agree with that.
Since tokens are generated for each user and each expires, the refresh period check is implemented to be checked for each user.

catch (TException e) {
log.debug(e, "It may be a delegation token permission failure, so try again with a new delegation token");
delegationTokenCache.put(username, loadDelegationToken(username));
return clientProvider.createMetastoreClient(Optional.of(getDelegationToken(username)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can store the delegation token obtained from loadDelegationToken(username)) on variable; put it in the cache and then use it here without going through cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for giving feedbacks :)
changed it to be reused as a local variable.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash commits and make commit message:

Regenerate Hive delegation token when expired

@Sunwoo-Shin Sunwoo-Shin force-pushed the feature/retry-connection-with-new-delegation-token branch from ad8fcfb to 4715b43 Compare November 8, 2022 17:00
@Sunwoo-Shin
Copy link
Contributor Author

@losipiuk

Please squash commits and make commit message:
Regenerate Hive delegation token when expired

squash commit and modified it with the written message.

@Sunwoo-Shin Sunwoo-Shin force-pushed the feature/retry-connection-with-new-delegation-token branch 2 times, most recently from 010cbf6 to 53fb610 Compare November 14, 2022 01:20
Copy link
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@losipiuk losipiuk force-pushed the feature/retry-connection-with-new-delegation-token branch from 53fb610 to 43473d3 Compare November 14, 2022 09:50
@losipiuk
Copy link
Member

THanks.
I squashed the commits. Made record definition private, and added null check. WIll merge when CI passes.

@Chaho12
Copy link
Member

Chaho12 commented Nov 25, 2022

CI failed :'(

@ebyhr
Copy link
Member

ebyhr commented Nov 25, 2022

Both failures are unrelated to this change as far as I confirmed. #14814 & #13288

@Sunwoo-Shin Sunwoo-Shin force-pushed the feature/retry-connection-with-new-delegation-token branch from 43473d3 to b99d643 Compare December 20, 2022 06:33
@losipiuk
Copy link
Member

THanks.
I squashed the commits. Made record definition private, and added null check. WIll merge when CI passes.

And I totally forgot 🤦 . Sorry

@losipiuk losipiuk merged commit 3a5e287 into trinodb:master Dec 20, 2022
@github-actions github-actions bot added this to the 404 milestone Dec 20, 2022
@colebow
Copy link
Member

colebow commented Dec 21, 2022

@Sunwoo-Shin @losipiuk does this need a release note?

@losipiuk
Copy link
Member

@Sunwoo-Shin @losipiuk does this need a release note?

@colebow maybe sth like:

Refresh the Hive metastore delegation token to prevent query failures if the metastore was restarted mid-query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Suggestions : Hive Metastore Delegation-Token Authentication Failure
9 participants