"Failed to deserialize a token: Timeout was reached" - how to debug? #128
In #114 the "Timeout was reached" error was caused when trying to retrieve the issuer public key, so the first thing I'd check is that curl can get your issuer's well-known endpoint from the machine in question. For example:
Using What's the second thing to check? (The token file is 1048 bytes in size, if that matters.) I'm about to run Wireshark on that machine to get a grip on what's being sent where (and meant to be received).
I got a (large) token from
The
I see the issuer keys in the keycache:
I'm not sure what else to check. I think
Hi all,
Hi, Don't forget there are two different downloads: one for the metadata, the second for the public key. Right now, those seem to point to the same IP address (but maybe there's an implementation detail I can't see?). @steve8x8 - can you try both of these commands:
(If there's nothing private in the output, attaching them as a file would be useful.)
Now, if the networking setup looks clear, I know that @djw8605 was merging some PRs around bugs in the latest version... maybe there's some undefined behavior triggering on one box but not the other?
Brian
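The two downloads mentioned above (metadata first, then the public key) can also be timed separately, which shows which of the two is slow. This is only a minimal Python sketch, assuming the issuer publishes its metadata at `<issuer>/.well-known/openid-configuration` and that the metadata contains a `jwks_uri` field pointing at the keys. The function names and the 4-second budget constant are illustrative (the budget is the figure quoted in this thread), not part of any scitokens API.

```python
import json
import time
import urllib.request

# Assumption: the 4-second limit quoted in this thread.
BUDGET_SECONDS = 4.0


def http_get(url, timeout=30.0):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_issuer_downloads(issuer, fetch=http_get, clock=time.monotonic):
    """Time the metadata download and the key download separately.

    Returns a list of (label, url, seconds) tuples.
    """
    results = []

    # Download 1: the issuer metadata (assumed well-known location).
    metadata_url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    start = clock()
    metadata = json.loads(fetch(metadata_url))
    results.append(("metadata", metadata_url, clock() - start))

    # Download 2: the public keys, at whatever URL the metadata names.
    keys_url = metadata["jwks_uri"]
    start = clock()
    fetch(keys_url)
    results.append(("public keys", keys_url, clock() - start))
    return results


def report(results, budget=BUDGET_SECONDS):
    """Format one line per download, flagging anything over the budget."""
    lines = []
    for label, url, seconds in results:
        flag = "SLOW" if seconds > budget else "ok"
        lines.append(f"{label:12s} {seconds:6.2f}s {flag}  {url}")
    return lines
```

Calling `report(time_issuer_downloads(issuer))` with the `iss` value of the token in question prints one line per download. Each timing includes DNS resolution, so a slow resolver shows up in both.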
I have looked a little into another timeout that @mambelli was experiencing. In his case he was using a strange test network setup that ended up not getting a response from the DNS server on the first couple of tries, each of which timed out after 2 seconds. We found out that the scitokens timeout was only 4 seconds, which @djw8605 confirmed. That issue was worked around by updating /etc/resolv.conf to prioritize a different DNS server that could respond right away.
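To see how much of that 4-second budget name resolution alone consumes, the lookup can be timed on its own. This is a rough Python sketch that goes through the system resolver via `socket.getaddrinfo`. The helper names are made up for illustration, and the budget constant is simply the figure quoted above.

```python
import socket
import time

# Assumption: the scitokens timeout reported in this thread.
BUDGET_SECONDS = 4.0


def time_lookup(host, port=443, resolve=socket.getaddrinfo, clock=time.monotonic):
    """Return (seconds, addresses) for one lookup through the system resolver."""
    start = clock()
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []  # resolution failed; the elapsed time is still informative
    elapsed = clock() - start
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


def dns_share_of_budget(elapsed, budget=BUDGET_SECONDS):
    """Fraction of the budget consumed by name resolution alone."""
    return elapsed / budget
```

A share close to or above 1.0 for the issuer's host name would point at the resolver rather than at the HTTP transfer.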
Wow, what a lot of responses ;-) Let me work my way through them. @jbasney I've kept
I was wondering which pipe would be involved here, but perhaps that's a red herring. @maarten-litmaath This seems to happen after accessing the keycache - which (of course, like all cluster users' homes) is on NFS. @bbockelm Both commands seem to succeed:
and
@DrDaveD I'll look into that - indeed the node in question is behind yet another gateway, which may have an effect on response times. Would there be a way to increase the timeout? Some of you have already guessed that I'm trying to use
Somehow I'm not getting the knack of local files (the ones in
Pardon me for being incredibly stupid!
There is currently no way to increase the timeout; it is a compiled-in limit. That would probably be a good feature request, however.
Yes, the cilogon.org/igwn token issuer is configured to generate tokens that expire in 3 hours.
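The issuer and expiry of a given token file can be checked locally, without any network access, by decoding the token's header and payload. This is a small Python sketch that relies only on the standard JWT layout (three base64url segments separated by dots). It does not verify the signature, and the helper names are illustrative.

```python
import base64
import json
import time


def _b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_jwt(token):
    """Return (header, claims) of a JWT WITHOUT verifying its signature."""
    header_b64, payload_b64, _signature = token.strip().split(".")
    header = json.loads(_b64url_decode(header_b64))
    claims = json.loads(_b64url_decode(payload_b64))
    return header, claims


def seconds_until_expiry(claims, now=None):
    """Seconds left before the "exp" claim; negative if already expired."""
    now = time.time() if now is None else now
    return claims["exp"] - now
```

The `iss` claim shows which issuer's metadata and keys will be fetched during validation, and `kid` in the header (if present) names the key that has to be found.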
The
I'm not familiar with the
I believe you are confused about what the scitokens keycache is. It is only used for validating the signatures on JSON Web Tokens (JWTs -- scitokens are JWTs), so it caches the public keys of the token issuers. It does not cache individual JWTs. There is no good documentation on this stuff that I can point you to, unfortunately. However I can tell you the best way to debug CVMFS access with scitokens is to set
The file in /tmp is a vault token. That provides authenticated access to the vault server. The file in
@DrDaveD thanks for the explanation (and the confirmation of lack of docs)! I found that the failing node was using a rather long list of name servers in
Now that "everything seems to work" (iow, the small issue has been fixed to make room for the next, possibly bigger, one), I'm wondering -
You are correct that a network connection is needed to verify the first token, but after that, verifications should work without the network, using the local key cache. I don't think that is a significant issue, however. Network access is required for so many different things, this is a very small network access, and there are so many different ways to DoS grid workflows that I don't think adding one more really changes anything.
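The "network once, then local" behaviour described here can be illustrated with a toy cache. This Python sketch is not the scitokens keycache implementation: the class name, the `fetch_keys` callable and the fixed lifetime are all hypothetical and only show the pattern.

```python
import time


class KeyCache:
    """Toy issuer-key cache: fetch on a miss or after expiry, else serve locally."""

    def __init__(self, fetch_keys, lifetime_seconds, clock=time.time):
        self._fetch_keys = fetch_keys      # callable(issuer) -> keys; needs network
        self._lifetime = lifetime_seconds  # hypothetical cache lifetime
        self._clock = clock
        self._entries = {}                 # issuer -> (expires_at, keys)

    def get(self, issuer):
        now = self._clock()
        entry = self._entries.get(issuer)
        if entry is not None and entry[0] > now:
            return entry[1]                # cache hit: no network access
        keys = self._fetch_keys(issuer)    # miss or expired: one network access
        self._entries[issuer] = (now + self._lifetime, keys)
        return keys
```

Only the first `get` for an issuer, and the first one after the entry expires, reaches the network. A timeout like the one in this issue can therefore only occur on those calls.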
Related to #97
In addition to what @DrDaveD mentions, I'd point out that the public keys don't necessarily need to be hosted at the same location as the issuer itself. It's common to see issuers use a hosting or CDN service (e.g., Cloudflare) that is going to be fairly impervious to a DDoS. The tradeoff here is that the system is much closer to "fail safe" than "fail open": in case of a trust root compromise, trust in the keys can be pulled centrally, and after a fixed amount of time all signed credentials are automatically revoked.
To summarize: the timeout was caused by a DNS server not responding. It takes ten seconds to move on, but a second server, although listed in /etc/resolv.conf, wasn't considered anymore. I think a 4-second timeout is too short; 15 seconds would at least allow a second DNS server to jump in. But I have to assume the current timeout was chosen intentionally, so I'll keep this in mind. (It would have been helpful if the error message had told me what exactly was timing out...)
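The arithmetic behind that summary can be written down as a small model. This Python sketch assumes a deliberately simplified resolver that waits one full timeout for each unresponsive name server before trying the next; real resolvers are more subtle. The default values of 5 seconds and 2 attempts are the ones documented in resolv.conf(5), while the function names are made up for illustration.

```python
def parse_resolv_conf(text):
    """Extract nameservers and the timeout/attempts options from resolv.conf text."""
    servers = []
    options = {"timeout": 5, "attempts": 2}  # defaults documented in resolv.conf(5)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name in options and value.isdigit():
                    options[name] = int(value)
    return servers, options


def delay_before_server(index, timeout):
    """Simplified model: every dead server ahead in the list costs one full timeout."""
    return index * timeout


def reachable_within(servers, timeout, budget):
    """Servers that would receive a first query before the caller's budget runs out."""
    return [s for i, s in enumerate(servers)
            if delay_before_server(i, timeout) < budget]
```

With a 10-second wait per server, as in the summary above, `reachable_within` returns only the first server for a 4-second budget, while a 15-second budget also reaches the second one.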
How to debug this? Debian Bullseye, libscitokens0 version 1.0.2
There's another machine with the same SciTokens setup that returns the correct response (same file content)
but I can't find the difference :/