Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KV CryptographyClient.getKeys 401s with Token not yet valid #26624

Closed
garantir-km opened this issue Jan 21, 2022 · 5 comments
Closed

KV CryptographyClient.getKeys 401s with Token not yet valid #26624

garantir-km opened this issue Jan 21, 2022 · 5 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. KeyVault question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@garantir-km
Copy link

garantir-km commented Jan 21, 2022

Describe the bug
Everything has been working fine in our environment and then all of a sudden at 2022-01-20 07:46:05 (UTC) we started getting 401 errors when making Key Vault requests (key retrieval and sign operations). The error message was always "Token not yet valid (Activity ID: XXXX-YYY-ZZZ)". The error showed up intermittently but pretty frequently for about 12 hours. We made no changes to our code and we have been successfully using this code in this environment (Azure Managed HSM) for a few weeks now. The computers where we ran the tests all had NTP configured and as soon as we detected the error we checked the date/time and everything looked fine.

As I write this right now, I am not currently able to reproduce the issue anymore, even though we have made no changes to the code, runtime, or to any system clocks.. However, we, along with our customers, are concerned for the stability of this in production and would love an explanation so that we can sleep comfortably and know what to do if this happens again in the future.

Exception or Stack Trace
com.azure.core.exception.HttpResponseException: Status code 401, "{"error":{"code":"Unauthorized","message":"Token not yet valid (Activity ID: XXXX-YYY-ZZZZZ)"}}"
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.azure.core.http.rest.RestProxy.instantiateUnexpectedException(RestProxy.java:389)
at com.azure.core.http.rest.RestProxy.lambda$ensureExpectedStatus$7(RestProxy.java:444)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:125)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.signalCached(MonoCacheTime.java:337)
at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.onNext(MonoCacheTime.java:354)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.onSubscribe(MonoCacheTime.java:293)
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:192)
at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.MonoCacheTime.subscribeOrReturn(MonoCacheTime.java:143)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2068)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onNext(FluxRetryWhen.java:174)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.Operators$MonoInnerProducerBase.complete(Operators.java:2664)
at reactor.core.publisher.MonoSingle$SingleSubscriber.onComplete(MonoSingle.java:180)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onComplete(MonoFlatMapMany.java:260)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:150)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onComplete(FluxDoFinally.java:145)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:150)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1817)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:159)
at reactor.core.publisher.FluxHandle$HandleSubscriber.onComplete(FluxHandle.java:213)
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onComplete(FluxMap.java:269)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:400)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:419)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:473)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:684)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:93)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1372)
at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1246)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
at reactor.core.publisher.Mono.block(Mono.java:1706)
at com.azure.security.keyvault.keys.cryptography.CryptographyClient.getKeyWithResponse(CryptographyClient.java:91)
at com.azure.security.keyvault.keys.cryptography.CryptographyClient.getKey(CryptographyClient.java:69)
at ...

To Reproduce
Any Key Vault call via the Java SDK would produce the result for us. Our Managed HSM is deployed in Canada and the servers that we used to test are located in California and Virginia.

Code Snippet

TokenCredential credential = new ClientSecretCredentialBuilder()
	.tenantId(tenantID)
	.clientId(clientID)
	.clientSecret(new String(secret))
	.build();

KeyClient keyClient = new KeyClientBuilder()
	.vaultUrl(vaultURL)
	.credential(credential)
	.buildClient();

CryptographyClient cryptoClient = new CryptographyClientBuilder()
    .credential(credential)
    .keyIdentifier(keyID)
    .buildClient();

KeyVaultKey key = cryptoClient.getkey(); // exception thrown here

Expected behavior
Cryptographic operation succeeds as it normally does. In the example given here, that means the key is retrieved successfully.

Setup (please complete the following information):

  • OS: Windows
  • IDE: Eclipse but we get the error if we run from the command line, from within Tomcat, or from within the IDE
  • Library/Libraries: com.azure:azure-security-keyvault-keys:4.3.5, com.azure:azure-identity:1.4.1
  • Java version: 8
  • App Server/Environment: Tomcat
  • Frameworks: None

Additional context
We are using a Managed HSM in Canada. Our servers that run our code are located in California and Virginia.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • [X ] Bug Description Added
  • [ X] Repro Steps Added
  • [ X] Setup information Added
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 21, 2022
@joshfree joshfree added Client This issue points to a problem in the data-plane of the library. KeyVault labels Jan 21, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 21, 2022
@joshfree
Copy link
Member

@vcolin7 could you please assist @garantir-km with this github issue?

@joshfree joshfree changed the title Token not yet valid KV CryptographyClient.getKeys 401s with Token not yet valid Jan 21, 2022
@garantir-km
Copy link
Author

Checking back on this. Anybody know what the issue could have been?

@vcolin7
Copy link
Member

vcolin7 commented Jan 29, 2022

Hi @garantir-km, sorry for the late response. I searched around for a bit and found a few instances of similar issues such as this one, this one and this one.

The common factor in all these was that the time on the machines was ahead of the time on the servers, which caused the token that was generated to not be valid yet. However, those issues were using software other than the Azure SDK (such as the Azure CLI). I ran some tests using the latest version of our Azure Key Vault Keys and Azure Identity libraries and could not reproduce this error after changing the time on my local machine, so I don't think being out of sync could have been the issue.

It looks like when using a ClientSecretCredential the Azure Identity library sends a request it does not include any timestamp information, so no matter what changes I made to my local time the response always contained a token that was valid 5 minutes before the time the request was sent. I suspect that the date and time of the authentication server is what determines the moment from which the token is valid (//cc @g2vinay in case I misinterpreted something about the Azure Identity library).

Maybe it's possible that for a brief window the auth server was not in sync with the time on the Key Vault? I am not an expert in Azure infrastructure, so I will have to bring this up with some folks internally to find answers. I am inclined to say that the issue probably does not lie with the Azure SDK, but some other component.

Just in case, would you mind sharing what version of com.azure:azure-security-keyvault-keys and com.azure:azure-identity you were using at the time?

@garantir-km
Copy link
Author

Hi @vcolin7, thank you for the details. We did our own quick analysis of the code and came to the same conclusion that the timestamp of our server does not get used during token generation. Like you, we suspect that Azure's authorization server is generating the timestamp and there was possibly some time sync issue between the Key Vault / Managed HSM server that processes the request and the authorization server that generated the token.

Interestingly, the same token would sometimes work and sometimes fail. Our server uses that same token in a multi-threaded environment and most requests succeeded. Unless the SDK is smart enough to regenerate the token when this error occurs, I believe all those requests used the same token.

The versions of the SDK we are using are:
com.azure:azure-security-keyvault-keys version 4.3.5
com.azure:azure-identity version 1.4.1

We look forward to hearing what your internal folks have to say about this issue.

@vcolin7
Copy link
Member

vcolin7 commented Feb 10, 2022

Talked to other folks working on the Key Vault Azure SDK and they also think it most likely was some slight de-sync between servers. Without having the full logs or at least the request IDs I don't think there's much that can be done to figure out the exact details of what happened. Maybe this is something you should route to your Microsoft support contact to have a support engineer look into it further. I will be closing this as it is not an issue on the Azure SDK end, but the service's.

@vcolin7 vcolin7 closed this as completed Feb 10, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. KeyVault question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

3 participants