
InvalidStateStoreException specific for 1.3.x versions #1241

Closed
petolexa opened this issue Feb 21, 2021 · 4 comments

Labels: type/bug Something isn't working

@petolexa

Hello,

For Apicurio Registry with Kafka Streams storage, in version 1.3.2.Final, if I specify 2 replicas in my deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: apicurioregistry-test
  name: apicurioregistry-test
  namespace: schema-registry-test-01-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: apicurioregistry-test
  template:
    metadata:
      labels:
        app: apicurioregistry-test
    spec:
      containers:
        - image: nexus3.something.com:PORT/service-registry/apicurio/apicurio-registry-streams:1.3.2.Final
...

I get an InvalidStateStoreException in the UI:

org.apache.kafka.streams.errors.InvalidStateStoreException: The state store, storage-store, may have migrated to another instance.
	at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:64)
	at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1183)
	at io.apicurio.registry.utils.streams.distore.KeyValueStoreGrpcImplLocalDispatcher.lambda$keyValueStore$0(KeyValueStoreGrpcImplLocalDispatcher.java:57)
	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
	at io.apicurio.registry.utils.streams.distore.KeyValueStoreGrpcImplLocalDispatcher.keyValueStore(KeyValueStoreGrpcImplLocalDispatcher.java:51)
...

In versions 1.2.3.Final and 2.0.0-SNAPSHOT it works okay.

From the description I understand what the exception is supposed to mean, but both of my pods/replicas are running OK and both KafkaStreams instances are in the RUNNING state.
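
For context, Kafka Streams documents InvalidStateStoreException as a retriable condition while state stores are migrating during a rebalance. A minimal retry sketch against the plain Kafka Streams API, using the older two-argument store() call that 1.3.x builds against (the waitForStore helper is hypothetical, not Apicurio's actual dispatcher code):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreRetry {
    // Poll until the named store is queryable again. The exception only
    // signals that a rebalance/migration is in flight, not a fatal error.
    static ReadOnlyKeyValueStore<String, String> waitForStore(KafkaStreams streams, String name)
            throws InterruptedException {
        while (true) {
            try {
                return streams.store(name, QueryableStoreTypes.keyValueStore());
            } catch (InvalidStateStoreException e) {
                Thread.sleep(100); // store may have migrated; retry shortly
            }
        }
    }
}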

@petolexa
Author

I found two workarounds for this, where the exception disappears and Apicurio works fine:

1. Either set up 3 replicas in my deployment:

   spec:
     replicas: 3
     selector:
       matchLabels:
         app: apicurioregistry-test

2. Or set JAVA_OPTIONS with -Dregistry.streams.topology.num.standby.replicas=0:

   env:
     - name: JAVA_OPTIONS
       value: "-D%prod.registry.streams.topology.storage.topic=test-schema-registry-storage-topic -D%prod.registry.streams.topology.global.id.topic=test-schema-registry-global-id-topic -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dregistry.streams.topology.num.standby.replicas=0"

I think it is connected with "If you configure n standby replicas, you need to provision n+1 KafkaStreams instances."
But with the Apicurio default of registry.streams.topology.num.standby.replicas=0 that I found in application.properties, and with the 2 replicas that I want to use, I don't think I am breaking the rule of n standby replicas and n+1 instances.
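
For reference, registry.streams.topology.num.standby.replicas appears to map onto the standard Kafka Streams num.standby.replicas setting. At the Kafka Streams level the configuration would look roughly like this (application id and broker address are illustrative placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyReplicasConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "registry-example");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // With n standby replicas, each state store keeps n extra hot copies
        // on other instances, which is why n+1 instances must be provisioned.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 0);
    }
}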

Finally, my question:
Could you please advise which of the two workarounds described above is more appropriate: having zero standby replicas, or having 3 k8s pod replicas even though I do not need 3?
Or do you see a gap in my approach and an elegant solution that I am missing?

@EricWittmann added the type/bug label on Feb 22, 2021
@EricWittmann
Member

@alesj what do you think?

@EricWittmann
Member

@tombentley or @carlesarnal - any thoughts on this? I don't have any insights.

@petolexa
Author

petolexa commented Mar 2, 2021

Thank you for your interest, @EricWittmann.

What I noticed is that the error is the same if I use registry.streams.topology.num.standby.replicas=2 and 3 pod replicas in the deployment :)

Nevertheless, if there is no preference between the workarounds, we will use registry.streams.topology.num.standby.replicas=1 with 3 pod replicas, and we can live with that until the next final release.
