
store: won't start, no logs indicating why #1455

Closed
asmith60 opened this issue Aug 23, 2019 · 5 comments · Fixed by #1952

Comments

@asmith60

Thanos, Prometheus and Golang version used
Thanos: 0.6.0
Prometheus: 2.10.0

What happened
The Thanos store won't start: it attempts to start up but crashes after roughly 30 seconds. Inspecting the pod shows that the process exited with a non-zero code. The debug-level log output is below.

level=info ts=2019-08-23T18:55:30.952906789Z caller=main.go:154 msg="Tracing will be disabled"
level=info ts=2019-08-23T18:55:30.952955736Z caller=factory.go:39 msg="loading bucket configuration"
level=info ts=2019-08-23T18:55:30.969576127Z caller=cache.go:172 msg="created index cache" maxItemSizeBytes=4294967296 maxSizeBytes=8589934592 maxItems=math.MaxInt64
level=debug ts=2019-08-23T18:55:30.969822743Z caller=store.go:144 msg="initializing bucket store"

What you expected to happen
Thanos store to start successfully.

Anything else we need to know
6 HA pairs of Prometheus instances (12 instances total) are uploading metrics to an AWS S3 bucket. The bucket is currently ~750GB. The store pod manifest is below (obj-store config, AWS IAM config, etc. removed).

Store pod

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    app: thanos-store
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store
    spec:
      containers:
        - name: thanos-store
          imagePullPolicy: Always
          image: "improbable/thanos:v0.6.0"
          args:
            - store
            - --data-dir=/data
            - --log.level=debug
            - --index-cache-size=8GB
            - --chunk-pool-size=20GB
          ports:
            - name: http
              containerPort: 10902
              protocol: TCP
            - name: grpc
              containerPort: 10901
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /metrics
              port: http
          readinessProbe:
            httpGet:
              path: /metrics
              port: http
          resources:
            limits:
              cpu: 2000m
              memory: 32000Mi
            requests:
              cpu: 2000m
              memory: 32000Mi
          volumeMounts:
            - mountPath: /data
              name: storage-volume
  volumeClaimTemplates:
    - metadata:
        name: storage-volume
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "128Gi"
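One hedged possibility worth noting (an assumption, not confirmed anywhere in this thread): the manifest above uses the Kubernetes probe defaults (`periodSeconds: 10`, `failureThreshold: 3`), so a liveness probe that never succeeds kills the container after about 30 seconds, which matches the observed crash interval. A sketch of relaxed probe settings that give the store time to finish "initializing bucket store":

```yaml
# Sketch (assumption): allow a long startup window before the liveness
# probe can kill the pod. Values are illustrative, not recommendations.
livenessProbe:
  httpGet:
    path: /metrics
    port: http
  initialDelaySeconds: 300   # tolerate several minutes of bucket-store init
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /metrics
    port: http
  initialDelaySeconds: 30
  periodSeconds: 30
```

If the process is instead being OOM-killed (as suggested later in the thread), probe tuning alone won't help, but it rules out one failure mode.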

@anoop2503

Hi, any update on this issue?

I am also hitting the same problem: the Thanos store gateway gets stuck at "initializing bucket store" when the container starts, and no other warning or error appears in the log. Any idea why this happens, or how to find the root cause?

The logs are given below:
level=info ts=2019-09-05T14:37:53.221491945Z caller=flags.go:75 msg="gossip is disabled"
level=info ts=2019-09-05T14:37:53.222294564Z caller=factory.go:39 msg="loading bucket configuration"
level=debug ts=2019-09-05T14:37:53.223374047Z caller=store.go:128 msg="initializing bucket store"

Thanks,

@bwplotka
Member

bwplotka commented Sep 5, 2019

Sorry for the delay!

On startup, the store gateway loads a portion of the objects into memory, so if you don't have a compactor (do you have one? Is it working?) startup can be a long, memory-intensive process.

Most likely the store is just OOMing in your case. Give it more memory, time-shard the store gateway (see #1077), or add a compactor if one is missing (!).
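A hedged sketch of the time-sharding suggestion (the `--min-time`/`--max-time` flags landed in later Thanos releases, not v0.6.0; flag names and relative-duration syntax are assumptions here): run two store replicas, each serving a slice of the time range, so neither has to load the whole 750GB bucket's index at startup.

```shell
# Store A: recent data only (last 4 weeks)
thanos store --data-dir=/data --objstore.config-file=/etc/thanos/bucket.yml \
  --min-time=-4w

# Store B: everything older than 4 weeks
thanos store --data-dir=/data --objstore.config-file=/etc/thanos/bucket.yml \
  --max-time=-4w
```

Querier fan-out then merges results across both stores, and each replica starts faster and uses less memory.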

Things which we are planning to do:

@asmith60
Author

@anoop2503 I just needed to give the store more time to start up (about 5 minutes in my case). It seems that the more memory I give the store, the less time it takes to start.

@GiedriusS
Member

Also, we could and should probably be more verbose here at the debug (or info) level, so that users know which blocks we are pulling, just as Prometheus prints which blocks it finds on disk.

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this as completed Jan 18, 2020