HA handling for store nodes #199

Closed
fabxc opened this issue Jan 31, 2018 · 7 comments

Comments

@fabxc
Collaborator

fabxc commented Jan 31, 2018

Store nodes are currently generally run as a single replica. It's not super critical to have HA in general since several hours or even days of recent data are HA via the Prometheus servers. But for some scenarios it might still be preferable.

Two could simply be deployed and the query node would take care of deduplication/merging just like for Prometheus HA pairs. But unlike Prometheus servers, the underlying data is truly the same in this case, so fetching it twice is unnecessary overhead.

Some simple logic could be added to the query node to recognize real duplicates (Prometheus HA pairs are actually different through a replica label) and to only query one of them.
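A minimal sketch of that check in Go, assuming the query node can see each store's advertised label set (the `StoreInfo` type, the `bucket` label, and the addresses are hypothetical). Stores with identical label sets are treated as real duplicates and only the first one is queried, while Prometheus HA pairs survive because their replica labels differ:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StoreInfo is a hypothetical, simplified view of what a store advertises to
// the query node: its address and its label set.
type StoreInfo struct {
	Addr   string
	Labels map[string]string
}

// labelKey builds a canonical string for a label set, so two stores with the
// same labels get the same key regardless of map iteration order.
func labelKey(lbls map[string]string) string {
	names := make([]string, 0, len(lbls))
	for n := range lbls {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n + "=" + lbls[n] + ",")
	}
	return b.String()
}

// pickOnePerLabelSet keeps the first store seen for every distinct label set.
// Prometheus HA pairs differ by their replica label and are all kept; stores
// with identical label sets are treated as true duplicates.
func pickOnePerLabelSet(stores []StoreInfo) []StoreInfo {
	seen := make(map[string]bool)
	var out []StoreInfo
	for _, s := range stores {
		k := labelKey(s.Labels)
		if seen[k] {
			continue // same data as a store we already query
		}
		seen[k] = true
		out = append(out, s)
	}
	return out
}

func main() {
	stores := []StoreInfo{
		{"prom-a:10901", map[string]string{"cluster": "eu1", "replica": "a"}},
		{"prom-b:10901", map[string]string{"cluster": "eu1", "replica": "b"}},
		{"store-0:10901", map[string]string{"bucket": "metrics"}},
		{"store-1:10901", map[string]string{"bucket": "metrics"}},
	}
	for _, s := range pickOnePerLabelSet(stores) {
		fmt.Println(s.Addr) // prom-a:10901, prom-b:10901, store-0:10901
	}
}
```

One caveat the sketch makes visible: stores that advertise no labels at all share the same empty key and would all collapse into one, so a real implementation would need an explicit identity marker rather than relying on labels alone.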

@bwplotka
Member

bwplotka commented Feb 7, 2018

I think we might need that sooner rather than later... (: How can we do it easily? Basically we need to tell the querier that these X stores are the same thing. Can we reuse the labels field from the store Info endpoint?

@dupondje

The most basic way would be an option to add, for example, --bucketid="xxx" to the store command.
If the query command notices two (or more) store nodes with the same bucket ID, it could just pick a random one to fetch the data from instead of querying all of them.

@deejay1
Contributor

deejay1 commented May 16, 2018

For active/passive, this could be done using a leader latch protocol, with the leader sharing the data it downloads: it could announce any newly downloaded bucket via gossip (for faster failover) and serve it via HTTP/gRPC. This would eliminate the need for the other nodes to fetch the data from the object store directly, and give the query nodes a single source of truth (the current leader).

@mattbostock
Contributor

mattbostock commented May 31, 2018

I'd like to volunteer to take this on. For our use case, downtime caused by the store instance fronting an S3 bucket being rescheduled to another machine is not really palatable.

I'm thinking of an active-active solution, since it avoids some of the complexity around deciding which instance is 'active' and would use resources more efficiently. As store nodes are essentially just caches, I think it should be reasonably straightforward to achieve.

While thinking about high availability, we should also consider allowing store nodes to scale horizontally for very large deployments, effectively scaling the LRU cache of indices horizontally.

I propose:

  • We shard the index cache across multiple store instances.
  • Optionally, we replicate the shards to provide high availability for a single shard, though by having multiple shards, we can already improve overall availability and reduce the time to recovery.
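One way to sketch the sharding and optional replication, assuming consistent hashing with the block ID as the shard key (both are assumptions; the proposal does not prescribe a scheme, and the `Ring` type is hypothetical). `Owners` returns the `rf` instances responsible for a key: `rf=1` is plain sharding, `rf>1` adds per-shard replicas:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring is a minimal consistent-hash ring. Each instance gets several virtual
// points so keys spread more evenly across instances.
type Ring struct {
	points []uint32
	owner  map[uint32]string
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(instances []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, inst := range instances {
		for v := 0; v < vnodes; v++ {
			p := hash32(inst + "#" + strconv.Itoa(v))
			r.points = append(r.points, p)
			r.owner[p] = inst
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Owners walks the ring clockwise from the key's hash and returns up to rf
// distinct instances responsible for that key.
func (r *Ring) Owners(key string, rf int) []string {
	h := hash32(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.points) && len(out) < rf; i++ {
		inst := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[inst] {
			seen[inst] = true
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"store-0", "store-1", "store-2"}, 64)
	// Hypothetical block ID used as the shard key; prints two distinct owners.
	fmt.Println(ring.Owners("01C8QYQ1Z8N7Y3N0W6Y7J1K2M3", 2))
}
```

Consistent hashing keeps most keys on the same instance when one instance is added or removed, which limits how much of the cache has to be rebuilt after an instance is rescheduled.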

Just an idea: if we have multiple shards, we might simplify the store instances by not persisting the cache to disk, since restarting a single instance would only require pulling 1/n of the data from object storage, where n is the number of shards (though recovery would be slow if all instances are restarted).
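The restart arithmetic, assuming the cached data of total size S is spread evenly over the n shards (real shards are rarely perfectly balanced); the numbers are purely illustrative:

```latex
\text{re-fetched on one restart} = \frac{S}{n},
\qquad \text{e.g. } S = 120\,\text{GiB},\ n = 4 \;\Rightarrow\; \frac{120\,\text{GiB}}{4} = 30\,\text{GiB};
\qquad \text{restarting all } n \text{ instances: } n \cdot \frac{S}{n} = S
```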

@bwplotka
Member

bwplotka commented Jun 1, 2018

@mattbostock Thanks!

This all rests on one assumption: the Thanos setup has only one bucket to take data from. Are we OK with that? I have seen some use cases with multiple buckets connected to the same Thanos "cluster/network/setup", because "it is easier to manage", "my object storage is specific", etc. Maybe that's a separate issue, but it's worth being aware of while implementing HA.

> We shard the index cache across multiple store instances.

Makes sense, but I would love to hear more about the implementation details. As you suggested offline, https://godoc.org/github.com/golang/groupcache sounds nice, but it means we are talking about sharding fully within the stores (you ask any store and it gives you the correct answer 100% of the time, even if it needs to ask its peers). Or maybe we want thanos-query to be aware of store sharding? Also, what would we shard the index cache on? Matchers 0.o? __name__ only? What if someone asks for __name__=~".*"?
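To make the "based on what?" question concrete, here is a sketch assuming the index cache is sharded by metric name (a hypothetical choice, using simplified hypothetical matcher types rather than the Prometheus ones). Only an exact __name__ matcher pins a query to a single shard; a regex such as __name__=~".*", or no name matcher at all, must fan out to every shard:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// MatchType and Matcher are simplified, hypothetical stand-ins for PromQL
// label matchers.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchRegexp
)

type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

// shardsFor returns the shard indexes a query must touch when the index cache
// is sharded by metric name. numShards must be positive.
func shardsFor(matchers []Matcher, numShards int) []int {
	for _, m := range matchers {
		if m.Name == "__name__" && m.Type == MatchEqual {
			h := fnv.New32a()
			h.Write([]byte(m.Value))
			return []int{int(h.Sum32() % uint32(numShards))}
		}
	}
	// No exact name: any shard might hold matching series, so ask them all.
	all := make([]int, numShards)
	for i := range all {
		all[i] = i
	}
	return all
}

func main() {
	exact := []Matcher{{MatchEqual, "__name__", "up"}, {MatchEqual, "job", "node"}}
	wildcard := []Matcher{{MatchRegexp, "__name__", ".*"}}
	fmt.Println(len(shardsFor(exact, 4)))    // 1
	fmt.Println(len(shardsFor(wildcard, 4))) // 4
}
```

Name-based sharding also skews load, since a few high-cardinality metrics can dominate one shard; that is part of why the choice of key matters.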

> though by having multiple shards, we can already improve overall availability and reduce the time to recovery.

Totally agree, and thanks for the example 👍 However, I would start with something simple first: just replication (so true HA), because that is what you need (from what you say). This will also enable horizontal scaling (offloading a single store) and potentially improve performance. Sharding alone will ONLY improve availability (and there will still be some major disruption time); as for performance, it is hard to say without #346 (which is in progress).

@mattbostock
Contributor

mattbostock commented Jul 4, 2018

Added a proposal for high-availability for store instances here: #404

@bwplotka
Member

This can be solved by just running multiple Store Gateways behind any load balancer (like a Kubernetes Service), without gossip.
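A sketch of what the load balancer contributes, assuming simple round robin (the `RoundRobin` type and addresses are hypothetical, and it does not model how a Kubernetes Service actually selects endpoints; `atomic.Uint64` needs Go 1.19+). Because every Store Gateway serves the same bucket, whichever backend is picked can answer the query in full:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin hands out backends in turn. It is a hypothetical stand-in for the
// load balancer in front of identical Store Gateways.
type RoundRobin struct {
	backends []string
	next     atomic.Uint64
}

// Pick returns the next backend, or "" if none are configured. It is safe for
// concurrent use because the counter is advanced atomically.
func (r *RoundRobin) Pick() string {
	if len(r.backends) == 0 {
		return ""
	}
	n := r.next.Add(1) - 1
	return r.backends[n%uint64(len(r.backends))]
}

func main() {
	lb := &RoundRobin{backends: []string{"store-gw-0:10901", "store-gw-1:10901"}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.Pick()) // store-gw-0, store-gw-1, store-gw-0, store-gw-1
	}
}
```

One thing to keep in mind: a Kubernetes Service balances at the connection level, while gRPC multiplexes many requests over one long-lived HTTP/2 connection, so a querier may keep talking to the same Store Gateway until that connection is re-established.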

fpetkovski added a commit to fpetkovski/thanos that referenced this issue Jan 26, 2024
fpetkovski pushed a commit to fpetkovski/thanos that referenced this issue Jan 26, 2024
Projects
None yet
Development

No branches or pull requests

5 participants