Pull moar fixes #63

Merged
merged 42 commits into from
Jul 27, 2023

Conversation

GiedriusS
Collaborator

@GiedriusS GiedriusS commented Jul 27, 2023

Pull a batch of changes from main into our fork: improved performance, query-frontend now uses rueidis, and more.

yeya24 and others added 30 commits July 27, 2023 11:55
* use own explanation struct

Signed-off-by: Ben Ye <[email protected]>

* omit empty

Signed-off-by: Ben Ye <[email protected]>

* fix e2e test

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
Instead of allocating bytes for raw postings, let's read them directly
into diff varint format to save memory.

Signed-off-by: Giedrius Statkevičius <[email protected]>
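A minimal sketch of the idea behind diff varint postings, not the code from this commit: sorted series references are stored as varint-encoded deltas, so small gaps between neighbours cost one or two bytes each. The function names `encodeDiffVarint` and `decodeDiffVarint` are hypothetical, and the 4-bytes-per-entry size used for the raw format is an assumption made for the comparison.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeDiffVarint writes each posting as the unsigned varint of its
// difference from the previous posting. The input must be sorted ascending.
func encodeDiffVarint(postings []uint64) []byte {
	buf := make([]byte, 0, len(postings)) // every delta needs at least one byte
	prev := uint64(0)
	for _, p := range postings {
		buf = binary.AppendUvarint(buf, p-prev)
		prev = p
	}
	return buf
}

// decodeDiffVarint reverses encodeDiffVarint by summing the deltas.
func decodeDiffVarint(b []byte) ([]uint64, error) {
	var out []uint64
	prev := uint64(0)
	for len(b) > 0 {
		delta, n := binary.Uvarint(b)
		if n <= 0 {
			return nil, errors.New("malformed varint")
		}
		prev += delta
		out = append(out, prev)
		b = b[n:]
	}
	return out, nil
}

func main() {
	postings := []uint64{1000, 1004, 1010, 1300}
	enc := encodeDiffVarint(postings)
	dec, err := decodeDiffVarint(enc)
	// Assumption: the raw format spends 4 bytes per posting.
	fmt.Println("raw bytes:", len(postings)*4, "encoded bytes:", len(enc))
	fmt.Println(dec, err)
}
```

Encoding while reading means the fixed-width copy never has to be held in memory alongside the compact one.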
* compactor set index stats on block

Signed-off-by: Ben Ye <[email protected]>

* comment

Signed-off-by: Ben Ye <[email protected]>

* change field

Signed-off-by: Ben Ye <[email protected]>

* update downsample

Signed-off-by: Ben Ye <[email protected]>

* add index stats into compaction

Signed-off-by: Ben Ye <[email protected]>

* fix tests

Signed-off-by: Ben Ye <[email protected]>

* fix test

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
Compaction metrics have too high a cardinality, causing metric bloat on
large installations. The group information is better suited to logs.
* Replace with a `resolution` label to the compaction counters.

Fixes: thanos-io#5841

Signed-off-by: SuperQ <[email protected]>
Signed-off-by: Michael Hoffmann <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
The distributed optimizer in the Thanos engine calls MinT() and LabelSets() multiple times
on each remote engine. These calculations can get expensive when an engine
covers a large number of TSDBs.

This commit introduces caching of the calculated values for a remote
engine's mint, maxt and label sets.

Signed-off-by: Filip Petkovski <[email protected]>
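One way such caching could look, sketched with `sync.Once` so the expensive scan runs a single time per engine. The `remoteEngine` type, its fields, and the `computations` counter are hypothetical stand-ins for illustration; the actual commit may cache the values differently.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// remoteEngine is a hypothetical stand-in for an engine that fronts many TSDBs.
type remoteEngine struct {
	tsdbMinTimes []int64 // one minimum timestamp per TSDB

	mintOnce     sync.Once
	mint         int64
	computations int // how many times the expensive scan actually ran
}

// MinT returns the smallest minimum timestamp across all TSDBs.
// The scan happens on the first call only; later calls return the cached value.
func (e *remoteEngine) MinT() int64 {
	e.mintOnce.Do(func() {
		e.computations++
		e.mint = math.MaxInt64
		for _, t := range e.tsdbMinTimes {
			if t < e.mint {
				e.mint = t
			}
		}
	})
	return e.mint
}

func main() {
	e := &remoteEngine{tsdbMinTimes: []int64{30, 10, 20}}
	for i := 0; i < 3; i++ {
		fmt.Println(e.MinT())
	}
	fmt.Println("scans performed:", e.computations)
}
```

The same pattern would apply to maxt and the label sets. It assumes the values do not change for the lifetime of the engine object.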
* go.mod: bump rest of otel libs

Bump these two OTEL libs to be like the rest of the OTEL libs. We had a
strange issue where `service.name` was not set even though it was in
environment variables. Bumping these libs fixes the problem.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* otlp: only set service.name if provided

Signed-off-by: Giedrius Statkevičius <[email protected]>

* e2e: add test case

Signed-off-by: Giedrius Statkevičius <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>
Pooling hides the real allocations in postings benchmarks because the
allocation happens only once. In practice the picture is different when
multiple readers are created at once. Disable pooling in benchmarks to
see what is actually happening.

Signed-off-by: Giedrius Statkevičius <[email protected]>
* check context when expanding postings

Signed-off-by: Ben Ye <[email protected]>

* import

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
* fix wrong metric names for series and postings fetch duration

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

* note breaking change

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
The latest releases of the client bring great performance improvements. Let's update it.
Thank you to [rueian](https://github.com/rueian) and all of the
contributors to that awesome Redis client!

Signed-off-by: Giedrius Statkevičius <[email protected]>
This commit introduces a tenancy package in preparation for the introduction
of tenancy in the query path. Tenancy-related code that can be shared
between components is moved out of the receive component into the
tenancy package.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
Optimize snappy streamed reading by traversing through the byte slice in
advance to determine how big of a buffer we will need. I have opted to
rewrite snappy streamed format decoding because it is a straightforward
format. Complex parts were deferred to klauspost/compress.

Signed-off-by: Giedrius Statkevičius <[email protected]>
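A rough sketch of the pre-scan step, assuming the standard snappy framing format: each chunk has a 1-byte type and a 3-byte little-endian length, and a compressed chunk carries a 4-byte checksum followed by a snappy block that begins with its decoded length as a uvarint. The function name `decodedLen` is hypothetical, checksums are not verified here, and this is not the code from the commit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	chunkCompressed   = 0x00
	chunkUncompressed = 0x01
	checksumSize      = 4
)

// decodedLen walks a snappy framed stream without decompressing it and
// returns how many bytes the decoded output will need.
func decodedLen(stream []byte) (int, error) {
	total := 0
	for len(stream) > 0 {
		if len(stream) < 4 {
			return 0, errors.New("truncated chunk header")
		}
		typ := stream[0]
		// The chunk length is a 3-byte little-endian integer.
		n := int(stream[1]) | int(stream[2])<<8 | int(stream[3])<<16
		stream = stream[4:]
		if len(stream) < n {
			return 0, errors.New("truncated chunk body")
		}
		body := stream[:n]
		stream = stream[n:]

		switch {
		case typ == chunkCompressed:
			if n < checksumSize {
				return 0, errors.New("compressed chunk too short")
			}
			// The snappy block starts with its decoded length as a uvarint.
			l, m := binary.Uvarint(body[checksumSize:])
			if m <= 0 {
				return 0, errors.New("malformed decoded length")
			}
			total += int(l)
		case typ == chunkUncompressed:
			if n < checksumSize {
				return 0, errors.New("uncompressed chunk too short")
			}
			total += n - checksumSize
		case typ >= 0x80:
			// Stream identifier, padding and skippable chunks add no output.
		default:
			return 0, fmt.Errorf("reserved unskippable chunk type 0x%02x", typ)
		}
	}
	return total, nil
}

func main() {
	stream := []byte{
		0xff, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y', // stream identifier
		0x01, 0x09, 0x00, 0x00, 0, 0, 0, 0, 'h', 'e', 'l', 'l', 'o', // uncompressed chunk
	}
	n, err := decodedLen(stream)
	fmt.Println(n, err)
	out := make([]byte, 0, n) // a single allocation sized up front
	fmt.Println(cap(out))
}
```

Knowing the total in advance lets the decoder allocate the output buffer once instead of growing it chunk by chunk.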
* Remove Exists call in meta fetcher

The meta fetcher lists the bucket by iterating through
top level paths from the entire keyspace, and then for each key
calls Exists on the meta.json file. This leads to an N+1 amplification
against object storage, manifesting as a very high increase in costs
and occasional throttling by cloud providers.

This commit changes the iteration logic to recursively list all files instead,
which can be done with far fewer API calls to object storage. Meta files
are then identified from the keys returned by the list operation instead
of making one API request for each file.

Signed-off-by: Filip Petkovski <[email protected]>
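A simplified sketch of the second half of that change: once a recursive listing has returned every key, blocks can be found by filtering for `<dir>/meta.json` locally, with no per-block Exists call. The function name `blockDirsFromListing` and the sample keys are hypothetical, and the sketch skips the block ID validation a real fetcher would need.

```go
package main

import (
	"fmt"
	"strings"
)

// blockDirsFromListing takes the keys returned by one recursive listing and
// returns the top-level directories that directly contain a meta.json file.
func blockDirsFromListing(keys []string) []string {
	var dirs []string
	for _, k := range keys {
		dir, rest, ok := strings.Cut(k, "/")
		if ok && rest == "meta.json" {
			dirs = append(dirs, dir)
		}
	}
	return dirs
}

func main() {
	// Sample keys for illustration; real block directories are ULIDs.
	keys := []string{
		"block-a/chunks/000001",
		"block-a/index",
		"block-a/meta.json",
		"block-b/chunks/000001", // upload in progress, no meta.json yet
		"debug/metas/block-a.json",
	}
	fmt.Println(blockDirsFromListing(keys))
}
```

With N blocks, the old approach costs one listing plus N Exists requests, while this one costs only the (possibly paginated) listing.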

* Revert e2e test change

Signed-off-by: Filip Petkovski <[email protected]>

---------

Signed-off-by: Filip Petkovski <[email protected]>
Consistently use the same timestamp instead of the current time each
time during a test.

Signed-off-by: Giedrius Statkevičius <[email protected]>
* chore: pkg imported more than once

Signed-off-by: guoguangwu <[email protected]>

* fix: remove pkg alias

Signed-off-by: guoguangwu <[email protected]>

---------

Signed-off-by: guoguangwu <[email protected]>
… and improve its tests (thanos-io#6496)

* Fix and improve PathContentReloader tests

Signed-off-by: Douglas Camata <[email protected]>

* Run tests in parallel

Signed-off-by: Douglas Camata <[email protected]>

* Add changelog entry

Signed-off-by: Douglas Camata <[email protected]>

* Fix table tests

Signed-off-by: Douglas Camata <[email protected]>

* Fix typo

Co-authored-by: Filip Petkovski <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>
Co-authored-by: Filip Petkovski <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
* Added receive float histogram support

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed imports

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Added comments for fns copied from Prometheus

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Improved comment

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Removed unnecessary if

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Fixed native histogram proto conversion in remote engine

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Removed unused histogram conversion from remote engine

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Fix and renaming in native_histograms_test.go

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Trigger Build

Signed-off-by: Sebastian Rabenhorst <[email protected]>

---------

Signed-off-by: Sebastian Rabenhorst <[email protected]>
Bump e2e framework version and add a note about E2E_DOCKER_CPUS.

Signed-off-by: Giedrius Statkevičius <[email protected]>
* cortex/redisclient: use rueidis client

Use the same rueidis client in query-frontend. Solves
thanos-io#6094.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* cortex/cache: gofumpt + fix errors.Errorf

Signed-off-by: Giedrius Statkevičius <[email protected]>

* cacheutil/docs: clean up more old stuff

Signed-off-by: Giedrius Statkevičius <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>
* index header: remove memWriter from fileWriter

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

* refactor

Signed-off-by: Ben Ye <[email protected]>

* fix test

Signed-off-by: Ben Ye <[email protected]>

* update comment

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
* Wrap object store Prometheus registry

In preparation for the work being done at thanos-io/objstore#26.

Signed-off-by: Douglas Camata <[email protected]>

* Update objstore to the latest version

This version removes the `thanos_` prefix from metrics, which is the
reason for wrapping the objstore's metrics registry in the first place.

Signed-off-by: Douglas Camata <[email protected]>

* Fix modules

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Wrap metrics registerer for objstore bucket

Signed-off-by: Douglas Camata <[email protected]>

* Remove prefix from Thanos Store metrics

Signed-off-by: Douglas Camata <[email protected]>

* Fix goimports

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Put back upgraded objstore dep

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Move to more recent ref of thanos/objstore

Signed-off-by: Douglas Camata <[email protected]>

* Ignore OBS in objstore tests

Signed-off-by: Douglas Camata <[email protected]>

* Fix linting error on main test after objstore upgrade

Signed-off-by: Douglas Camata <[email protected]>

* Skip OCS objstore test in circle ci

Signed-off-by: Douglas Camata <[email protected]>

* Fix echo in makefile

Signed-off-by: Douglas Camata <[email protected]>

* Fix typo

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
I finally managed to reproduce this failure locally with
efficientgo/e2e@c316eb9.
The added t.Logf() showed that the problem is that with a lower bytes
limit, it might hit the series or chunks part first. I have bumped the
bytes limit. I calculated the new bytes limit by checking how many bytes
are allocated before sending the last chunk.

I have also noticed that one block is created without a delay. Update it
so that it would be like the others.

Include objstore@main update with
https://github.com/thanos-io/objstore/pull/62/files so that Iter() would
always return an error on a timeout.

Signed-off-by: Giedrius Statkevičius <[email protected]>
…6519)

* pkg/reloader: use watchInterval timeout for initial apply

Signed-off-by: Craig Peterson <[email protected]>

* changelog

Signed-off-by: Craig Peterson <[email protected]>

* use distinct context for initial sync

Signed-off-by: Craig Peterson <[email protected]>

---------

Signed-off-by: Craig Peterson <[email protected]>
alanprot and others added 12 commits July 27, 2023 12:02
* add histogram metrics for index cache item size

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
* Upgrade objstore

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update with the latest APIs

Signed-off-by: Kemal Akkoyun <[email protected]>

---------

Signed-off-by: Kemal Akkoyun <[email protected]>
Co-authored-by: Matej Gera <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
…thanos-io#6531)

* BucketedBytes to buffer byte slices when decoding postings from cache

Signed-off-by: Alan Protasio <[email protected]>

* Fixing Lint

Signed-off-by: Alan Protasio <[email protected]>

---------

Signed-off-by: Alan Protasio <[email protected]>
…-io#6526)

* change shipper upload compacted type from bool to a function

Signed-off-by: Ben Ye <[email protected]>

* add default to false

Signed-off-by: Ben Ye <[email protected]>

* reset uploadedCompacted to 0

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
* Querier: Forward tenant information downstream

With this commit we attach tenant information to each query request and
forward it via the StoreAPI to any downstream Store Gateways and
Queriers.

We add the following command-line options, which mimic the tenant
functionality in Receive. The options are currently hidden, as they
provide no real functionality yet. This will come in future steps.

--query.tenant-header
--query.default-tenant
--query.tenant-certificate

Signed-off-by: Jacob Baungard Hansen <[email protected]>
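A minimal sketch of how a header name and a default tenant could combine when resolving the tenant for a request. The function `tenantFromHeader` and the sample header and tenant values are hypothetical, chosen for illustration; certificate-based tenancy and the forwarding over the StoreAPI are not covered.

```go
package main

import (
	"fmt"
	"net/http"
)

// tenantFromHeader returns the tenant named in the configured header,
// falling back to defaultTenant when the header is missing or empty.
func tenantFromHeader(h http.Header, headerName, defaultTenant string) string {
	if t := h.Get(headerName); t != "" {
		return t
	}
	return defaultTenant
}

func main() {
	// Illustrative values standing in for --query.tenant-header
	// and --query.default-tenant.
	const headerName = "THANOS-TENANT"
	const defaultTenant = "default-tenant"

	withTenant := http.Header{}
	withTenant.Set(headerName, "team-a")

	fmt.Println(tenantFromHeader(withTenant, headerName, defaultTenant))
	fmt.Println(tenantFromHeader(http.Header{}, headerName, defaultTenant))
}
```

`http.Header.Get` canonicalizes the header name, so the lookup is case-insensitive with respect to the name the client sent.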

* Receive: Use CertificateField from Tenancy pkg

These consts are now defined in the Tenancy package, so we should use
those instead.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
* deduplicate matchers in posting group

Signed-off-by: Ben Ye <[email protected]>

* don't use map for deduplication

Signed-off-by: Ben Ye <[email protected]>

* lint

Signed-off-by: Ben Ye <[email protected]>

* address comments

Signed-off-by: Ben Ye <[email protected]>

* fix benchmark testcase

Signed-off-by: Ben Ye <[email protected]>

* make sure values sorted

Signed-off-by: Ben Ye <[email protected]>

* fix tests

Signed-off-by: Ben Ye <[email protected]>

* switch to inplace subtract

Signed-off-by: Ben Ye <[email protected]>

* optimize more

Signed-off-by: Ben Ye <[email protected]>

* one more testcase for addAll and non addAll groups merge with empty keys

Signed-off-by: Ben Ye <[email protected]>

* shortcut empty postinggroup early

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
…hanos-io#6556)

* respect block-files-concurrency setting when downsampling

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

* respect block-files-concurrency setting when downsampling

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

* respect block-files-concurrency setting when downsampling

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

---------

Signed-off-by: Vasiliy Rumyantsev <[email protected]>
* remove tenant header logs which are too verbose

Signed-off-by: Ben Ye <[email protected]>

* keep the debug level log

Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
@GiedriusS GiedriusS merged commit 1286f94 into 0.32.0+vinted Jul 27, 2023
7 of 8 checks passed