
Pull moar fixes #63

Merged · 42 commits · Jul 27, 2023
3fdb938
Remove some unused Cortex vendored code and metrics (#6440)
yeya24 Jun 13, 2023
a42c206
Avoid direct cortex vendor dependency in promclient (#6443)
yeya24 Jun 14, 2023
ecc7f68
query.md: Added grpc_{client,server}_handled_total to metrics list. (…
jpds Jun 15, 2023
fccb3c7
store: read postings directly into delta encoded format (#6442)
GiedriusS Jun 15, 2023
db29f98
compact: Add Index Stats to block metadata (#6441)
yeya24 Jun 16, 2023
f416e53
estimate block chunk and series size from metadata (#6449)
yeya24 Jun 16, 2023
6725b2a
Replace group with resolution in compact metrics.
SuperQ Jan 17, 2023
4d17246
Store: fix crash on empty regex matcher
mhoffm-aiven Jun 19, 2023
ce41e16
Cache calculated mint and maxt for each remote engine (#6458)
fpetkovski Jun 20, 2023
4aa1455
cache empty expanded postings (#6464)
yeya24 Jun 22, 2023
1760efe
go.mod: bump rest of otel libs (#6447)
GiedriusS Jun 26, 2023
fe79262
store: disable pooling for postings benchmarks (#6473)
GiedriusS Jun 27, 2023
3b199c8
Check context when expanding postings 2nd attempt (#6471)
yeya24 Jun 27, 2023
f182976
Rename series and postings fetch duration metrics (#6479)
yeya24 Jun 28, 2023
13686c8
Update to prometheus/common v0.44.0 (#6483)
dbason Jun 29, 2023
023f266
go.mod: update rueidis client (#6485)
GiedriusS Jun 29, 2023
fb6d16c
Introduce tenancy package (#6482)
jacobbaungard Jun 30, 2023
a614330
store: optimized snappy streamed reading (#6475)
GiedriusS Jul 3, 2023
4245266
Remove Exists call in meta fetcher (#6474)
fpetkovski Jul 3, 2023
48044c7
e2e/store: use now instead of time.Now() each time (#6493)
GiedriusS Jul 3, 2023
4044328
chore: pkg imported more than once (#6499)
testwill Jul 4, 2023
d3d658c
*: Remove unnecessary configuration reload from `ContentPathReloader`…
douglascamata Jul 5, 2023
089d8ce
add context check when decoding cached postings (#6506)
yeya24 Jul 6, 2023
a48e29b
receive: add float histogram support (#6323)
rabenhorst Jul 10, 2023
2934254
go.mod: bump e2e framework version (#6516)
GiedriusS Jul 10, 2023
e6b4e09
cortex/redisclient: use rueidis client (#6520)
GiedriusS Jul 11, 2023
43ea481
index header: Remove memWriter from fileWriter (#6509)
yeya24 Jul 11, 2023
c2709a4
Wrap object store Prometheus registry (#6152)
douglascamata Jul 11, 2023
295dba9
e2e/store: try to fix Series() limit test again (#6522)
GiedriusS Jul 12, 2023
39662b7
pkg/reloader: use watchInterval timeout for initial apply (#6519)
captncraig Jul 12, 2023
d22ab61
Standardize index cache metrics (#6523)
alanprot Jul 12, 2023
a51ffb8
Add histogram metrics for index cache item size (#6528)
yeya24 Jul 13, 2023
8b16206
Upgrade objstore (#6507)
kakkoyun Jul 14, 2023
b91769f
Make compact lifecycle more flexible to be overridden for sharded com…
alexqyle Jul 14, 2023
9e18965
BucketedBytes to buffer byte slices when decoding postings from cache…
alanprot Jul 17, 2023
ee0b479
Shipper: change upload compacted type from bool to a function (#6526)
yeya24 Jul 19, 2023
9165e77
Query: Forward tenant information via StoreAPI (#6530)
jacobbaungard Jul 20, 2023
e785505
Deduplicate matchers in posting group (#6532)
yeya24 Jul 20, 2023
61d6188
compact: respect block-files-concurrency setting when downsampling (#…
xBazilio Jul 25, 2023
f5af402
Remove tenant header logs in store gateway (#6552)
yeya24 Jul 26, 2023
ac019e7
*: fix build
GiedriusS Jul 27, 2023
abdf6d5
Update .busybox-versions
GiedriusS Jul 27, 2023
11 changes: 6 additions & 5 deletions .busybox-versions
@@ -1,6 +1,7 @@
# Auto generated by busybox-updater.sh. DO NOT EDIT
-amd64=2168263e714f4d36221fc67128f81abae6ad749749900c2ccf6600070dc782ba
-arm64=9f7323c6f41b076fb7e539ace8f7c4af60f5837e901376fdedc305f32db76e7b
-arm=727c127a153bf992b7dff6a5a9d246887a56ebf9db93a54f355d47dc1d49ce5b
-ppc64le=f8e05bd8921c285c9e934800491c4e17e7992834a9d249f275a4a73c5fa13cfe
-s390x=b26ae11f6ca2b1327eb2ff238bd18f64276402b94642ee9f147c0d0dac52b615
+amd64=c70881248a1c9b593a8222f374d5cfaefd2fbc0d2c10eaf4b2430f680c6571d2
+arm64=383c93e022adb5f1e74861634457e4b89accc5b3abf5264be1881d450b84f262
+arm=c0185f17a31f497a46489c448aebd8f6d8884cb040d56f16ad3b0f566e5d50da
+ppc64le=7e3f7279bda30acfe53f44a4ee5e99878df36d9665e0af33f3a4138289322974
+riscv64=13bbeed24b95b733aa95cfb11b96af0941e731fddef15616e551c8c3e8f0a5d4
+s390x=8116c7af697a4cc36b40a9f11d0cfda7251508a04a22e691473c3dce0066d08a
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -18,7 +18,7 @@ jobs:
test:
executor: golang-test
environment:
-GO111MODULE: 'on'
+GO111MODULE: "on"
steps:
- git-shallow-clone/checkout
- go/mod-download-cached
@@ -36,7 +36,7 @@ jobs:
- run:
name: "Run unit tests."
environment:
-THANOS_TEST_OBJSTORE_SKIP: GCS,S3,AZURE,COS,ALIYUNOSS,BOS,OCI
+THANOS_TEST_OBJSTORE_SKIP: GCS,S3,AZURE,COS,ALIYUNOSS,BOS,OCI,OBS
# Variables for Swift testing.
OS_AUTH_URL: http://127.0.0.1:5000/v2.0
OS_PASSWORD: s3cr3t
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -22,9 +22,15 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#5777](https://github.com/thanos-io/thanos/pull/5777) Receive: Allow specifying tenant-specific external labels in Router Ingestor.
- [#6352](https://github.com/thanos-io/thanos/pull/6352) Store: Expose store gateway query stats in series response hints.
- [#6420](https://github.com/thanos-io/thanos/pull/6420) Index Cache: Cache expanded postings.
+- [#6441](https://github.com/thanos-io/thanos/pull/6441) Compact: Compactor will set `index_stats` in `meta.json` file with max series and chunk size information.
+- [#6466](https://github.com/thanos-io/thanos/pull/6466) Mixin (Receive): add limits alerting for configuration reload and meta-monitoring.
+- [#6467](https://github.com/thanos-io/thanos/pull/6467) Mixin (Receive): add alert for tenant reaching head series limit.
+- [#6528](https://github.com/thanos-io/thanos/pull/6528) Index Cache: Add histogram metric `thanos_store_index_cache_stored_data_size_bytes` for item size.

### Fixed

+- [#6496](https://github.com/thanos-io/thanos/pull/6496): *: Remove unnecessary configuration reload from `ContentPathReloader` and improve its tests.
+- [#6456](https://github.com/thanos-io/thanos/pull/6456) Store: fix crash when computing set matches from regex pattern
- [#6427](https://github.com/thanos-io/thanos/pull/6427) Receive: increasing log level for failed uploads to error
- [#6172](https://github.com/thanos-io/thanos/pull/6172) query-frontend: return JSON formatted errors for invalid PromQL expression in the split by interval middleware.
- [#6171](https://github.com/thanos-io/thanos/pull/6171) Store: fix error handling on limits.
- [#6183](https://github.com/thanos-io/thanos/pull/6183) Receiver: fix off by one in multitsdb flush that will result in empty blocks if the head only contains one sample
@@ -40,8 +46,12 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6325](https://github.com/thanos-io/thanos/pull/6325) Store: return gRPC resource exhausted error for byte limiter.
- [#6399](https://github.com/thanos-io/thanos/pull/6399) *: Fix double-counting bug in http_request_duration metric
- [#6428](https://github.com/thanos-io/thanos/pull/6428) Report gRPC connnection errors in the logs.
+- [#6519](https://github.com/thanos-io/thanos/pull/6519) Reloader: Use timeout for initial apply.
+- [#6509](https://github.com/thanos-io/thanos/pull/6509) Store Gateway: Remove `memWriter` from `fileWriter` to reduce memory usage when sync index headers.
+- [#6556](https://github.com/thanos-io/thanos/pull/6556) Thanos compact: respect block-files-concurrency setting when downsampling

### Changed
+- [#6049](https://github.com/thanos-io/thanos/pull/6049) Compact: *breaking :warning:* Replace group with resolution in compact metrics to avoid cardinality explosion on compact metrics for large numbers of groups.
- [#6168](https://github.com/thanos-io/thanos/pull/6168) Receiver: Make ketama hashring fail early when configured with number of nodes lower than the replication factor.
- [#6201](https://github.com/thanos-io/thanos/pull/6201) Query-Frontend: Disable absent and absent_over_time for vertical sharding.
- [#6212](https://github.com/thanos-io/thanos/pull/6212) Query-Frontend: Disable scalar for vertical sharding.
@@ -58,6 +68,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6363](https://github.com/thanos-io/thanos/pull/6363) Store: Check context error when expanding postings.
- [#6405](https://github.com/thanos-io/thanos/pull/6405) Index Cache: Change postings cache key to include the encoding format used so that older Thanos versions would not try to decode it during the deployment of a new version.
- [#6432](https://github.com/thanos-io/thanos/pull/6432) Receive: Remove duplicated `gopkg.in/fsnotify.v1` dependency
+- [#6479](https://github.com/thanos-io/thanos/pull/6479) Store: *breaking :warning:* Rename `thanos_bucket_store_cached_series_fetch_duration_seconds` to `thanos_bucket_store_series_fetch_duration_seconds` and `thanos_bucket_store_cached_postings_fetch_duration_seconds` to `thanos_bucket_store_postings_fetch_duration_seconds`.

### Removed

@@ -96,6 +107,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

- [#6010](https://github.com/thanos-io/thanos/pull/6010) *: Upgrade Prometheus to v0.42.0.
- [#5999](https://github.com/thanos-io/thanos/pull/5999) *: Upgrade Alertmanager dependency to v0.25.0.
+- [#6520](https://github.com/thanos-io/thanos/pull/6520): Switch query-frontend to use [Rueidis](https://github.com/redis/rueidis) client. Deleted `idle_timeout`, `max_conn_age`, `pool_size`, `min_idle_conns` fields as they are not used anymore.
- [#5887](https://github.com/thanos-io/thanos/pull/5887) Tracing: Make sure rate limiting sampler is the default, as was the case in version pre-0.29.0.
- [#5997](https://github.com/thanos-io/thanos/pull/5997) Rule: switch to miekgdns DNS resolver as the default one.
- [#6126](https://github.com/thanos-io/thanos/pull/6126) Build with Go 1.20
9 changes: 6 additions & 3 deletions Makefile
@@ -307,12 +307,12 @@ test: export THANOS_TEST_PROMETHEUS_PATHS= $(PROMETHEUS)
test: export THANOS_TEST_ALERTMANAGER_PATH= $(ALERTMANAGER)
test: check-git install-tool-deps
@echo ">> install thanos GOOPTS=${GOOPTS}"
-@echo ">> running unit tests (without /test/e2e). Do export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI if you want to skip e2e tests against all real store buckets. Current value: ${THANOS_TEST_OBJSTORE_SKIP}"
+@echo ">> running unit tests (without /test/e2e). Do export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS if you want to skip e2e tests against all real store buckets. Current value: ${THANOS_TEST_OBJSTORE_SKIP}"
@go test -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e);

.PHONY: test-local
test-local: ## Runs test excluding tests for ALL object storage integrations.
-test-local: export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI
+test-local: export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS
test-local:
$(MAKE) test

@@ -326,11 +326,14 @@ test-e2e: docker $(GOTESPLIT)
@echo ">> running /test/e2e tests."
# NOTE(bwplotka):
# * If you see errors on CI (timeouts), but not locally, try to add -parallel 1 (Wiard note: to the GOTEST_OPTS arg) to limit to single CPU to reproduce small 1CPU machine.
+# NOTE(GiedriusS):
+# * If you want to limit CPU time available in e2e tests then pass E2E_DOCKER_CPUS environment variable. For example, E2E_DOCKER_CPUS=0.05 limits CPU time available
+# to spawned Docker containers to 0.05 cores.
@$(GOTESPLIT) -total ${GH_PARALLEL} -index ${GH_INDEX} ./test/e2e/... -- ${GOTEST_OPTS}

.PHONY: test-e2e-local
test-e2e-local: ## Runs all thanos e2e tests locally.
-test-e2e-local: export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI
+test-e2e-local: export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI,OBS
test-e2e-local:
$(MAKE) test-e2e

36 changes: 20 additions & 16 deletions cmd/thanos/compact.go
@@ -26,7 +26,10 @@ import (
"github.com/prometheus/common/route"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/tsdb"

"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
objstoretracing "github.com/thanos-io/objstore/tracing/opentracing"

blocksAPI "github.com/thanos-io/thanos/pkg/api/blocks"
"github.com/thanos-io/thanos/pkg/block"
@@ -202,10 +205,11 @@ func runCompact(
return err
}

-bkt, err := client.NewBucket(logger, confContentYaml, reg, component.String())
+bkt, err := client.NewBucket(logger, confContentYaml, component.String())
if err != nil {
return err
}
+insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))

relabelContentYaml, err := conf.selectorRelabelConf.Content()
if err != nil {
@@ -220,21 +224,21 @@
// Ensure we close up everything properly.
defer func() {
if err != nil {
-runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
+runutil.CloseWithLogOnErr(logger, insBkt, "bucket client")
}
}()

// While fetching blocks, we filter out blocks that were marked for deletion by using IgnoreDeletionMarkFilter.
// The delay of deleteDelay/2 is added to ensure we fetch blocks that are meant to be deleted but do not have a replacement yet.
// This is to make sure compactor will not accidentally perform compactions with gap instead.
-ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, bkt, deleteDelay/2, conf.blockMetaFetchConcurrency)
+ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, insBkt, deleteDelay/2, conf.blockMetaFetchConcurrency)
duplicateBlocksFilter := block.NewDeduplicateFilter(conf.blockMetaFetchConcurrency)
-noCompactMarkerFilter := compact.NewGatherNoCompactionMarkFilter(logger, bkt, conf.blockMetaFetchConcurrency)
+noCompactMarkerFilter := compact.NewGatherNoCompactionMarkFilter(logger, insBkt, conf.blockMetaFetchConcurrency)
labelShardedMetaFilter := block.NewLabelShardedMetaFilter(relabelConfig)
consistencyDelayMetaFilter := block.NewConsistencyDelayMetaFilter(logger, conf.consistencyDelay, extprom.WrapRegistererWithPrefix("thanos_", reg))
timePartitionMetaFilter := block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime)

-baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, bkt, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
+baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
if err != nil {
return errors.Wrap(err, "create meta fetcher")
}
@@ -252,7 +256,7 @@ )
)
}
var (
-api = blocksAPI.NewBlocksAPI(logger, conf.webConf.disableCORS, conf.label, flagsMap, bkt)
+api = blocksAPI.NewBlocksAPI(logger, conf.webConf.disableCORS, conf.label, flagsMap, insBkt)
sy *compact.Syncer
)
{
@@ -274,7 +278,7 @@
sy, err = compact.NewMetaSyncer(
logger,
reg,
-bkt,
+insBkt,
cf,
duplicateBlocksFilter,
ignoreDeletionMarkFilter,
@@ -341,7 +345,7 @@

grouper := compact.NewDefaultGrouper(
logger,
-bkt,
+insBkt,
conf.acceptMalformedIndex,
enableVerticalCompaction,
reg,
@@ -355,19 +359,19 @@
tsdbPlanner := compact.NewPlanner(logger, levels, noCompactMarkerFilter)
planner := compact.WithLargeTotalIndexSizeFilter(
tsdbPlanner,
-bkt,
+insBkt,
int64(conf.maxBlockIndexSize),
compactMetrics.blocksMarked.WithLabelValues(metadata.NoCompactMarkFilename, metadata.IndexSizeExceedingNoCompactReason),
)
-blocksCleaner := compact.NewBlocksCleaner(logger, bkt, ignoreDeletionMarkFilter, deleteDelay, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
+blocksCleaner := compact.NewBlocksCleaner(logger, insBkt, ignoreDeletionMarkFilter, deleteDelay, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
compactor, err := compact.NewBucketCompactor(
logger,
sy,
grouper,
planner,
comp,
compactDir,
-bkt,
+insBkt,
conf.compactionConcurrency,
conf.skipBlockWithOutOfOrderChunks,
)
@@ -409,7 +413,7 @@ func runCompact(
return errors.Wrap(err, "syncing metas")
}

-compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), bkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
+compact.BestEffortCleanAbortedPartialUploads(ctx, logger, sy.Partial(), insBkt, compactMetrics.partialUploadDeleteAttempts, compactMetrics.blocksCleaned, compactMetrics.blockCleanupFailures)
if err := blocksCleaner.DeleteMarkedBlocks(ctx); err != nil {
return errors.Wrap(err, "cleaning marked blocks")
}
@@ -437,15 +441,15 @@
downsampleMetrics.downsamples.WithLabelValues(groupKey)
downsampleMetrics.downsampleFailures.WithLabelValues(groupKey)
}
-if err := downsampleBucket(ctx, logger, downsampleMetrics, bkt, sy.Metas(), downsamplingDir, conf.downsampleConcurrency, metadata.HashFunc(conf.hashFunc), conf.acceptMalformedIndex); err != nil {
+if err := downsampleBucket(ctx, logger, downsampleMetrics, insBkt, sy.Metas(), downsamplingDir, conf.downsampleConcurrency, conf.blockFilesConcurrency, metadata.HashFunc(conf.hashFunc), conf.acceptMalformedIndex); err != nil {
return errors.Wrap(err, "first pass of downsampling failed")
}

level.Info(logger).Log("msg", "start second pass of downsampling")
if err := sy.SyncMetas(ctx); err != nil {
return errors.Wrap(err, "sync before second pass of downsampling")
}
-if err := downsampleBucket(ctx, logger, downsampleMetrics, bkt, sy.Metas(), downsamplingDir, conf.downsampleConcurrency, metadata.HashFunc(conf.hashFunc), conf.acceptMalformedIndex); err != nil {
+if err := downsampleBucket(ctx, logger, downsampleMetrics, insBkt, sy.Metas(), downsamplingDir, conf.downsampleConcurrency, conf.blockFilesConcurrency, metadata.HashFunc(conf.hashFunc), conf.acceptMalformedIndex); err != nil {
return errors.Wrap(err, "second pass of downsampling failed")
}
level.Info(logger).Log("msg", "downsampling iterations done")
@@ -458,15 +462,15 @@
return errors.Wrap(err, "sync before retention")
}

-if err := compact.ApplyRetentionPolicyByResolution(ctx, logger, bkt, sy.Metas(), retentionByResolution, compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, "")); err != nil {
+if err := compact.ApplyRetentionPolicyByResolution(ctx, logger, insBkt, sy.Metas(), retentionByResolution, compactMetrics.blocksMarked.WithLabelValues(metadata.DeletionMarkFilename, "")); err != nil {
return errors.Wrap(err, "retention failed")
}

return cleanPartialMarked()
}

g.Add(func() error {
-defer runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
+defer runutil.CloseWithLogOnErr(logger, insBkt, "bucket client")

if !conf.wait {
return compactMainFn()
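Throughout `cmd/thanos/compact.go` the raw `bkt` is replaced by `insBkt`, a bucket wrapped once with metrics and tracing (`objstore.WrapWithMetrics` inside `objstoretracing.WrapWithTraces`) and then threaded through every consumer. A stdlib-only sketch of that decorator pattern (the `Bucket` interface and `WrapWithMetrics` here are simplified stand-ins, not the real objstore API):

```go
package main

import "fmt"

// Bucket is a simplified stand-in for objstore.Bucket.
type Bucket interface {
	Get(name string) (string, error)
	Name() string
}

// memBucket is a toy in-memory implementation.
type memBucket struct{ data map[string]string }

func (b *memBucket) Get(name string) (string, error) {
	v, ok := b.data[name]
	if !ok {
		return "", fmt.Errorf("object %q not found", name)
	}
	return v, nil
}
func (b *memBucket) Name() string { return "mem" }

// metricsBucket counts operations while delegating everything else via
// embedding, mimicking how objstore.WrapWithMetrics decorates a bucket.
type metricsBucket struct {
	Bucket
	ops int
}

func (m *metricsBucket) Get(name string) (string, error) {
	m.ops++
	return m.Bucket.Get(name)
}

// WrapWithMetrics decorates any Bucket; callers then use the wrapper
// everywhere, just as runCompact now passes insBkt instead of bkt.
func WrapWithMetrics(b Bucket) *metricsBucket { return &metricsBucket{Bucket: b} }

func main() {
	bkt := &memBucket{data: map[string]string{"meta.json": "{}"}}
	insBkt := WrapWithMetrics(bkt)
	v, _ := insBkt.Get("meta.json")
	fmt.Println(v, insBkt.ops) // {} 1
}
```

Wrapping once and passing the instrumented handle everywhere means every code path is observed consistently, instead of each call site deciding whether to instrument.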