Parallelize vertical compactions inside a single group #5936
Conversation
pkg/compact/planner.go
	return p.plan(p.noCompBlocksFunc(), metasByMinTime)
}

func (p *tsdbBasedPlanner) plan(noCompactMarked map[ulid.ULID]*metadata.NoCompactMark, metasByMinTime []*metadata.Meta) ([]*metadata.Meta, error) {
type compactionTask []*metadata.Meta
We need to expose this so that downstream projects like Cortex can access it.
Makes sense. Should be public now.
pkg/compact/compact.go
@@ -727,7 +729,7 @@ func (rs *RetentionProgressCalculator) ProgressCalculate(ctx context.Context, gr
 type Planner interface {
 	// Plan returns a list of blocks that should be compacted into single one.
 	// The blocks can be overlapping. The provided metadata has to be ordered by minTime.
-	Plan(ctx context.Context, metasByMinTime []*metadata.Meta) ([]*metadata.Meta, error)
+	Plan(ctx context.Context, metasByMinTime []*metadata.Meta) ([]compactionTask, error)
I am trying to understand the interface. Does the return value of compaction tasks mean those are tasks that we can safely run in parallel?
Yes, that would be the idea. I've added documentation on the type.
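For illustration, the exported and documented type could look roughly like this (the exact name and comment wording in the PR may differ):

```go
// CompactionTask is a single unit of compaction work: a set of blocks
// that should be compacted into one block. Tasks returned together by
// a single Plan call do not share blocks, so they can safely be
// executed in parallel.
type CompactionTask []*metadata.Meta
```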
@@ -768,19 +771,19 @@ func (cg *Group) Compact(ctx context.Context, dir string, planner Planner, comp
 }()

 if err := os.MkdirAll(subDir, 0750); err != nil {
 	return false, ulid.ULID{}, errors.Wrap(err, "create compaction group dir")
The ULID return value was not used anywhere, which is why I've removed it here.
This is to fit the Prometheus interface. In Prometheus especially, an empty ULID meant special behaviour when compacting blocks - we rely on it to check whether compaction really compacted anything.
I think it's fine to change the interface if you want - it's not super important to allow Prometheus to use Thanos compaction or anything like that. Just be careful with Cortex @yeya24 and Mimir @pracucci, who might use this interface/our structs here.
I checked on the Cortex side and we are not using it either. I guess the same goes for Mimir, as I heard they are moving away from importing the Thanos main repo directly.
I don't have a strong preference here. OK to clean it up or keep it.
> I guess the same goes for Mimir, as I heard they are moving away from importing the Thanos main repo directly.
Thanks for asking! I confirm we're not running Thanos compactor or store-gateway anymore in Mimir.
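For context on the convention mentioned above: Prometheus-style compactors signal "nothing was compacted" by returning a zero-value ULID, which callers can test for. A minimal illustrative helper (not code from this PR):

```go
import "github.com/oklog/ulid"

// didCompact reports whether a Prometheus-style compaction actually
// produced a block: a zero-value ULID conventionally means the
// compacted result would be empty and no block was written.
func didCompact(compID ulid.ULID) bool {
	return compID != (ulid.ULID{})
}
```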
pkg/compact/compact.go
	groupErr   multierror.MultiError
	rerunGroup bool
)
for _, task := range tasks {
I am not sure how best to handle this, since we will have unbounded concurrency. We can have per-group concurrency in the short term, but that can still lead to one group slowing down everything else. Long term, maybe we want a single queue for tasks so that we can have global concurrency across all tasks.
I think we might need that sooner rather than later - it's common for users to crash their compactor with even TWO compactions running at the same time. Why not the usual workers approach (e.g. at most 5 compactions at one time)?
Also, we have concurrency at a bigger layer (the caller of this method, I believe) - can we have one concurrency loop so we don't get too complex in terms of unpredictable concurrency?
Do you mean max 5 parallel compactions inside a single group? Or 5 parallel vertical compactions across all groups? The former is easier to implement, the latter is better but will make this PR bigger :)
We should have concurrency on a task level now, so users can also run one task at a time if they want to.
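For illustration, the bounded "workers" approach discussed here could look roughly like this in Go using errgroup (a sketch with assumed names, not the PR's actual code):

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// compactTask stands in for a unit of work returned by the planner.
type compactTask func(ctx context.Context) error

// runTasks executes tasks with at most maxConcurrency running at once.
// The first error cancels the context shared by the remaining tasks.
func runTasks(ctx context.Context, tasks []compactTask, maxConcurrency int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxConcurrency) // bounded worker pool instead of one goroutine per task
	for _, t := range tasks {
		t := t // capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			return t(ctx)
		})
	}
	return g.Wait()
}
```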
Looks good, just some suggestions. Thanks!
// overlaps when the next block is contained in the current block.
// See test case https://github.com/thanos-io/thanos/blob/04106d7a7add7f47025c00422c80f746650c1b97/pkg/compact/planner_test.go#L310-L321.
loopMetas:
for i := range metasByMinTime {
Critical code: reviewers, please be careful here @thanos-io/thanos-maintainers - a bug here can cause irreversible data malformation.
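For reviewers following along, here is a sketch of the kind of independent-group detection being discussed, assuming half-open [minTime, maxTime) block ranges. The PR's actual code also handles no-compaction marks and only treats runs of two or more blocks as compactable:

```go
import (
	"math"

	"github.com/thanos-io/thanos/pkg/block/metadata"
)

// groupOverlapping splits blocks sorted by minTime into runs of
// transitively overlapping blocks. Blocks in different runs never
// overlap, so each run can be compacted independently. Tracking the
// furthest maxTime seen so far also catches blocks fully contained
// in an earlier, longer block.
func groupOverlapping(metasByMinTime []*metadata.Meta) [][]*metadata.Meta {
	var (
		groups  [][]*metadata.Meta
		current []*metadata.Meta
		maxTime = int64(math.MinInt64)
	)
	for _, m := range metasByMinTime {
		// A block starts a new group when it begins at or after every
		// end time seen so far, i.e. it overlaps nothing before it.
		if len(current) > 0 && m.MinTime >= maxTime {
			groups = append(groups, current)
			current = nil
		}
		current = append(current, m)
		if m.MaxTime > maxTime {
			maxTime = m.MaxTime
		}
	}
	if len(current) > 0 {
		groups = append(groups, current)
	}
	return groups
}
```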
pkg/compact/planner_test.go
 {
-	name: "Overlapping blocks 5",
+	name: "Multiple independent groups of overlapping blocks",
I don't think these are independent groups, are they?
cmd/thanos/compact.go
@@ -746,6 +747,10 @@ func (cc *compactConfig) registerFlag(cmd extkingpin.FlagClause) {
 	"NOTE: This flag is ignored and (enabled) when --deduplication.replica-label flag is set.").
 	Hidden().Default("false").BoolVar(&cc.enableVerticalCompaction)

+cmd.Flag("compact.group-concurrency", "The number of concurrent compactions from a single compaction group inside one compaction iteration. "+
- Can we add this flag next to compact.concurrency for readability?
- Are we sure we need this granularity of config? Is there a way to automatically derive this from compact.concurrency?
Love the idea of deriving it based on compact.concurrency.
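One possible shape for that idea - a hypothetical helper (the name groupConcurrency and its parameters are illustrative, not part of the PR):

```go
// Hypothetical: derive a per-group budget from the global
// compact.concurrency value instead of adding a second flag, so that
// the total number of compaction goroutines stays near the limit.
func groupConcurrency(compactConcurrency, activeGroups int) int {
	if activeGroups <= 0 {
		return compactConcurrency
	}
	perGroup := compactConcurrency / activeGroups
	if perGroup < 1 {
		perGroup = 1 // always make progress in every group
	}
	return perGroup
}
```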
Any interest in moving forward with this?
Let me try to review this next week.
LGTM at first glance, need to double-check after the weekend 😄
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Closing for now as promised, let us know if you need this to be reopened! 🤗
The current compaction implementation allocates one goroutine per compaction stream. This means that compaction can only run as fast as the slowest stream. As a result, as soon as one stream starts to fall behind, all other streams can become affected. In addition, despite setting a high compact-concurrency, CPU utilization can still be low because of the one-goroutine-per-stream limit.
The compaction algorithm also prioritizes vertical compactions over horizontal ones. As soon as it detects any overlapping blocks, it will compact those blocks and reevaluate the plan in a subsequent iteration.
This commit enables parallel execution of vertical compactions within a single compaction stream. It does that by first changing the Planner interface to allow it to return multiple compaction tasks per group instead of a single one. It also adapts the algorithm for detecting overlapping blocks to be able to detect multiple independent groups. These groups are then returned as distinct compaction tasks and the compactor can execute them in separate goroutines.
By modifying the planner interface, this commit also enables parallelizing horizontal compactions in the future.
Changes
Verification
This is only tested through unit tests. I plan to run this in a staging environment soon.