TraceQL Perf: Reduce metadata retrieved by implementing two pass iteration. #2119

joe-elliott · 2023-02-21T17:02:05Z

What this PR does:
Dramatically reduces the amount of data pulled from the backend for "needle in a haystack" queries by implementing a two-pass iteration over the data. The first pass retrieves all of the fields necessary to evaluate the query; metadata is then retrieved only for those spans that actually satisfy the query. Previously, metadata was pulled along with the span data, which resulted in slower queries when there were few matches.
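A minimal sketch of the two-pass idea, for readers who want it in code form. This is not the Tempo implementation: `store`, `span`, `spanMeta`, `fetchMeta` and `twoPass` are hypothetical names invented for illustration, and the "backend" is an in-memory slice. The point is only that metadata reads scale with the number of matches rather than with the number of spans scanned.

```go
package main

import "fmt"

// span holds only the columns needed to evaluate the query (first pass).
// Hypothetical type, not Tempo's.
type span struct {
	rowID       int
	serviceName string
}

// spanMeta holds columns that are only needed to build results (second pass).
type spanMeta struct {
	id         string
	startNanos uint64
	endNanos   uint64
}

// store is a hypothetical stand-in for a backend block.
// metaReads counts how many metadata lookups were performed.
type store struct {
	spans     []span
	meta      map[int]spanMeta
	metaReads int
}

// fetchMeta simulates an expensive metadata read for one span.
func (s *store) fetchMeta(rowID int) spanMeta {
	s.metaReads++
	return s.meta[rowID]
}

// twoPass first evaluates the filter using only the query columns, then
// fetches metadata solely for the spans that survived the filter.
func twoPass(s *store, filter func(span) bool) []spanMeta {
	// Pass 1: evaluate the query; no metadata is touched here.
	var matched []int
	for _, sp := range s.spans {
		if filter(sp) {
			matched = append(matched, sp.rowID)
		}
	}

	// Pass 2: retrieve metadata only for the matches.
	out := make([]spanMeta, 0, len(matched))
	for _, rowID := range matched {
		out = append(out, s.fetchMeta(rowID))
	}
	return out
}

func main() {
	s := &store{
		spans: []span{{0, "foo"}, {1, "bar"}, {2, "foo"}},
		meta: map[int]spanMeta{
			0: {id: "a"}, 1: {id: "b"}, 2: {id: "c"},
		},
	}
	got := twoPass(s, func(sp span) bool { return sp.serviceName == "foo" })
	fmt.Printf("matches=%d metaReads=%d\n", len(got), s.metaReads)
}
```

With a single-pass design, `fetchMeta` would effectively run for every span; here a query with no matches performs no metadata reads at all.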

Fixes #2138

e.g. for the query { resource.service.name = "foo" }

Previously iterators looked like the following. In this case span metadata is always retrieved.

JoinIterator: 0	
	LeftJoinIterator: 1
		required: 
			ColumnIterator: rs.Resource.ServiceName 
			InstrumentedPredicate{0, StringInPredicate{foo, }}
			LeftJoinIterator: 3
			required: 
				ColumnIterator: rs.ils.Spans.StartUnixNanos 
				InstrumentedPredicate{0, nil}
				ColumnIterator: rs.ils.Spans.EndUnixNanos 
				InstrumentedPredicate{0, nil}
				ColumnIterator: rs.ils.Spans.ID 
				InstrumentedPredicate{0, nil}
			optional: 
		optional: 
	ColumnIterator: TraceID 
		InstrumentedPredicate{0, nil}
	ColumnIterator: StartTimeUnixNano 
		InstrumentedPredicate{0, nil}
	ColumnIterator: DurationNanos 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootSpanName 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootServiceName 
		InstrumentedPredicate{0, nil}
traceCollector{})

Now they look like the following. In this case only the data necessary to evaluate the query is pulled first. A "filter" callback is then called, and only those spans that survive the callback pass through the trace and span metadata iterators.

JoinIterator: 0	
	JoinIterator: 3	
		spansToMetaIterator: 
			JoinIterator: 0	
			LeftJoinIterator: 1
				required: 
					ColumnIterator: rs.Resource.ServiceName 
					InstrumentedPredicate{0, StringInPredicate{bar, }}
					LeftJoinIterator: 3
					required: 
						ColumnIterator: rs.ils.Spans.Kind 
						InstrumentedPredicate{0, nil}
					optional: 
					spanCollector(0, [])
				optional: 
				batchCollector{true, 1}
			ColumnIterator: StartTimeUnixNano 
				InstrumentedPredicate{0, IntBetweenPredicate{0,1001000000000}}
			ColumnIterator: EndTimeUnixNano 
				InstrumentedPredicate{0, IntBetweenPredicate{1000000000000,9223372036854775807}}
		traceCollector{})
		ColumnIterator: rs.ils.Spans.StartUnixNanos 
			InstrumentedPredicate{0, nil}
		ColumnIterator: rs.ils.Spans.EndUnixNanos 
			InstrumentedPredicate{0, nil}
		ColumnIterator: rs.ils.Spans.ID 
			InstrumentedPredicate{0, nil}
	spanMetaCollector())
	ColumnIterator: TraceID 
		InstrumentedPredicate{0, nil}
	ColumnIterator: StartTimeUnixNano 
		InstrumentedPredicate{0, nil}
	ColumnIterator: DurationNanos 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootSpanName 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootServiceName 
		InstrumentedPredicate{0, nil}
traceMetaCollector{})

Other changes:

Added Stringer to the iterator and predicate interfaces; it can be used to generate the dumps above.
Adjusted the range of the search query to use the trace time ranges rather than the span time ranges, to reduce pulled data.
Added some memory pooling and other shared memory structures.
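As a rough illustration of how a Stringer-based dump like the ones above can be produced, here is a small sketch. The types `iterator`, `columnIterator`, `joinIterator` and `stringInPredicate` are hypothetical simplifications, not the real Tempo types, and the output layout only approximates the dumps in this description. The only real API relied on is the standard library's `fmt.Stringer`.

```go
package main

import (
	"fmt"
	"strings"
)

// iterator is a hypothetical minimal interface. Embedding fmt.Stringer
// forces every iterator to be able to describe itself.
type iterator interface {
	fmt.Stringer
}

// stringInPredicate is a hypothetical predicate that can print itself.
type stringInPredicate struct{ values []string }

func (p stringInPredicate) String() string {
	return "StringInPredicate{" + strings.Join(p.values, ", ") + "}"
}

// columnIterator reads one column, optionally filtered by a predicate.
type columnIterator struct {
	column string
	pred   fmt.Stringer // nil means no predicate
}

func (c columnIterator) String() string {
	pred := "nil"
	if c.pred != nil {
		pred = c.pred.String()
	}
	return fmt.Sprintf("ColumnIterator: %s\n\tpredicate: %s", c.column, pred)
}

// joinIterator combines child iterators at a given definition level.
type joinIterator struct {
	level    int
	children []iterator
}

func (j joinIterator) String() string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "JoinIterator: %d", j.level)
	for _, c := range j.children {
		// Each child is printed one tab deeper than its parent.
		sb.WriteString("\n\t" + indent(c.String()))
	}
	return sb.String()
}

// indent shifts every line after the first by one tab so that nested
// iterators keep their relative indentation.
func indent(s string) string {
	return strings.ReplaceAll(s, "\n", "\n\t")
}

func main() {
	tree := joinIterator{
		level: 0,
		children: []iterator{
			columnIterator{
				column: "rs.Resource.ServiceName",
				pred:   stringInPredicate{values: []string{"foo"}},
			},
			columnIterator{column: "TraceID"},
		},
	}
	fmt.Println(tree)
}
```

Because each node only formats itself and delegates to its children, the dump stays in sync with the iterator tree as the tree's shape changes.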

Future Work:

With this change, the span-level iterator can likely be reworked to be even more efficient, but in the interest of keeping this PR from growing forever, that can be done later.

Benchmarks:
BenchmarkBackendBlockTraceQL was extended to cover significantly more test cases. The latest memory improvements caused some of our absolute fastest queries to slow down a bit, but they saved huge allocations on the worst queries. I struggle to completely trust the CPU time benchmarks: this bench suite showed a lot of variance in CPU time, likely due to thermal throttling on my tiny laptop. Someone should attempt to reproduce these results.

Also note these benchmarks do not take full advantage of the change as they don't run the engine filter.

name                                               old time/op    new time/op    delta
BackendBlockTraceQL/spanAttNameNoMatch-8             2.11ms ± 3%    2.21ms ±12%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8              87.4ms ± 6%     8.0ms ±29%  -90.84%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                 137ms ±32%      68ms ±20%  -50.56%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8        1.97ms ±16%    2.05ms ±13%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8          53.0ms ±41%    67.3ms ±10%     ~     (p=0.151 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8          463µs ±12%     481µs ±10%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8          43.6ms ±12%    15.8ms ± 7%  -63.80%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8            51.5ms ±16%    82.9ms ±30%  +60.92%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8     359µs ±15%     415µs ±38%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8      51.5ms ±23%    62.9ms ±40%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                1.57s ±38%     0.82s ±23%  -47.59%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 1.22s ±42%     0.53s ±11%  -56.34%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8           588µs ±20%     487µs ±16%  -17.16%  (p=0.032 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            1.10s ±22%     0.53s ± 5%  -51.51%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8               361µs ±17%     496µs ±38%  +37.25%  (p=0.032 n=5+5)

name                                               old speed      new speed      delta
BackendBlockTraceQL/spanAttNameNoMatch-8            592MB/s ± 3%   567MB/s ±11%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8             253MB/s ± 5%   384MB/s ±24%  +51.91%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8               155MB/s ±26%    26MB/s ±23%  -83.07%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8       625MB/s ±14%   599MB/s ±14%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8         356MB/s ±31%   272MB/s ±10%     ~     (p=0.151 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8       1.10GB/s ±11%  1.06GB/s ± 9%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8         372MB/s ±11%   158MB/s ± 6%  -57.42%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8           321MB/s ±15%   211MB/s ±25%  -34.16%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8   167MB/s ±13%   149MB/s ±30%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8     322MB/s ±25%   279MB/s ±32%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8             16.6MB/s ±32%   7.8MB/s ±20%  -52.89%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8              21.1MB/s ±33%  12.0MB/s ±10%  -43.14%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8         871MB/s ±18%  1048MB/s ±15%  +20.29%  (p=0.032 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8         21.2MB/s ±19%  40.4MB/s ± 5%  +90.90%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8             166MB/s ±15%   126MB/s ±33%  -24.16%  (p=0.032 n=5+5)

name                                               old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/spanAttNameNoMatch-8               1.25 ± 0%      1.25 ± 0%     ~     (all equal)
BackendBlockTraceQL/spanAttValNoMatch-8                22.1 ± 0%       3.0 ± 0%  -86.38%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                  20.6 ± 0%       1.7 ± 0%  -91.55%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8          1.22 ± 0%      1.22 ± 0%     ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicMatch-8            18.3 ± 0%      18.3 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttNameNoMatch-8           0.51 ± 0%      0.51 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttValNoMatch-8            16.1 ± 0%       2.5 ± 0%  -84.54%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8              16.4 ± 0%      17.1 ± 0%   +4.02%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8      0.06 ± 0%      0.06 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8        16.1 ± 0%      16.8 ± 0%   +4.17%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                 24.6 ± 0%       6.4 ± 0%  -74.10%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                  24.6 ± 0%       6.4 ± 0%  -74.10%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8            0.51 ± 0%      0.51 ± 0%     ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr-8             22.9 ± 0%      21.4 ± 0%   -6.21%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8                0.06 ± 0%      0.06 ± 0%     ~     (all equal)

name                                               old alloc/op   new alloc/op   delta
BackendBlockTraceQL/spanAttNameNoMatch-8             67.3kB ± 9%    66.3kB ±14%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8              10.1MB ± 1%     2.5MB ± 1%  -74.95%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                11.2MB ± 4%     1.9MB ±10%  -82.69%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8        97.6kB ± 3%   100.1kB ± 3%     ~     (p=0.095 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8          7.18MB ± 2%    7.21MB ± 2%     ~     (p=0.222 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8         42.4kB ± 1%    43.2kB ± 1%   +2.01%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8          9.17MB ± 1%    4.82MB ± 1%  -47.37%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8            17.0MB ± 2%    18.8MB ± 2%  +10.47%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8    36.7kB ± 1%    37.7kB ± 0%   +2.65%  (p=0.016 n=5+4)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8      7.78MB ± 2%    9.55MB ± 2%  +22.87%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                291MB ± 1%       7MB ±11%  -97.70%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 290MB ± 1%       7MB ±12%  -97.65%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8          43.3kB ± 2%    43.5kB ± 1%     ~     (p=0.310 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            291MB ± 1%      13MB ± 8%  -95.56%  (p=0.016 n=4+5)
BackendBlockTraceQL/mixedValBothMatch-8              37.7kB ± 1%    38.2kB ± 3%     ~     (p=0.095 n=5+5)

name                                               old allocs/op  new allocs/op  delta
BackendBlockTraceQL/spanAttNameNoMatch-8                362 ± 0%       370 ± 0%   +2.21%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8                344k ± 0%        1k ± 0%  -99.84%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                  343k ± 0%        1k ± 0%  -99.69%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8           349 ± 0%       357 ± 0%   +2.29%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8           92.4k ± 0%     92.4k ± 0%   +0.04%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8            271 ± 0%       286 ± 0%   +5.54%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8           82.6k ± 0%      0.5k ± 0%  -99.36%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8              115k ± 0%      115k ± 0%   +0.26%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8       162 ± 0%       177 ± 0%   +9.26%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8        144k ± 0%      145k ± 0%   +0.55%  (p=0.016 n=4+5)
BackendBlockTraceQL/mixedNameNoMatch-8                4.25M ± 0%     0.00M ± 1%  -99.96%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 4.25M ± 0%     0.00M ± 2%  -99.93%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8             285 ± 0%       293 ± 0%   +2.81%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            4.28M ± 0%     0.09M ± 0%  -97.82%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8                 176 ± 0%       184 ± 0%   +4.55%  (p=0.008 n=5+5)

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>

mdisibio · 2023-02-27T17:06:14Z

tempodb/encoding/vparquet/block_traceql.go

+		for _, ss := range filteredSpansets {
+			for _, s := range ss.Spans {
+				span := s
+				i.currentSpans = append(i.currentSpans, span)


I don't think I understand why this is necessary. If the engine contains a by() then i.filter(*spanset) may split it up into multiple spansets and that should be preserved and returned, correct?

Yes, this is due to the way the metadata portion of the iterators works. This is OK as long as there is only one spanset per trace, but as discussed this will break once we add by().

Should we resolve this now, or solve it when we write by() and coalesce()?

Leaning towards solve when adding by() and coalesce() so we can benefit from the performance improvements now.
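To make the concern in this thread concrete, here is a small sketch of why flattening loses information once a filter can return more than one spanset. The types and the `groupBy` helper are hypothetical; by() did not exist in the engine at the time of this PR, so this only models the behavior being discussed.

```go
package main

import "fmt"

// span and spanset are hypothetical simplifications of the engine types.
type span struct {
	id      string
	service string
}

type spanset struct{ spans []span }

// groupBy models a by()-style operation: it splits one spanset into
// several, one per key, preserving first-seen key order.
func groupBy(in spanset, key func(span) string) []spanset {
	var order []string
	groups := map[string][]span{}
	for _, s := range in.spans {
		k := key(s)
		if _, seen := groups[k]; !seen {
			order = append(order, k)
		}
		groups[k] = append(groups[k], s)
	}
	out := make([]spanset, 0, len(order))
	for _, k := range order {
		out = append(out, spanset{spans: groups[k]})
	}
	return out
}

// flatten mirrors the loop quoted in the review: spans from every
// filtered spanset are appended to a single slice, so the boundaries
// between spansets are discarded.
func flatten(filtered []spanset) []span {
	var out []span
	for _, ss := range filtered {
		out = append(out, ss.spans...)
	}
	return out
}

func main() {
	in := spanset{spans: []span{
		{id: "1", service: "a"},
		{id: "2", service: "b"},
		{id: "3", service: "a"},
	}}
	grouped := groupBy(in, func(s span) string { return s.service })
	flat := flatten(grouped)
	fmt.Printf("spansets after grouping: %d, spans after flattening: %d\n",
		len(grouped), len(flat))
}
```

As long as the filter returns exactly one spanset per trace, flattening is lossless; with grouping, the caller can no longer tell which span belonged to which group.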

mdisibio

Solid work and amazing performance 🚀🚀 99% LGTM, just a few small questions.

joe-elliott added 10 commits February 21, 2023 08:32

sketching out ideas

6953bad

Signed-off-by: Joe Elliott <[email protected]>

wip: traceql engine two pass

e871134

Signed-off-by: Joe Elliott <[email protected]>

inteface cleanup

81b68bc

Signed-off-by: Joe Elliott <[email protected]>

technically compiles

a7afed4

Signed-off-by: Joe Elliott <[email protected]>

added a way to dump the iterator tree

12b2cdc

Signed-off-by: Joe Elliott <[email protected]>

fixin' bugs

1bdd95a

Signed-off-by: Joe Elliott <[email protected]>

add meta support

b721648

Signed-off-by: Joe Elliott <[email protected]>

test finagling

5d13894

Signed-off-by: Joe Elliott <[email protected]>

extend benchmarks

998b376

Signed-off-by: Joe Elliott <[email protected]>

test cleanup

a603824

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott requested review from annanay25, mdisibio, mapno, kvrhdn and zalegrala as code owners February 21, 2023 17:02

joe-elliott added 10 commits February 21, 2023 12:17

removed query field

100ce11

Signed-off-by: Joe Elliott <[email protected]>

lint

04b4ba0

Signed-off-by: Joe Elliott <[email protected]>

fix test

bc9e247

Signed-off-by: Joe Elliott <[email protected]>

fixed/improved bench

632cdce

Signed-off-by: Joe Elliott <[email protected]>

span -> *span

ca8a8af

Signed-off-by: Joe Elliott <[email protected]>

pool spans between span and batch collectors

09ac676

Signed-off-by: Joe Elliott <[email protected]>

use shared span slice and lazily creaetd spanset

26cd365

Signed-off-by: Joe Elliott <[email protected]>

fix

0ab4eea

Signed-off-by: Joe Elliott <[email protected]>

more putSpans

7da5b88

Signed-off-by: Joe Elliott <[email protected]>

overwrite atts

c9bac95

Signed-off-by: Joe Elliott <[email protected]>

mdisibio reviewed Feb 27, 2023

joe-elliott added 4 commits February 27, 2023 15:23

Change traceql.Span to be an interface

beb06df

Signed-off-by: Joe Elliott <[email protected]>

readded span ids

ebe27ee

Signed-off-by: Joe Elliott <[email protected]>

remove kindAsCount

8c0d60b

Signed-off-by: Joe Elliott <[email protected]>

fix optimization

218ba99

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott added 3 commits February 28, 2023 08:32

tier out close

00b2fd2

Signed-off-by: Joe Elliott <[email protected]>

Merge branch 'main' into two-pass-is-best-pass

16f352f

patched up tests

110f716

Signed-off-by: Joe Elliott <[email protected]>

mdisibio mentioned this pull request Mar 3, 2023

Synchronous iterator #2165

Merged

3 tasks

mdisibio reviewed Mar 3, 2023

joe-elliott added 2 commits March 3, 2023 12:39

Merge branch 'main' into two-pass-is-best-pass

5e91bab

review

0388a62

Signed-off-by: Joe Elliott <[email protected]>

mdisibio approved these changes Mar 6, 2023

joe-elliott merged commit 6d8df84 into grafana:main Mar 6, 2023

mdisibio mentioned this pull request Apr 3, 2023

TraceQL: Query Sharding configuration #2252

Open

joe-elliott mentioned this pull request Apr 28, 2023

[DOC] Release notes for 2.1.1 #2406

Merged

3 tasks
