
TraceQL Perf: Reduce metadata retrieved by implementing two pass iteration. #2119

Merged (29 commits) on Mar 6, 2023

@joe-elliott (Member) commented Feb 21, 2023

What this PR does:
Dramatically reduces the amount of data pulled from the backend for "needle in the haystack" queries by implementing two-pass iteration over the data. The first pass retrieves only the fields needed to evaluate the query; metadata is then retrieved only for the spans that actually satisfy it. Previously, metadata was pulled along with the span data, which made queries slower when there were few matches.
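A minimal sketch of the two-pass idea, assuming a toy in-memory columnar layout. The `columns`, `firstPass`, and `secondPass` names are hypothetical and are not Tempo's actual iterators; the point is only that metadata columns are read for matching rows, not for every row.

```go
package main

import "fmt"

// columns is a toy, in-memory stand-in for a Parquet row group.
// It is a hypothetical structure, not Tempo's vparquet schema.
type columns struct {
	serviceName []string // needed to evaluate the query
	startNanos  []uint64 // span metadata
	endNanos    []uint64 // span metadata
	spanID      []string // span metadata
}

type spanMeta struct {
	ID                   string
	StartNanos, EndNanos uint64
}

// firstPass reads only the column the query needs and returns matching row numbers.
func firstPass(c columns, want string) []int {
	var rows []int
	for i, v := range c.serviceName {
		if v == want {
			rows = append(rows, i)
		}
	}
	return rows
}

// secondPass fetches metadata only for rows that survived the first pass.
// It also returns how many metadata values were read.
func secondPass(c columns, rows []int) ([]spanMeta, int) {
	out := make([]spanMeta, 0, len(rows))
	for _, r := range rows {
		out = append(out, spanMeta{ID: c.spanID[r], StartNanos: c.startNanos[r], EndNanos: c.endNanos[r]})
	}
	return out, 3 * len(rows) // three metadata columns per matching row
}

func main() {
	c := columns{
		serviceName: []string{"bar", "foo", "baz", "bar"},
		startNanos:  []uint64{10, 20, 30, 40},
		endNanos:    []uint64{15, 25, 35, 45},
		spanID:      []string{"a", "b", "c", "d"},
	}
	rows := firstPass(c, "foo")
	meta, reads := secondPass(c, rows)
	fmt.Println(rows, meta, reads) // prints: [1] [{b 20 25}] 3
	// Pulling metadata alongside the filter column would read 3*4 = 12 values.
}
```

With few matches, the second pass touches a small fraction of the metadata, which is where the savings for "needle in the haystack" queries come from.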

Fixes #2138

For example, for the query { resource.service.name = "foo" }:

Previously, the iterators looked like the following. Span metadata (start time, end time, and ID) was always retrieved, whether or not the span matched.

JoinIterator: 0	
	LeftJoinIterator: 1
		required: 
			ColumnIterator: rs.Resource.ServiceName 
			InstrumentedPredicate{0, StringInPredicate{foo, }}
			LeftJoinIterator: 3
			required: 
				ColumnIterator: rs.ils.Spans.StartUnixNanos 
				InstrumentedPredicate{0, nil}
				ColumnIterator: rs.ils.Spans.EndUnixNanos 
				InstrumentedPredicate{0, nil}
				ColumnIterator: rs.ils.Spans.ID 
				InstrumentedPredicate{0, nil}
			optional: 
		optional: 
	ColumnIterator: TraceID 
		InstrumentedPredicate{0, nil}
	ColumnIterator: StartTimeUnixNano 
		InstrumentedPredicate{0, nil}
	ColumnIterator: DurationNanos 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootSpanName 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootServiceName 
		InstrumentedPredicate{0, nil}
traceCollector{})

Now they look like the following. The first pass pulls only the data needed to evaluate the query; a "filter" callback is then called, and only the spans that survive it pass through the trace and span metadata iterators.

JoinIterator: 0	
	JoinIterator: 3	
		spansToMetaIterator: 
			JoinIterator: 0	
			LeftJoinIterator: 1
				required: 
					ColumnIterator: rs.Resource.ServiceName 
					InstrumentedPredicate{0, StringInPredicate{bar, }}
					LeftJoinIterator: 3
					required: 
						ColumnIterator: rs.ils.Spans.Kind 
						InstrumentedPredicate{0, nil}
					optional: 
					spanCollector(0, [])
				optional: 
				batchCollector{true, 1}
			ColumnIterator: StartTimeUnixNano 
				InstrumentedPredicate{0, IntBetweenPredicate{0,1001000000000}}
			ColumnIterator: EndTimeUnixNano 
				InstrumentedPredicate{0, IntBetweenPredicate{1000000000000,9223372036854775807}}
		traceCollector{})
		ColumnIterator: rs.ils.Spans.StartUnixNanos 
			InstrumentedPredicate{0, nil}
		ColumnIterator: rs.ils.Spans.EndUnixNanos 
			InstrumentedPredicate{0, nil}
		ColumnIterator: rs.ils.Spans.ID 
			InstrumentedPredicate{0, nil}
	spanMetaCollector())
	ColumnIterator: TraceID 
		InstrumentedPredicate{0, nil}
	ColumnIterator: StartTimeUnixNano 
		InstrumentedPredicate{0, nil}
	ColumnIterator: DurationNanos 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootSpanName 
		InstrumentedPredicate{0, nil}
	ColumnIterator: RootServiceName 
		InstrumentedPredicate{0, nil}
traceMetaCollector{})
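The filter-then-fetch flow in the dump above can be sketched as follows. This is an illustration under stated assumptions: `Span`, `Spanset`, `FilterFn`, `metaLookup`, and `spansToMeta` are hypothetical names loosely modeled on the `spansToMetaIterator` in the dump, not Tempo's real types. The filter callback plays the role of the engine, and the metadata lookup runs only for spans it keeps.

```go
package main

import "fmt"

// Span is a hypothetical span record. After the first pass only RowNum and
// the query attributes are known; metadata is filled in by the second pass.
type Span struct {
	RowNum     int
	Attrs      map[string]string
	StartNanos uint64
	ID         string
}

type Spanset struct{ Spans []*Span }

// FilterFn stands in for the engine's "filter" callback: it receives a
// candidate spanset and returns the spansets that satisfy the query.
type FilterFn func(*Spanset) ([]*Spanset, error)

// metaLookup stands in for the span/trace metadata iterators.
type metaLookup func(rowNum int) (start uint64, id string)

// spansToMeta runs the filter and attaches metadata only to surviving spans.
func spansToMeta(candidates []*Spanset, filter FilterFn, lookup metaLookup) ([]*Span, error) {
	var out []*Span
	for _, ss := range candidates {
		kept, err := filter(ss)
		if err != nil {
			return nil, err
		}
		for _, k := range kept {
			for _, s := range k.Spans {
				s.StartNanos, s.ID = lookup(s.RowNum)
				out = append(out, s)
			}
		}
	}
	return out, nil
}

func main() {
	candidates := []*Spanset{{Spans: []*Span{
		{RowNum: 0, Attrs: map[string]string{"service.name": "foo"}},
		{RowNum: 1, Attrs: map[string]string{"service.name": "bar"}},
	}}}
	onlyFoo := func(ss *Spanset) ([]*Spanset, error) {
		res := &Spanset{}
		for _, s := range ss.Spans {
			if s.Attrs["service.name"] == "foo" {
				res.Spans = append(res.Spans, s)
			}
		}
		return []*Spanset{res}, nil
	}
	lookups := 0
	lookup := func(row int) (uint64, string) {
		lookups++
		return uint64(1000 + row), fmt.Sprintf("span-%d", row)
	}
	spans, _ := spansToMeta(candidates, onlyFoo, lookup)
	fmt.Println(len(spans), spans[0].ID, lookups) // prints: 1 span-0 1
}
```

Only one metadata lookup happens even though two candidate spans came out of the first pass.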

Other changes:

  • Added fmt.Stringer to the iterator and predicate interfaces; it is used to generate dumps like the ones above.
  • Changed the search query's time range to filter on trace time ranges rather than span time ranges, reducing the data pulled.
  • Added some memory pooling and other shared memory structures.

Future Work:

  • With this change, the span-level iterator can likely be reworked to be even more efficient, but to keep this PR from growing indefinitely, that is left for later.

Benchmarks:
BenchmarkBackendBlockTraceQL was extended with significantly more test cases. The latest memory improvements slowed down some of the fastest queries slightly, but saved huge amounts of allocation on the worst ones (e.g. mixedNameNoMatch drops from 291MB to 7MB allocated per op). I don't fully trust the CPU time results: this suite showed a lot of variance in CPU time, likely due to thermal throttling on my small laptop. Someone should try to reproduce them.

Also note that these benchmarks do not take full advantage of the change, because they don't run the engine filter.

name                                               old time/op    new time/op    delta
BackendBlockTraceQL/spanAttNameNoMatch-8             2.11ms ± 3%    2.21ms ±12%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8              87.4ms ± 6%     8.0ms ±29%  -90.84%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                 137ms ±32%      68ms ±20%  -50.56%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8        1.97ms ±16%    2.05ms ±13%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8          53.0ms ±41%    67.3ms ±10%     ~     (p=0.151 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8          463µs ±12%     481µs ±10%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8          43.6ms ±12%    15.8ms ± 7%  -63.80%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8            51.5ms ±16%    82.9ms ±30%  +60.92%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8     359µs ±15%     415µs ±38%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8      51.5ms ±23%    62.9ms ±40%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                1.57s ±38%     0.82s ±23%  -47.59%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 1.22s ±42%     0.53s ±11%  -56.34%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8           588µs ±20%     487µs ±16%  -17.16%  (p=0.032 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            1.10s ±22%     0.53s ± 5%  -51.51%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8               361µs ±17%     496µs ±38%  +37.25%  (p=0.032 n=5+5)

name                                               old speed      new speed      delta
BackendBlockTraceQL/spanAttNameNoMatch-8            592MB/s ± 3%   567MB/s ±11%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8             253MB/s ± 5%   384MB/s ±24%  +51.91%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8               155MB/s ±26%    26MB/s ±23%  -83.07%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8       625MB/s ±14%   599MB/s ±14%     ~     (p=0.690 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8         356MB/s ±31%   272MB/s ±10%     ~     (p=0.151 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8       1.10GB/s ±11%  1.06GB/s ± 9%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8         372MB/s ±11%   158MB/s ± 6%  -57.42%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8           321MB/s ±15%   211MB/s ±25%  -34.16%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8   167MB/s ±13%   149MB/s ±30%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8     322MB/s ±25%   279MB/s ±32%     ~     (p=0.421 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8             16.6MB/s ±32%   7.8MB/s ±20%  -52.89%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8              21.1MB/s ±33%  12.0MB/s ±10%  -43.14%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8         871MB/s ±18%  1048MB/s ±15%  +20.29%  (p=0.032 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8         21.2MB/s ±19%  40.4MB/s ± 5%  +90.90%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8             166MB/s ±15%   126MB/s ±33%  -24.16%  (p=0.032 n=5+5)

name                                               old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/spanAttNameNoMatch-8               1.25 ± 0%      1.25 ± 0%     ~     (all equal)
BackendBlockTraceQL/spanAttValNoMatch-8                22.1 ± 0%       3.0 ± 0%  -86.38%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                  20.6 ± 0%       1.7 ± 0%  -91.55%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8          1.22 ± 0%      1.22 ± 0%     ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicMatch-8            18.3 ± 0%      18.3 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttNameNoMatch-8           0.51 ± 0%      0.51 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttValNoMatch-8            16.1 ± 0%       2.5 ± 0%  -84.54%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8              16.4 ± 0%      17.1 ± 0%   +4.02%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8      0.06 ± 0%      0.06 ± 0%     ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8        16.1 ± 0%      16.8 ± 0%   +4.17%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                 24.6 ± 0%       6.4 ± 0%  -74.10%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                  24.6 ± 0%       6.4 ± 0%  -74.10%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8            0.51 ± 0%      0.51 ± 0%     ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr-8             22.9 ± 0%      21.4 ± 0%   -6.21%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8                0.06 ± 0%      0.06 ± 0%     ~     (all equal)

name                                               old alloc/op   new alloc/op   delta
BackendBlockTraceQL/spanAttNameNoMatch-8             67.3kB ± 9%    66.3kB ±14%     ~     (p=0.548 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8              10.1MB ± 1%     2.5MB ± 1%  -74.95%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                11.2MB ± 4%     1.9MB ±10%  -82.69%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8        97.6kB ± 3%   100.1kB ± 3%     ~     (p=0.095 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8          7.18MB ± 2%    7.21MB ± 2%     ~     (p=0.222 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8         42.4kB ± 1%    43.2kB ± 1%   +2.01%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8          9.17MB ± 1%    4.82MB ± 1%  -47.37%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8            17.0MB ± 2%    18.8MB ± 2%  +10.47%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8    36.7kB ± 1%    37.7kB ± 0%   +2.65%  (p=0.016 n=5+4)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8      7.78MB ± 2%    9.55MB ± 2%  +22.87%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedNameNoMatch-8                291MB ± 1%       7MB ±11%  -97.70%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 290MB ± 1%       7MB ±12%  -97.65%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8          43.3kB ± 2%    43.5kB ± 1%     ~     (p=0.310 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            291MB ± 1%      13MB ± 8%  -95.56%  (p=0.016 n=4+5)
BackendBlockTraceQL/mixedValBothMatch-8              37.7kB ± 1%    38.2kB ± 3%     ~     (p=0.095 n=5+5)

name                                               old allocs/op  new allocs/op  delta
BackendBlockTraceQL/spanAttNameNoMatch-8                362 ± 0%       370 ± 0%   +2.21%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValNoMatch-8                344k ± 0%        1k ± 0%  -99.84%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttValMatch-8                  343k ± 0%        1k ± 0%  -99.69%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicNoMatch-8           349 ± 0%       357 ± 0%   +2.29%  (p=0.008 n=5+5)
BackendBlockTraceQL/spanAttIntrinsicMatch-8           92.4k ± 0%     92.4k ± 0%   +0.04%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttNameNoMatch-8            271 ± 0%       286 ± 0%   +5.54%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch-8           82.6k ± 0%      0.5k ± 0%  -99.36%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttValMatch-8              115k ± 0%      115k ± 0%   +0.26%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicNoMatch-8       162 ± 0%       177 ± 0%   +9.26%  (p=0.008 n=5+5)
BackendBlockTraceQL/resourceAttIntrinsicMatch-8        144k ± 0%      145k ± 0%   +0.55%  (p=0.016 n=4+5)
BackendBlockTraceQL/mixedNameNoMatch-8                4.25M ± 0%     0.00M ± 1%  -99.96%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValNoMatch-8                 4.25M ± 0%     0.00M ± 2%  -99.93%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchAnd-8             285 ± 0%       293 ± 0%   +2.81%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValMixedMatchOr-8            4.28M ± 0%     0.09M ± 0%  -97.82%  (p=0.008 n=5+5)
BackendBlockTraceQL/mixedValBothMatch-8                 176 ± 0%       184 ± 0%   +4.55%  (p=0.008 n=5+5)
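To help read the tables, here is a small sketch of the arithmetic behind two of the columns. The helper names are hypothetical. The delta column is the relative change from old to new; the speed column appears consistent with MB_io/op divided by time/op (e.g. 22.1 MB in 87.4ms ≈ 253MB/s). Recomputing from the rounded means printed in the tables gives values close to, but not always identical with, benchstat's output, since benchstat works from the raw samples.

```go
package main

import "fmt"

// pctDelta returns the relative change from oldV to newV in percent,
// the quantity shown in the "delta" column.
func pctDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// throughputMBps converts MB read per op and seconds per op into MB/s.
func throughputMBps(mbPerOp, secPerOp float64) float64 {
	return mbPerOp / secPerOp
}

func main() {
	// spanAttValNoMatch: 87.4ms -> 8.0ms, with 22.1 MB_io/op in the old run.
	fmt.Printf("delta: %.1f%%\n", pctDelta(87.4, 8.0))              // prints: delta: -90.8%
	fmt.Printf("old speed: %.0f MB/s\n", throughputMBps(22.1, 0.0874)) // prints: old speed: 253 MB/s
}
```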

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
pkg/traceql/ast_execute_test.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql.go (resolved)
// flatten every filtered spanset into a single list of spans
for _, ss := range filteredSpansets {
	for _, s := range ss.Spans {
		span := s
		i.currentSpans = append(i.currentSpans, span)
	}
}
Contributor commented:

I don't think I understand why this is necessary. If the engine contains a by(), then i.filter(*spanset) may split the spanset into multiple spansets, and those should be preserved and returned, correct?

@joe-elliott (Member, Author) commented Feb 27, 2023:

Yes, this is due to the way the metadata portion of the iterators works. It's fine as long as there is only one spanset per trace but, as discussed, it will break once we add by().

Should we resolve this now, or solve it when we write by() and coalesce()?

Contributor commented:

Leaning towards solving it when adding by() and coalesce(), so we can benefit from the performance improvements now.
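A small sketch of why the flattening loop discussed in this thread would break with by(). The `Spanset` type with a `Key` field and the `flatten` helper are hypothetical: if a grouping operation returned two spansets for one trace, appending all their spans into a single list discards the group boundaries.

```go
package main

import "fmt"

// Spanset is a hypothetical group of span IDs; a by() grouping could
// produce several of these for a single trace.
type Spanset struct {
	Key   string
	Spans []string
}

// flatten mirrors the loop under review: spans from every filtered
// spanset are appended to one slice, so the grouping is lost.
func flatten(filtered []Spanset) []string {
	var current []string
	for _, ss := range filtered {
		current = append(current, ss.Spans...)
	}
	return current
}

func main() {
	// e.g. a hypothetical by() result grouping one trace's spans by status code
	filtered := []Spanset{
		{Key: "200", Spans: []string{"a", "b"}},
		{Key: "500", Spans: []string{"c"}},
	}
	fmt.Println(len(filtered), "spansets ->", flatten(filtered)) // prints: 2 spansets -> [a b c]
}
```

With only one spanset per trace, as today, flattening loses nothing, which is why deferring the fix is safe for now.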

tempodb/encoding/vparquet/block_traceql.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql_meta.go (outdated, resolved)
@mdisibio mentioned this pull request Mar 3, 2023
@mdisibio (Contributor) left a comment:

Solid work and amazing performance 🚀🚀 99% LGTM, just a few small questions.

pkg/traceql/engine.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql.go (outdated, resolved)
tempodb/encoding/vparquet/block_traceql.go (outdated, resolved)
@joe-elliott merged commit 6d8df84 into grafana:main on Mar 6, 2023
@joe-elliott mentioned this pull request Apr 28, 2023
Linked issue: Improve Parquet Performance by doing two passes over the data
2 participants