TPCH q15 performance regression after introduce `local_delta` in MemoryTracker #4451

windtalker · 2022-03-28T03:00:23Z

Bug Report

1. Minimal reproduce step (Required)

Load TPC-H 100 data, using 1 TiFlash node.
Run TPC-H q15.

2. What did you expect to see? (Required)

The query time should be stable.
The query time should be less than 2 seconds.

3. What did you see instead (Required)

The query time varies randomly from 1.5 seconds to 3.0 seconds.

4. What is your TiFlash version? (Required)

master @ 225dabe

5. Root cause

The root cause is in how TiFlash's ParallelAggregatingBlockInputStream computes the aggregation, which it does in 2 stages:

stage 1: it uses ParallelInputsProcessor to do a partial aggregation for each input pipeline
stage 2: it merges the results of stage 1

Stage 1 always runs on multiple threads. Stage 2 runs on either 1 thread or multiple threads, depending on the size of the result data set: if the total number of aggregation keys exceeds group_by_two_level_threshold, or the total result size exceeds group_by_two_level_threshold_bytes, stage 2 uses multiple threads; otherwise it uses 1 thread. When stage 2 runs on 1 thread, the entire result is put into a single block.

The total result size is estimated using the memory tracker: before the aggregation starts, the overall memory usage is saved as memory_usage_before_aggregation in Aggregator. During stage 1, current_memory_usage - memory_usage_before_aggregation is used to decide whether the aggregated hash table needs to be converted into a two-level hash table. If the hash table is converted, stage 2 uses multiple threads.
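The decision described above can be sketched roughly as follows. This is a minimal illustration and not the actual Aggregator code: the struct TwoLevelParams and the functions worthConvertToTwoLevel and stage2Threads are hypothetical names, and the inclusive comparison (>=) is an assumption. Only the two setting names and memory_usage_before_aggregation come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container for the two settings named in this issue.
struct TwoLevelParams
{
    size_t group_by_two_level_threshold;        // threshold on the number of aggregation keys
    int64_t group_by_two_level_threshold_bytes; // threshold on the estimated result size
};

// Returns true if the hash table should be converted to a two-level hash table.
// The result size is estimated as the growth of the tracked memory usage since
// the aggregation started. The inclusive comparison is an assumption.
bool worthConvertToTwoLevel(
    const TwoLevelParams & params,
    size_t key_count,
    int64_t current_memory_usage,
    int64_t memory_usage_before_aggregation)
{
    const int64_t result_size_bytes = current_memory_usage - memory_usage_before_aggregation;
    return key_count >= params.group_by_two_level_threshold
        || result_size_bytes >= params.group_by_two_level_threshold_bytes;
}

// Stage 2 merges with multiple threads only when the table is two-level.
int stage2Threads(bool two_level, int max_threads)
{
    assert(max_threads >= 1);
    return two_level ? max_threads : 1;
}
```

The point of the sketch is that the byte-based branch is only as accurate as current_memory_usage: if the tracker under-reports, the conversion is skipped and stage 2 falls back to 1 thread.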

The problem is that, after local_delta was introduced, the global memory tracker is not updated while the memory usage tracked in local_delta is less than 8MB. By default, group_by_two_level_threshold_bytes is 100MB. Suppose stage 1 runs on 20 threads and each thread uses 7.9MB: the actual memory usage is ~158MB, but because all of it is tracked only in local_delta, the global memory tracker does not see it. The hash table is therefore not converted to a two-level hash table, and stage 2 runs on 1 thread.
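The arithmetic in this example can be checked with a small single-threaded simulation. It is a sketch under stated assumptions, not TiFlash's MemoryTracker: GlobalTracker, ThreadLocalTracker, and simulateStage1 are hypothetical names, the flush rule (push local_delta to the global counter once it reaches 8MB) is a simplified reading of the behaviour described above, and MB is taken as MiB.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int64_t MiB = 1024 * 1024;
constexpr int64_t kLocalDeltaLimit = 8 * MiB;          // flush threshold from the issue
constexpr int64_t kTwoLevelThresholdBytes = 100 * MiB; // default stated in the issue

// Hypothetical stand-in for the global memory tracker.
struct GlobalTracker
{
    std::atomic<int64_t> amount{0};
};

// Hypothetical per-thread view that batches updates in local_delta.
struct ThreadLocalTracker
{
    GlobalTracker & global;
    int64_t local_delta = 0;

    explicit ThreadLocalTracker(GlobalTracker & g) : global(g) {}

    void alloc(int64_t bytes)
    {
        assert(bytes >= 0);
        local_delta += bytes;
        // Only publish to the global tracker once the local batch is large enough.
        if (local_delta >= kLocalDeltaLimit)
        {
            global.amount.fetch_add(local_delta);
            local_delta = 0;
        }
    }
};

struct Usage
{
    int64_t actual;  // bytes really allocated by all threads
    int64_t visible; // bytes the global tracker knows about
};

// Models stage 1: each of `threads` workers allocates `bytes_per_thread` once.
Usage simulateStage1(int threads, int64_t bytes_per_thread)
{
    GlobalTracker global;
    int64_t actual = 0;
    for (int i = 0; i < threads; ++i)
    {
        ThreadLocalTracker tracker(global);
        tracker.alloc(bytes_per_thread);
        actual += bytes_per_thread;
    }
    return {actual, global.amount.load()};
}
```

Calling simulateStage1(20, 79 * MiB / 10) models the 20 threads at 7.9MB each: the actual usage is above kTwoLevelThresholdBytes while the visible usage stays at 0, so a check based on the global tracker would not trigger the two-level conversion.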


windtalker added the type/bug The issue is confirmed as a bug. label Mar 28, 2022

lilinghai added the severity/major label Mar 28, 2022

ti-chi-bot added may-affects-4.0 may-affects-5.0 may-affects-5.1 may-affects-5.2 may-affects-5.3 may-affects-5.4 may-affects-6.0 labels Mar 28, 2022

windtalker mentioned this issue Apr 2, 2022

Make performance of TPCH q15 stable #4570

Merged

12 tasks

windtalker added affects-5.3 affects-5.4 affects-6.0 and removed may-affects-4.0 may-affects-5.0 may-affects-5.1 may-affects-5.2 may-affects-5.3 may-affects-5.4 may-affects-6.0 labels Apr 2, 2022

jebter added the component/compute label Apr 14, 2022

ti-chi-bot closed this as completed in #4570 Apr 19, 2022

ti-chi-bot pushed a commit that referenced this issue Apr 19, 2022

Make performance of TPCH q15 stable (#4570)

feee96a

close #4451

This was referenced Apr 19, 2022

Make performance of TPCH q15 stable (#4570) #4707

Merged

Make performance of TPCH q15 stable (#4570) #4709

Closed

Make performance of TPCH q15 stable (#4570) #4710

Merged

ti-chi-bot added a commit that referenced this issue Apr 22, 2022

Make performance of TPCH q15 stable (#4570) (#4707)

0053753

close #4451

ti-chi-bot added a commit that referenced this issue Jun 15, 2022

Make performance of TPCH q15 stable (#4570) (#4710)

f090503

close #4451
