Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The execution time in pipeline mode is distortion sometimes #9489

Closed
Tracked by #56232
windtalker opened this issue Sep 29, 2024 · 2 comments · Fixed by #9566
Closed
Tracked by #56232

The execution time in pipeline mode is distortion sometimes #9489

windtalker opened this issue Sep 29, 2024 · 2 comments · Fixed by #9566

Comments

@windtalker
Copy link
Contributor

windtalker commented Sep 29, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!
Running the same query in pipeline mode and pull mode, the overall query peformance is almost the same, but the execution time for each executor differs a lot between pipeline mode and pull mode

In pipeline mode
Screenshot 2024-09-30 at 13 07 44

HashAgg: 2.23s
Projection: 2.71s
Selection: 1.75s

Cpu flame graph
Screenshot 2024-09-29 at 13 46 26

And in pull mode
Screenshot 2024-09-29 at 13 51 35
HashAgg: 4.4s
Projection: 2.8s
Selection: 0.26s

Cpu flame graph
Screenshot 2024-09-29 at 13 48 31

And compared with cpu flame graph, the executor time in pull mode is much more reasonable.

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

@windtalker windtalker added the type/bug The issue is confirmed as a bug. label Sep 29, 2024
@windtalker windtalker changed the title The execution time in pipeline mode is distortion sometimes The execution time in pipeline mode is distortion sometimes Sep 29, 2024
@windtalker
Copy link
Contributor Author

windtalker commented Sep 30, 2024

After some investigations, I don't find obviously bugs that is releated to the executor's execution time. One possible cause of this distortion seems to be in PipelineExec::finalizeProfileInfo

void PipelineExec::finalizeProfileInfo(UInt64 queuing_time, UInt64 pipeline_breaker_wait_time)
{
// For the pipeline_breaker_wait_time, it should be added to the pipeline breaker operator(AggConvergent and JoinProbe),
// However, if there are multiple pipeline breaker operators within a single pipeline, it can become very complex.
// Therefore, to simplify matters, we will include the pipeline schedule duration in the execution time of the source operator.
//
// For the queuing_time, it should be evenly distributed across all operators.
//
// TODO Refining execution summary, excluding extra time from execution time.
// For example: [total_time:6s, execution_time:1s, queuing_time:2s, pipeline_breaker_wait_time:3s]
// The execution time of operator[i] = self_time_from_profile_info + sum(self_time_from_profile_info[i-1, .., 0]) + (i + 1) * extra_time / operator_num.
source_op->getProfileInfo()->execution_time += pipeline_breaker_wait_time;

It evenly distributes the task queue time to all the operators.

In this specific case, the TiDB's executor tree is
TableScan => Selection => Projection => Aggregation
The task queue time is about 28.32 seconds.

In pipemode, the executor tree is converted to a pipeline task that looks like

SourceOp(TableScan)=>FilterTransformOp=>ExpressionTransformOp(Selection)=>ExpressionTransformOp=>ExpressionTransfromOp(Projection)=>SinkOp(Aggregation)

The concurrency is 8.
Now assuming all the task has the same queue time, then each task the queue time is 28.32/8 = 3.54s, for each op, the queue time is 3.54/6 = 0.59s
If we remove the task queue time, then each executor's execution time will be
HashAgg: 1.64s
Projection: 1.53s
Selection: 0.57s
TableScan: 0.42s

Although still not the same as pull mode, the results is much better than the original one.

Actually, the output of Explain analyze is designed with the assumption that the execution model is similar to the volcano execution model, while pipeline execution model is completely different from the volcano execution model, it is kind of expected that the execution time displayed in Explain analyze maybe distortion. I think we need a comprehensive design of how to adapt the output of explain analyze in the pipeline model. cc @yibin87

@windtalker
Copy link
Contributor Author

Change this to an enhancement

@windtalker windtalker added type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Oct 15, 2024
ti-chi-bot bot added a commit that referenced this issue Nov 13, 2024
…mmary (#9566)

close #9489

Signed-off-by: yibin <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Nov 13, 2024
ti-chi-bot bot pushed a commit that referenced this issue Nov 14, 2024
…mmary (#9566) (#9612)

close #9489

Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>

Co-authored-by: yibin <[email protected]>
Co-authored-by: yibin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment