Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace PODArray with PODArrayWithStackMemory in AggregateFunctionWindowFunnelData #18817

Merged
merged 3 commits into from
Jan 9, 2021

Conversation

ucasfl
Copy link
Collaborator

@ucasfl ucasfl commented Jan 7, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Replace PODArray with PODArrayWithStackMemory in AggregateFunctionWindowFunnelData to improvement windowFunnel function performance.

Detailed description / Documentation draft:

Funnel analysis is very useful in many scenarios.

In Tencent, we use windowFunnel aggregate function to analyze user events, and it's slow. we profile with perf and found that lots of time is used in memory allocation, CPU utilization is not very high.

We consider the implementation of this function, and analyze it with specific sql, then we found that usually, we always caculate windowFunnel with group by userid and lots of fileter condition, for example:

SELECT
    level,
    count()
FROM
(
    SELECT
        uid,
        windowFunnel(86400)(time, event = '浏览', event = '点击', event = '下单', event = '支付') AS level
    FROM action where ...// lots of filter condition
    GROUP BY uid
)
GROUP BY level

After filter, each user's event list is not very big, so we replace PODArray with PODArrayWithStackMemory in AggregateFunctionWindowFunnelData and gain a lot of performance improvement, maybe we should consider about the implementation here?

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Jan 7, 2021
@alexey-milovidov alexey-milovidov self-assigned this Jan 8, 2021
@alexey-milovidov
Copy link
Member

Yes, this is very typical optimization for aggregate functions!
I did not dig into windowFunnel implementation in more detail - maybe there are more possible optimizations.

This PR is Ok to merge, but let's add performance test to prove and keep the improvement, see tests/performance.

@ucasfl
Copy link
Collaborator Author

ucasfl commented Jan 8, 2021

Yes, this is very typical optimization for aggregate functions!
I did not dig into windowFunnel implementation in more detail - maybe there are more possible optimizations.

This PR is Ok to merge, but let's add performance test to prove and keep the improvement, see tests/performance.

Ok, currently, we still working to find more possible optimization.

@alexey-milovidov
Copy link
Member

@alexey-milovidov alexey-milovidov merged commit fe1c153 into ClickHouse:master Jan 9, 2021
@ucasfl ucasfl deleted the wf branch January 23, 2021 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants