Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the precise definition of WindowFunnel's “strict” MODE? #27469

Closed
BiteTheDDDDt opened this issue Aug 9, 2021 · 3 comments · Fixed by #27563
Closed

What is the precise definition of WindowFunnel's “strict” MODE? #27469

BiteTheDDDDt opened this issue Aug 9, 2021 · 3 comments · Fixed by #27563
Labels
bug Confirmed user-visible misbehaviour in official release question Question?

Comments

@BiteTheDDDDt
Copy link
Contributor

BiteTheDDDDt commented Aug 9, 2021

According to my reading of the source code, it seems that strict = true, will return Event INDEX directly.
such as the event sequence is A->B->C->B->D and conditional sequence isA->B->C->D, returns 2 (not 3 or 4) when searching the second B.
But if the conditional sequence isA->B->C, it will return directly to 3 when searching C.
This makes me feel confused to strict.

        for (const auto & pair : data.events_list)
        {
            const T & timestamp = pair.first;
            const auto & event_idx = pair.second - 1;
            if (strict_order && event_idx == -1)
            {
                if (first_event)
                    break;
                else
                    continue;
            }
            else if (event_idx == 0)
            {
                events_timestamp[0] = std::make_pair(timestamp, timestamp);
                first_event = true;
            }
            else if (strict && events_timestamp[event_idx].has_value())
            {
                return event_idx + 1;//example1 return here
            }
            else if (strict_order && first_event && !events_timestamp[event_idx - 1].has_value())
            {
                for (size_t event = 0; event < events_timestamp.size(); ++event)
                {
                    if (!events_timestamp[event].has_value())
                        return event;
                }
            }
            else if (events_timestamp[event_idx - 1].has_value())
            {
                auto first_timestamp = events_timestamp[event_idx - 1]->first;
                bool time_matched = timestamp <= first_timestamp + window;
                if (strict_increase)
                    time_matched = time_matched && events_timestamp[event_idx - 1]->second < timestamp;
                if (time_matched)
                {
                    events_timestamp[event_idx] = std::make_pair(first_timestamp, timestamp);
                    if (event_idx + 1 == events_size)
                        return events_size;//example2 return here
                }
            }
        }

I see a strict design purpose in #6548.
I think the answer in the first example should be 3 instead of 2. Is this a bug?

@BiteTheDDDDt BiteTheDDDDt added the question Question? label Aug 9, 2021
@tangxueming
Copy link

I confused about mode too.

// table schema
CREATE TABLE funnel_test
(
    `code` UInt32 ,
    `user_id` String ,
    `timestamp` UInt64
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp)
SETTINGS index_granularity = 8192;
// insert data
insert into funnel_test values (1,'id',1),(2,'id',3),(4,'id',4),(1,'id',11),(2,'id',13),(3,'id',15)
// query
// strict_increase mode return 3 rows
select user_id, windowFunnel(20,'strict_increase')(toDateTime(timestamp), code = 1, code = 2, code=3) as level from funnel_test group by user_id 
// strict_order mode return 2 rows
select user_id, windowFunnel(20,'strict_order')(toDateTime(timestamp), code = 1, code = 2, code=3) as level from funnel_test group by user_id
// strict mode return 2 rows
select user_id, windowFunnel(20,'strict')(toDateTime(timestamp), code = 1, code = 2, code=3) as level from funnel_test group by user_id
// without mode return 3 rows
select user_id, windowFunnel(20)(toDateTime(timestamp), code = 1, code = 2, code=3) as level from funnel_test group by user_id

modify first data code from 1 to 6 ,all return 3 rows

truncate table funnel_test
insert into funnel_test values (6,'id',1),(2,'id',3),(4,'id',4),(1,'id',11),(2,'id',13),(3,'id',15)

@alexey-milovidov
Copy link
Member

CC @achimbab

@achimbab
Copy link
Contributor

@BiteTheDDDDt
It seems a bug, I will fix it ASAP.
Thank you.

@alexey-milovidov alexey-milovidov added the bug Confirmed user-visible misbehaviour in official release label Aug 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release question Question?
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants