Allow querying "Users who have done $foo in $timerange" #2594
Comments
I like it. Do you have any idea how Heap managed to make this performant? I think this should perform well on ClickHouse (we compute cohorts on the fly for queries anyway), but Postgres might struggle. |
Performance-wise this is indeed quite a heavy query. Let's decompose this query. This would likely work something like
2 is pretty easy and we do it elsewhere. How might you do 1 in a performant way?
In Postgres, for actions we could automatically create a partial index. Then, if we write the query correctly, Postgres should combine the partial index with the time range filter and get the list of distinct_ids reasonably fast, as it's only looking through a small subset of the data. As elsewhere, though, the distinct_id -> person_id join can become a bit sketchy at scale. To scale this out, Heap used to shard their dataset by person_id (contrast that with our ClickHouse setup).
ClickHouse should work similarly, though queries without time bounds are expensive because of the way we currently set up partitioning: they would need to query all parts. Note that this also affects the performance of cohort queries. |
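The partial-index idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's `sqlite3`) stands in for Postgres because it also supports partial indexes, and the table names, column names, and the `signup` event are hypothetical rather than PostHog's actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        distinct_id TEXT NOT NULL,
        event       TEXT NOT NULL,
        timestamp   TEXT NOT NULL  -- ISO 8601 strings sort chronologically
    );
    CREATE TABLE person_distinct_id (
        distinct_id TEXT PRIMARY KEY,
        person_id   INTEGER NOT NULL
    );
    -- Partial index: only rows for this one event are indexed, so a
    -- time-range scan touches a small subset of the events table.
    CREATE INDEX events_signup_ts ON events (timestamp, distinct_id)
        WHERE event = 'signup';
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a1", "signup", "2021-01-03T10:00:00"),
        ("a2", "pageview", "2021-01-04T09:00:00"),
        ("b1", "signup", "2020-12-20T12:00:00"),
        ("c1", "pageview", "2021-01-05T08:00:00"),
        ("c1", "signup", "2021-01-06T08:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO person_distinct_id VALUES (?, ?)",
    [("a1", 1), ("a2", 1), ("b1", 2), ("c1", 3)],
)


def persons_who_signed_up(conn, start, end):
    """Return person_ids with a 'signup' event in [start, end)."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.person_id
        FROM events e
        JOIN person_distinct_id p ON p.distinct_id = e.distinct_id
        WHERE e.event = 'signup'          -- matches the partial index predicate
          AND e.timestamp >= ? AND e.timestamp < ?
        ORDER BY p.person_id
        """,
        (start, end),
    ).fetchall()
    return [row[0] for row in rows]


print(persons_who_signed_up(conn, "2021-01-01", "2021-01-08"))
```

The event filter is written as a literal so that it matches the index predicate exactly; whether the planner actually picks the partial index depends on the database and the data.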
A user was requesting this as well.
Since this is a bit buried, cc @EDsCODE and @paolodamico |
I believe cohorts will now support this once #4574 is merged, but still useful to consider for direct filtering. |
Hey @paolodamico, I'm trying to find a way to support the second half of this issue with the new filters ui.
I attempted this by creating a dynamic cohort with users that have performed X event exactly 0 times in the last week. This did not work as expected for me and still returns users who performed X event. Is this the expected behavior? Am I thinking about this right? |
Apologies for the delay here. Tagging @EDsCODE for context. I think if you're matching exactly 0 times we would expect to see users who didn't perform the event at all in the timeframe. |
@EDsCODE nudging on this one, as I've been helping someone try to do something similar, e.g. https://app.posthog.com/cohorts/2741. It definitely doesn't return a list of people who have not had pageviews in the last week, and it's not quite clear what it is returning. |
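For reference, the expected "performed X exactly 0 times" semantics can be sketched as a set difference. The thread does not say how the cohort is actually computed, so this is not a diagnosis of the behaviour above; the function names and data shapes are hypothetical. The second function shows one general pitfall: counting only over event rows can never yield a zero count, because people with no matching events never appear in the grouping.

```python
from collections import Counter
from datetime import datetime


def did_not_perform(all_person_ids, events, event_name, start, end):
    """People with zero `event_name` events in [start, end).

    `events` is an iterable of (person_id, event_name, timestamp) tuples.
    """
    performed = {
        pid for pid, name, ts in events
        if name == event_name and start <= ts < end
    }
    # Start from everyone, then subtract those who did the event.
    return set(all_person_ids) - performed


def naive_zero_count(events, event_name, start, end):
    """Pitfall: group matching events by person and keep count == 0.

    Always returns an empty set, since a person only shows up in the
    counter after at least one matching event.
    """
    counts = Counter(
        pid for pid, name, ts in events
        if name == event_name and start <= ts < end
    )
    return {pid for pid, n in counts.items() if n == 0}


all_person_ids = [1, 2, 3]
events = [
    (1, "$pageview", datetime(2021, 6, 10)),
    (2, "$pageview", datetime(2021, 5, 1)),  # outside the window
    (1, "signup", datetime(2021, 6, 11)),
]
week_start, week_end = datetime(2021, 6, 7), datetime(2021, 6, 14)

print(did_not_perform(all_person_ids, events, "$pageview", week_start, week_end))
print(naive_zero_count(events, "$pageview", week_start, week_end))
```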
cc @clarkus I'd love to push this into development in the coming months but this "breaks" our current property filtering in a few ways. |
Updated filter concepts that allow for specifying done / not done within a given time range. Note this also encompasses work from #2273.
|
I think it would be super helpful to see this in the context of insights (and wherever else we would use this). Here it looks pretty cool, but I'm not sure it'll fit properly with our current insight layout (unless you're considering a change there too?) |
Yup, I am working from the inside to the outside on this one. Most of the complexity is in these repeating filter rules, which have a different composition depending on how you're filtering. The same pattern is going to show up in breakdowns / group bys. I am going to work the whole set back into a query builder update that addresses this plus some other open issues. |
This came up in a call with $LargePotentialClient - they do a lot of their analysis based on engagement. cc @paolodamico who can provide more usecases. I'm planning on adding this capability to the backend as part of solving #5854 soon! |
The one thing missing from this ticket is the trailing 7 day average. You don't want to say "user has done X in the last 7 days" but "user has done X in the 7 days before this date point within the chart" |
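To make the distinction concrete, here is a minimal sketch of the per-data-point version. It assumes the window for a chart date `d` is the half-open range from `d - 7 days` up to, but not including, `d`; the function name and the `(user_id, day)` event shape are hypothetical.

```python
from datetime import date, timedelta


def users_in_trailing_window(events, chart_dates, window_days=7):
    """For each chart date, the users who did X in the preceding window.

    `events` is an iterable of (user_id, day) pairs for event X.
    The window for chart date d is [d - window_days, d) (assumption).
    """
    result = {}
    for d in chart_dates:
        window_start = d - timedelta(days=window_days)
        result[d] = {
            user for user, day in events
            if window_start <= day < d
        }
    return result


events = [
    ("u1", date(2021, 1, 1)),
    ("u2", date(2021, 1, 5)),
    ("u1", date(2021, 1, 9)),
]
chart_dates = [date(2021, 1, 8), date(2021, 1, 15)]

by_date = users_in_trailing_window(events, chart_dates)
for d in chart_dates:
    print(d, sorted(by_date[d]))
```

A "last 7 days" filter evaluates one window relative to today and applies the same set of users to every point, whereas here the set is recomputed for each date on the chart.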
Another variation on filtering within insights that's been requested (internal)
|
@posthog-bot summarize |
The conversation covers various topics related to query performance and filter functionality in PostHog. There is a discussion on how to make a heavy query more performant through partial indexing and sharding. There is also talk about adding filtering options to view users who have not performed an event in a certain time frame. The conversation includes updates on work being done to improve the filtering UI, with suggestions for including a trailing 7-day average and addressing issues with filter composition. Finally, there is a request for filtering functionality to calculate the average number of days a user logs in per week. |
Update: We now have "user has done X in the last 7 days" via cohorts, although we don't have "user has done X in the 7 days before this date point within the chart". |
Is your feature request related to a problem?
Currently there's no quick way to e.g. count pageviews from users who signed up in the last 7 days, or count signups from users who signed up between X and Y.
Describe the solution you'd like
Make these queries possible
Describe alternatives you've considered
Creating cohorts and using them for these queries. This is slow for ad-hoc queries.
Additional context
Here's how this query gets built in Heap.
I think doing this is also valuable as it would: