Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow querying "Users who have done $foo in $timerange" #2594

Open
macobo opened this issue Dec 1, 2020 · 17 comments
Open

Allow querying "Users who have done $foo in $timerange" #2594

macobo opened this issue Dec 1, 2020 · 17 comments
Labels
enhancement New feature or request feature/trends Feature Tag: Trends team/product-analytics

Comments

@macobo
Copy link
Contributor

macobo commented Dec 1, 2020

Is your feature request related to a problem?

Currently there's no quick way to e.g. count pageviews from users who signed up in the last 7 days. Or count signups from users who did the sign up between X and Y.

Describe the solution you'd like

Make these queries possible

Describe alternatives you've considered

Creating cohorts and them for these queries. This is slow for ad-hoc queries.

Additional context

Here's how this query gets built in Heap.

Screenshot from 2020-12-01 12-55-15

I think doing this is also valuable as it would:

  1. Allow us to use same filtering system in cohorts as in insights (standardizing our query building + UI)
  2. Allow us to use a similar system within sessions (e.g. show sessions where user has done Y or Z)
  3. Allow jumping from e.g. insights graph to sessions in a natural way.
@macobo macobo added the enhancement New feature or request label Dec 1, 2020
@macobo macobo changed the title Allow querying "Users who have done $foo" Allow querying "Users who have done $foo in $timerange" Dec 1, 2020
@timgl
Copy link
Collaborator

timgl commented Dec 3, 2020

I like it. do you have any idea on how heap managed to make this performant? I think this should perform well on Clickhouse (we compute cohorts on the fly for queries anyway) but psql might struggle.

@macobo
Copy link
Contributor Author

macobo commented Dec 3, 2020

Performance-wise this is indeed quite a heavy query.

Let's decompose this query. This would likely work something like

  1. Get the list of users who have done $foo in $timerange
  2. Join that list against main event (e.g. count of $pageview)

2 is pretty easy and we do it elsewhere. How might you do 1 in a performant way?

In postgres for actions we could automatically create a partial index.

Then if we write the query correct, postgres should combine the partial index with the time range filter and get the list of distinct_ids reasonably fast as it's only looking through a small subset of data. As elsewhere the distinct_id -> person_id join can become a bit sketchy at scale though.

To scale this out, heap used to shard their dataset by person_id (contrast that with our ch setup)

Clickhouse should work similarly, though untimebounded queries are expensive due to the way we currently set up partitioning. - it would need to query all parts. Note this also affects cohorts queries performance.

@macobo
Copy link
Contributor Author

macobo commented Jul 1, 2021

User was requesting it as well.

I am looking for a way to view all users (by workspace) to see who has NOT logged in in the last week

Since this is a bit buried, cc @EDsCODE and @paolodamico

@paolodamico
Copy link
Contributor

I believe cohorts will now support this once #4574 is merged, but still useful to consider for direct filtering.

@chrishoyle
Copy link

Hey @paolodamico, I'm trying to find a way to support the second half of this issue with the new filters ui.

I am looking for a way to view all users (by workspace) to see who has NOT logged in in the last week

I attempted this by creating a dynamic cohort with users that have performed X event exactly 0 times in the last week. This did not work as expected for me and still returns users that match that performed X event. Is this the expected behavior? Am I thinking about this right?

Also related #4955 and #3032

image

@paolodamico
Copy link
Contributor

Apologies for the delay here. Tagging @EDsCODE for context. I think if you're matching exactly 0 times we would expect to see users who didn't perform the event at all in the timeframe.

@marcushyett-ph
Copy link
Contributor

@EDsCODE nudging on this one as I've been helping someone to try and do something similar e.g. https://app.posthog.com/cohorts/2741

And it definitely doesn't return a list of people who have not had pageviews in the last week - it's not quite clear what it is returning.

@macobo
Copy link
Contributor Author

macobo commented Nov 15, 2021

cc @clarkus I'd love to push this into development in the coming months but this "breaks" our current property filtering in a few ways.

@clarkus
Copy link
Contributor

clarkus commented Dec 1, 2021

Updated filter concepts that allow for specifying done / not done within a given time range. Note this also encompasses work from #2273.

Filters
https://www.figma.com/file/gQBj9YnNgD8YW4nBwCVLZf/PostHog-App?node-id=5589%3A33881

@paolodamico
Copy link
Contributor

I think it would be super helpful to see this in the context of insights (and wherever else we would use this). Here it looks pretty cool, but I'm not sure it'll fit properly with our current insight layout (unless you're considering a change there too?)

@clarkus
Copy link
Contributor

clarkus commented Dec 3, 2021

I think it would be super helpful to see this in the context of insights (and wherever else we would use this). Here it looks pretty cool, but I'm not sure it'll fit properly with our current insight layout (unless you're considering a change there too?)

Yup I am working from the inside to the outside on this one - most of the complexity is in these repeating filter rules - they have different composition depending on how you're filtering. The same pattern is going to show up in breakdowns / group bys. I am going to work the whole set back into a query builder update that addresses this plus some other open issues.

@macobo
Copy link
Contributor Author

macobo commented Jan 11, 2022

This came up in a call with $LargePotentialClient - they do a lot of their analysis based on engagement. cc @paolodamico who can provide more usecases.

I'm planning on adding this capability to the backend as part of solving #5854 soon!

@timgl
Copy link
Collaborator

timgl commented Aug 17, 2022

The one thing missing from this ticket is the trailing 7 day average. You don't want to say "user has done X in the last 7 days" but "user has done X in the 7 days before this date point within the chart"

@camerondeleone
Copy link
Contributor

Another variation on filtering within insights that's been requested (internal)

we capture a user-login event for every user that logs in. We want to find out “what’s the average # of days a user logs in per week”

@PostHog PostHog deleted a comment from posthog-contributions-bot bot Mar 30, 2023
@fuziontech
Copy link
Member

@posthog-bot summarize

@posthog-contributions-bot
Copy link
Contributor

@fuziontech

The conversation covers various topics related to query performance and filter functionality in PostHog. There is a discussion on how to make a heavy query more performant through partial indexing and sharding. There is also talk about adding filtering options to view users who have not performed an event in a certain time frame. The conversation includes updates on work being done to improve the filtering UI, with suggestions for including a trailing 7-day average and addressing issues with filter composition. Finally, there is a request for filtering functionality to calculate the average number of days a user logs in per week.

@Twixes
Copy link
Collaborator

Twixes commented Jun 19, 2024

Update: We now have "user has done X in the last 7 days" via cohorts, although we don't have "user has done X in the 7 days before this date point within the chart".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature/trends Feature Tag: Trends team/product-analytics
Projects
None yet
Development

No branches or pull requests

9 participants