-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve instance status report #2271
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posthog/tasks/status_report.py
Outdated
def fetch_events_count_by_name(params: Tuple[Any, ...]) -> dict: | ||
results = fetch_sql( | ||
""" | ||
SELECT event as name, COUNT(*) as count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's related to this PR, but maybe we should obfuscate the name if the user has selected to anonymize their data?
posthog/tasks/status_report.py
Outdated
def fetch_events_count_by_name(params: Tuple[Any, ...]) -> dict: | ||
results = fetch_sql( | ||
""" | ||
SELECT event as name, COUNT(*) as count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure if it's related to this PR but wondering if we should obfuscate the event names if the users chose to anonymize their data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file an issue - there was no semantic change to the previous behavior here, I'd like to avoid ballooning this into a bigger deal.
"period": {"start_inclusive": period_start.isoformat(), "end_inclusive": period_end.isoformat()}, | ||
} | ||
|
||
report["users_who_logged_in"] = [ | ||
{"id": user.id, "distinct_id": user.distinct_id} | ||
if user.anonymize_data | ||
else {"id": user.id, "distinct_id": user.distinct_id, "first_name": user.first_name, "email": user.email} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is unrelated, but adding it here as this PR is an improvement on this code. I don't think it ever makes sense to send the user's first name and email, we're just adding PII here that it'll be very hard to keep track of. With the distinct ID we should be able to get their name if they haven't anonymized their data.
Also maybe send the total number of registered users too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file a separate issue with a suggestion how to anonymize. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll make a PR for this after this one is merged
Maybe, but I'd like to see the problem before 'fixing' it. I'll update the task to handle failures in a way that reaches us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! think the only last point that might bring up is if it's worth changing COUNT(*)
to COUNT(1)
(timgl suggestion)
4ec9543
to
331ae9b
Compare
Also discovered issue Instance status report task reports wrong numbers #2255
Related issue User property for self-hosted deployment host #2138
Fixes Instance status report task reports wrong numbers #2255
Checklist