
Release 1.22.0 bis – 1 March 2021 #3287

Closed
paolodamico opened this issue Feb 10, 2021 · 9 comments
Labels
sprint Sprint planning


@paolodamico
Contributor

paolodamico commented Feb 10, 2021

Context:

  • A couple of deals for VPC are very close
  • A very active user has brought up parity with Mixpanel
  • Now that plugins are live for everyone, can we increase usage?

Tim

  • Support Hero
  • Toolbar?
  • Finish up formula
  • Finish up funnel trends

Paolo

Michael/Marius

Karl/Eric

James G

  • Wrap up the TF script for AWS (P0) GCP (P2)
  • Template the TF scripts so that we can have one per install
  • Template the Helm chart (values.yaml) so that we have one values template for everyone (we have already hit issues with inconsistencies here)
  • Finish the move to AWS Kafka (seemingly a blocker for plugins due to the partition limit)
  • For that, we need to update the Kafka client to use SCRAM SASL auth (diff almost done)
  • Cut over and monitor
  • ClickHouse schema migration/management plan. It's a mess right now
  • ClickHouse: move replacing tables to collapsing merge trees
  • ClickHouse SSL/Sentry cleanup
  • ClickHouse backups

Yakko

@paolodamico paolodamico added the sprint Sprint planning label Feb 10, 2021
@paolodamico
Contributor Author

paolodamico commented Feb 11, 2021

Myself

Here's what I'm thinking for the next release: I want to explore and/or spec out all the features we have mentioned or discussed at some point. From there, I'll determine which one would have the highest impact (both as a feature and for our own dogfooding) and spec it out in detail:

I'm on the fence about whether we should explore right now, as this sounds like it might require a different, more comprehensive approach, but I'm open to suggestions.

In addition to the above:

  • Carryover from the past release: check persons-2353 & 1694-dashboards, and release or kill them.
  • PostHog/retention#6 with @EDsCODE
  • Build dashboards, actions, ... to instrument missing pieces of the app flow.
  • From https://github.com/PostHog/retention/issues/11, consider spinning out activation from core retention, and explore if it makes sense to do it as a feature.
  • As a stretch, I would start implementing the MVP from whatever comes out of the research above.
  • As a stretch, I want to see if I can implement a beta version of running PostHog Cloud with a custom domain to overcome third-party analytics pains.

Team

@Twixes
Collaborator

Twixes commented Feb 11, 2021

My points:

  • Plugin server ingestion on all deployments – Postgres and ClickHouse. This will allow us to decrease the overall complexity of ingestion, and it's the natural route since we want plugins to be core to PostHog. We could even do this now… but there's no immediate benefit to users, so we can just enable it right after the 1.22 freeze and run with it until 1.23 – in line with the idea of more stable releases.
  • Webhooks rewritten for plugin server ingestion (Outgoing webhooks plugin-server#145) – this has been wonky for a while, as the complexity of the capture endpoint has risen, but we especially need it now so that webhooks are fired with the final event, as processed by plugins.
    This would be super cool done as a plugin, but two considerations complicate it if we want no regressions: the plugin would have to allow firing the webhook only for specific actions, and the format would have to be customisable per action.
  • Reworked GeoIP in plugin server (Automatic GeoIP Plugin plugin-server#115) – now that we have a MaxMind GeoIP DB license, we can rework the GeoIP plugin to be actually useful to everyone, plus offer GeoIP capabilities to any plugin. This will be great for making plugins a more powerful feature.

@mariusandra
Collaborator

Plugins on cloud:

  • Post-Mortem for the outage
  • Get rid of the ALTER queries somehow. (CH experts needed!)
  • Gradually re-release plugin server ingestion on cloud
  • Why does 1 server (8 cores) ingest 500 events/sec locally, while we need 32 tasks in ECS for the same load?
  • Access control for organisations for plugins on cloud instead of ENV variables as described here Roll out plugins fully #3291 (comment)

Other important tasks:

Stretch goal:

  • Put feature flag toggles on the toolbar

@macobo
Contributor

macobo commented Feb 11, 2021

Myself

The previous release was spent working on stability improvements. We've done well on that front: our Sentry, at least, is a lot less noisy than before.

Further stability improvements

However, there are still quite a few top-level issues which need looking into:

  1. We have a ton of unresolved "clickhouse" issues, 95% of which are connection issues dating from when we moved clusters or related to SSL/ELB. These are likely causing our users a lot of unneeded pain. It's an infra issue, but I can take time to dig into it if James does not have the focus: https://sentry.io/organizations/posthog/issues/?project=1899813&query=is%3Aunresolved++assigned%3A%23clickhouse&statsPeriod=14d
  2. Retention queries are broken: Retention table panels load wrong data #3018, Bug with CH persons double click on retention table #2794, Hourly retention is broken #3293, plus unresolved Sentry issues.
  3. Webhooks - we've gotten quite a few reports from users saying the webhooks feature is broken. @Twixes made some improvements last sprint, but this inadvertently caused
  4. Session recording related errors

New features

Rather than going into full maintenance mode, let's also ship new things/value:

  1. Ship emails
  2. Date filter on dashboards
  3. Toolbar improvements
  4. Improve demo data

Or anything else with sessions/core analytics? We should build competence there.

Misc

  1. Investigate and fix CSP header issues for posthog-js Document CSP headers requirements posthog.com#672
  2. Analyze and interview people about session recording

TBD, did not manage to fill this out.

@EDsCODE
Member

EDsCODE commented Feb 11, 2021

Myself

  • Will be picking up the retention cleanup. I had a PR hanging for a while until earlier this week, and it revealed some other issues with the person query: Bug with CH persons double click on retention table #2794
  • Trying to figure out a backend test format that is extensible and gives higher coverage, possibly expanding to unit tests
  • Finally doing the analysis with Paolo this week
  • Debug/activate remove-shownas and push the full changes when necessary

Stretch:

  • Make our release QA more disciplined: create another entire instance or use the k8s solution that was mentioned

@fuziontech
Member

Quick update for myself:

  • Wrap up the TF script for AWS (P0) GCP (P2)
  • Template the TF scripts so that we can have one per install
  • Template the Helm chart (values.yaml) so that we have one values template for everyone (we have already hit issues with inconsistencies here)
  • Finish the move to AWS Kafka (seemingly a blocker for plugins due to the partition limit)
  • For that, we need to update the Kafka client to use SCRAM SASL auth (diff almost done)
  • Cut over and monitor
  • ClickHouse schema migration/management plan. It's a mess right now
  • ClickHouse: move replacing tables to collapsing merge trees

@Twixes
Collaborator

Twixes commented Feb 11, 2021

Also one thing I'd like to do is work on an engineering blog post touching on plugins, but that'll require some more thought (a short series would do well for instance) and is not a significant priority.

@paolodamico
Contributor Author

One thing that came up for me that wasn't planned is https://github.com/PostHog/ops/issues/178 (@jamesefhawkins, maybe you want to provide some additional context on its priority). I think it's worth using this as an example of what we want to do when we need to re-prioritize work mid-release. Currently we just take on whatever seems most urgent/important at the time, which is fine, but it doesn't always lead to the best outcome, because the most immediate thing always tends to seem the most important and urgent. If, for instance, we agree right now that this is more pressing than any of the other planned work, we can consciously decide which planned item gets bumped out. What are people's general thoughts on this?

@Twixes Twixes changed the title from Release 1.23.0 – 1 March 2021 to Release 1.22.0 bis – 1 March 2021 Feb 22, 2021
@Twixes
Collaborator

Twixes commented Mar 12, 2021

Released.

@Twixes Twixes closed this as completed Mar 12, 2021

6 participants