chore(projects): Increase project slug limit from 50 -> 100 #75737
Conversation
# code has deployed. So this should not be used for most operations that alter the schema
# of a table.
# Here are some things that make sense to mark as post deployment:
# - Large data migrations. Typically we want these to be run manually so that they can be
should we make this a post deployment migration? not quite sure how many projects there are or the scale of this migration.
No, post deployment is for backfills and large indexes only, not things that modify the schema.
Increasing the size of this column should be fast. If it's not, we have a statement timeout that will cause the migration to abort within a few seconds.
I'd say just make sure you're around to monitor this and revert it if it won't run - it should be ok though.
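For context, the schema change itself is an ordinary Django `AlterField`; a minimal sketch of the migration (the dependency and field details here are illustrative, not copied from the PR):

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    # Placeholder dependency; the real migration points at the previous
    # migration in sentry's chain.
    dependencies = [("sentry", "0001_squashed")]

    operations = [
        # Widen project.slug from varchar(50) to varchar(100). Postgres
        # treats this as a catalog-only change, so it should be fast.
        migrations.AlterField(
            model_name="project",
            name="slug",
            field=models.SlugField(max_length=100),
        ),
    ]
```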
We might get hit with an index rebuild which could throw off query performance for slug lookups. The postgres docs state:
> When this form is used, the column's statistics are removed, so running ANALYZE on the table afterwards is recommended.
We have had an incident where a column resize invalidated stats and caused an outage. We could have a similar outcome here.
How do you run ANALYZE? The example in the docs is a little unclear.
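For reference, the plain SQL form is just `ANALYZE "sentry_project" ("slug");`; the column-scoped variant limits the work to that one column. A minimal sketch of issuing it from Django through a raw cursor:

```python
from django.db import connection

# Recompute planner statistics for the widened column only.
with connection.cursor() as cursor:
    cursor.execute('ANALYZE "sentry_project" ("slug");')
```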
Good call @markstory, I'd forgotten about that. Maybe in that case, we should make this a post_deploy migration and trigger the analyze as part of it, or even have ops run the entire thing manually for us.
I doubt we can run the ANALYZE as part of the deploy, it'll very likely time out.
I switched it to post deploy; do I have to include the ANALYZE as part of the migration code?
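If the ANALYZE were bundled into the migration, it would presumably be a `RunSQL` operation along these lines (a sketch only; the post-deploy flag name and the dependency are assumptions, and as noted above the statement would likely hit the timeout):

```python
from django.db import migrations


class Migration(migrations.Migration):
    # Assumption: sentry's migration framework uses a flag like this to
    # mark post-deploy migrations; the exact attribute name may differ.
    is_post_deployment = True

    dependencies = [("sentry", "0002_widen_project_slug")]  # placeholder

    operations = [
        migrations.RunSQL(
            sql='ANALYZE "sentry_project" ("slug");',
            reverse_sql=migrations.RunSQL.noop,
        ),
    ]
```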
lgtm, increasing the size of a varchar has been fast for a long time: https://www.postgresql.org/docs/9.2/release-9-2.html#AEN114949
Blocking for now until we figure out the best way to run this.
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.

@@ Coverage Diff @@
## master #75737 +/- ##
===========================================
+ Coverage 56.73% 78.39% +21.66%
===========================================
Files 7053 6874 -179
Lines 311232 306538 -4694
Branches 50881 50115 -766
===========================================
+ Hits 176573 240312 +63739
+ Misses 130115 59800 -70315
- Partials 4544 6426 +1882
This is a little tricky to run safely, just because we've seen an incident when widening a column before.
Running the analyze will fix the problem; however, it takes a while to run, and if we end up locking on sentry_project then we'll be hard down until the analyze completes.
Mark and I discussed this on slack, and we have a couple of options that you could try:
Firstly, we could ask ops to duplicate `sentry_project` and then we could run a test on it. The steps here would be (see the sketch after this list):
- Duplicate `sentry_project` to `sentry_project_test`
- Run an `ANALYZE` on `sentry_project_test` to make sure the stats are in a good state
- Take a copy of the stats for this column on the table https://redash.getsentry.net/queries/7220/source
- Get query plans for queries that use `slug` related indexes. This would be something like `explain select * from sentry_project_test where organization_id = 1 and slug='sentry'`
- Run the sql to widen the column
- Re-run the stats query and compare to the original, see if they have significantly changed
- Re-run the explain from above and compare it to before, see if it has significantly changed.
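A rough way to script that dry run, for illustration (assuming psycopg2 and direct access to a database containing the duplicated table; the DSN and the pg_stats fields sampled are placeholders):

```python
import psycopg2

# Placeholder DSN; point this at wherever the duplicated table lives.
DSN = "dbname=sentry"


def snapshot(cur):
    """Capture planner stats and a representative query plan for slug."""
    # A few pg_stats fields for the slug column (subset for illustration).
    cur.execute(
        "SELECT null_frac, avg_width, n_distinct FROM pg_stats"
        " WHERE tablename = 'sentry_project_test' AND attname = 'slug';"
    )
    stats = cur.fetchone()  # may be None if the stats were invalidated
    cur.execute(
        "EXPLAIN SELECT * FROM sentry_project_test"
        " WHERE organization_id = 1 AND slug = 'sentry';"
    )
    plan = [row[0] for row in cur.fetchall()]
    return stats, plan


with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("ANALYZE sentry_project_test;")
        before = snapshot(cur)
        cur.execute(
            "ALTER TABLE sentry_project_test"
            " ALTER COLUMN slug TYPE varchar(100);"
        )
        # After the ALTER, the pg_stats row may be gone entirely; that is
        # the stats invalidation being discussed above.
        after = snapshot(cur)
        print("before:", before)
        print("after:", after)
```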
If things look good here, then we could ask ops to run these commands manually for us:
- Take a copy of the stats row from this query https://redash.getsentry.net/queries/7220/source. Convert it into an update statement, so that we can set the stats back onto the row if we see a problem
- ALTER TABLE "sentry_project" ALTER COLUMN "slug" TYPE varchar(100);
- Monitor sentry for any problems. If we see them, run the update statement from the first step. Note: We've never tried to perform stats surgery before, so this is a little risky. We should probably test it out on the test table from above first.
- If we're still seeing problems, run an analyze on the table
ok let me ask ops tmr
Approving so that we can merge this on Monday
This PR has a migration; here is the generated SQL:

--
-- Alter field slug on project
--
ALTER TABLE "sentry_project" ALTER COLUMN "slug" TYPE varchar(100);
Increase project slug limit from 50 -> 100. There are no FE changes needed b/c we don't validate length on the frontend, but instead show the error received from the failed POST request.
Also switch to using a constant for default slug length.
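The constant swap might look roughly like this (a sketch; the actual identifier names and locations in the codebase may differ):

```python
from django.db import models

# Hypothetical constants; the real names and where they live in the
# sentry codebase may differ.
DEFAULT_SLUG_MAX_LENGTH = 50
PROJECT_SLUG_MAX_LENGTH = 100


class Project(models.Model):
    # Reference the constant instead of a hard-coded max_length=50.
    slug = models.SlugField(max_length=PROJECT_SLUG_MAX_LENGTH)
```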
Closes #73032
Fixes ALRT-195