Partition the `version_downloads` table (#2203)
⚠️ This commit is part of a PR containing destructive commits, which must be deployed separately with steps in between. Do not deploy this commit with other commits, and make sure you read the instructions below. ⚠️

This adds the partitioned version of the table, a trigger adds new data to the new table, and a binary is responsible for backfilling the old data.

The `processed` column is notably missing. This is for two reasons. The first is that we can't partition on a boolean column, so any queries that were focused on `processed` will need some date filter to be performant on the new table. The second is that while the query planner prunes unused partitions for selects, it does not appear to do so for updates. This caused the query which set `processed = 't'` to perform unreasonably slowly against the partitioned table. This might be fixed in PG 12, which I didn't test against. We could also update the most recent partition directly, which would fix the issue. But ultimately the `processed` column only really existed so `update-downloads` could go an unlimited amount of time without running and still recover. Realistically that's just not going to happen, and with a sufficiently large margin (say 1 week), if we really go that long without realizing that this is broken, we have much bigger failures to worry about. Either way we need some date filter on these queries to be performant. So while I think that's a good move even in a vacuum, `processed` just stops having a purpose.

The trigger is pretty standard. I added the `IF NEW IS DISTINCT FROM OLD` check, since I think we might keep the non-partitioned table around for a bit after the transition, which means we'll want a trigger updating that table.
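A trigger of this shape would keep the partitioned table in sync — a sketch only, with assumed table and column names (`version_downloads_part`, `version_id`, `date`, `downloads`, `counted`), not the exact code from the commit:

```sql
CREATE OR REPLACE FUNCTION sync_version_downloads_part() RETURNS trigger AS $$
BEGIN
    -- On UPDATE, skip no-op writes (OLD is only defined for UPDATE triggers).
    IF TG_OP = 'UPDATE' AND NEW IS NOT DISTINCT FROM OLD THEN
        RETURN NEW;
    END IF;
    -- Mirror the write into the partitioned table. ON CONFLICT works here
    -- because the unique key (version_id, date) includes the partition key.
    INSERT INTO version_downloads_part (version_id, date, downloads, counted)
    VALUES (NEW.version_id, NEW.date, NEW.downloads, NEW.counted)
    ON CONFLICT (version_id, date) DO UPDATE
        SET downloads = EXCLUDED.downloads,
            counted   = EXCLUDED.counted;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER version_downloads_sync
    AFTER INSERT OR UPDATE ON version_downloads
    FOR EACH ROW EXECUTE FUNCTION sync_version_downloads_part();
```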
The reason we'd ever need both is a bit dense for this commit (I'm happy to go into detail about the caveats of atomic swaps in PG if anyone has strong concerns here), but the short version is that for at least a short instant the name of the table will not necessarily be relevant, so when we swap there will be a very short instant where writes happen to both tables. More likely we'll just move forward with the transition, and accept that we'll have to manually reconcile the old table or lose some download data if we need to revert.

We don't want the backfilling query to overwhelm the database to the point where production traffic is affected. If we were using something purpose-built for this, like https://github.com/soundcloud/lhm, we would have a query that operates in rows/time. However, writing that query for a table without an incrementing ID is painful, and going by date will do at most 200,000 rows at once. While LHM's default of 40,000 rows/0.1s is a very good default in my personal experience, spiking to 5x that should be perfectly fine here.

This commit does not include any changes that operate on the partitioned table (and in fact no commit will be doing that until the final switch, since the intention is for this new table to essentially operate everywhere the old did, with the only changes being adding some date filters).

Deployment instructions
=======================

After this commit is deployed, run `backfill_version_downloads`, and ensure it exits without errors. A zero exit status will indicate that the `version_downloads_part` table contains the same data as `version_downloads`. The trigger on `version_downloads` will cause that to continue to be the case.
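The date-batched backfill could look roughly like the following — a sketch under assumed table/column names, not the binary from the commit; the binary would loop over each historical date, pausing between batches:

```sql
-- Copy one day of history at a time, so no single statement touches
-- more than ~200,000 rows. Skip rows the trigger already mirrored.
INSERT INTO version_downloads_part (version_id, date, downloads, counted)
SELECT version_id, date, downloads, counted
FROM version_downloads
WHERE date = '2020-01-01'  -- the driver substitutes each date in turn
ON CONFLICT (version_id, date) DO NOTHING;
```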
Since the partitioned `version_downloads` table doesn't contain this column, we need to remove any references to it before actually doing a migration which would remove it.

Deployment instructions
=======================

This commit does not do anything destructive, and can technically be deployed with the previous commit. It must be deployed *before* the next commit, however. This commit is safe to roll back.
Note that as of this commit, the DB dump is broken, as it can't handle partitioned tables.

We're keeping the old table around in case we need to revert this, though for the time being I'm just assuming we'll either manually reconcile data or accept some loss of download counts if we need to revert.

For some reason Diesel CLI isn't picking up the foreign key on the partitioned table, so I regenerated the patch file.

Deployment instructions
=======================

This commit contains a destructive migration, and should be deployed on its own.
This adds a new background job which is responsible for creating new partitions for `version_downloads` 1 year in advance. The intent here is to give an extremely large window to notice if for some reason this job stops running. In the extremely unlikely event that a full year passes, data will start going in the default partition and we will get paged. Recovering from this will be a massive pain, but I've tried to ensure it'll never happen.
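The job's core work boils down to DDL along these lines — a sketch with assumed partition names and ranges, not the actual job code:

```sql
-- Create a partition well in advance if it doesn't already exist.
CREATE TABLE IF NOT EXISTS version_downloads_part_2021_02
    PARTITION OF version_downloads_part
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

-- Safety net: any row that misses every explicit partition lands here,
-- and data appearing in this partition triggers a page.
CREATE TABLE IF NOT EXISTS version_downloads_part_default
    PARTITION OF version_downloads_part DEFAULT;
```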
r? @jtgeibel

(rust_highfive has picked a reviewer for you, use r? to override)
Query Performance
=================

These tests were performed on a fork of the production database, so the …

Refresh Recent Crate Downloads
------------------------------

You cannot get a meaningful …

- Before: Execution Time: 24107.687 ms (Query Plan)
- After: Execution Time: 2859.205 ms (~10x improvement) (Query Plan)

Load Uncounted Versions
-----------------------

We cannot partition on a boolean, so we need to add a date restriction …

- Before: Execution Time: 5188.726 ms (Query Plan)
- After: Execution Time: 141.453 ms (~40x improvement) (Query Plan)

Set processed = 't' on old rows
-------------------------------

- Before: ~6s on average in production, based on logs (this cannot be reliably …)
- After: No longer executed (∞x improvement)

Insert or update (e.g. the …
@smarnach I could use your help getting the db dump fixed.
I'm looking into fixing the database dump now.
For the time being, the easiest way to fix the database dump is to completely exclude the `version_downloads` table from the dump. I think the dumps still provide value even without that data, and they will only be one third of the size. (See also #2078 for discussions about the future of the dumps.)

I plan to implement an `ignore_tables` setting in the configuration file of the dumps. Maybe I manage to do this on a long-distance flight tomorrow.

I don't think explicitly listing tables to ignore is viable here, since they get created programmatically. We need to ignore any partitions.
I filed a proposed fix as a pull request against this branch.
Pinging @kornelski on this PR with respect to changes (temporary and long term) to how we dump the `version_downloads` table.
I think it's a good idea to remove the full `version_downloads` from the data dump — it's a huge CSV, and redownloading the full history every day makes cumulative bandwidth usage grow quadratically. For me an ideal solution would be to export downloads in separate archives, one per day, containing only downloads from that day. This way I could get downloads incrementally and catch up if necessary.
@kornelski That approach would be relatively straightforward to implement, so I may do that. Give it a few weeks, though. :-/ For the time being, would you prefer to have `version_downloads` included in the dump in some truncated form, or excluded entirely?
For now I'd prefer to have it included. It's OK if it's truncated.
@kornelski Since you are the only user of the database dumps I'm in contact with, I updated sgrif#1 to include `version_downloads` again. I hope to implement a better solution in the next few weeks.
@sgrif I've gone through the commits, and the approach (and results) look great to me! I'd just like to take another look at the SQL function to make sure I understand it. Other than that, there's just the database export. From what I understand, @smarnach has changes in sgrif#1 that should continue to export all download history for now. If you're fine with that PR then I think we can coordinate a time window to merge this and sequence through the deploy.
I didn't get a chance to look today, but I will Thursday.
Have you considered …
☔ The latest upstream changes (presumably #2539) made this pull request unmergeable. Please resolve the merge conflicts.
CI is failing, we seem to have some conflicts, and there generally hasn't been much movement here, so I'll close this PR for now. Feel free to reopen if you want to pick this back up. :)
Please review the individual commits of this pull request, as this includes a binary which will be run once on production, but was then deleted in a later commit.
This attempts to address some performance issues related to the `version_downloads` table. This table has some interesting characteristics to it:

While we don't care about the runtime of the queries in `update_downloads`, those queries are falling back to a seq scan of the entire table, and I believe they are evicting the entire table cache when they do that. This is something that's really hard to measure directly without production write traffic, so all we have to go on is the runtime of the problem queries. Stats on that are at the bottom of this PR description. I do believe this will improve our table cache hit rate by reducing cache churn, as we will be loading much less data at once.

Partitioning a table is the act of splitting it into multiple smaller tables on some known axis (in this case date). Generally we can just pretend the table is a single table (with some exceptions), but under the hood the query planner is able to prune any partitions which couldn't contain results for the query. When partitioning is applicable, it can be cheaper than traversing an index, since it affects the physical layout of the data, as opposed to having a btree that can be used to quickly prune data. In particular, indexes stop being helpful above a certain data size, while partitioning does not.
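A minimal illustration of date-range partitioning and pruning — an assumed toy schema, not the actual migration:

```sql
-- The planner can skip any partition whose bounds can't match the WHERE clause.
CREATE TABLE downloads_example (
    version_id integer NOT NULL,
    date       date    NOT NULL,
    downloads  integer NOT NULL DEFAULT 0
) PARTITION BY RANGE (date);

CREATE TABLE downloads_example_2020_q1
    PARTITION OF downloads_example
    FOR VALUES FROM ('2020-01-01') TO ('2020-04-01');

CREATE TABLE downloads_example_2020_q2
    PARTITION OF downloads_example
    FOR VALUES FROM ('2020-04-01') TO ('2020-07-01');

-- EXPLAIN on a date-filtered query shows only the matching partition scanned:
-- EXPLAIN SELECT sum(downloads) FROM downloads_example
--     WHERE date >= '2020-01-01' AND date < '2020-02-01';
```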
I had hoped that a BRIN index would have similar improvements, since our write patterns should be perfect for it. However this just hasn't been true in practice, and the index is barely pruning any data. I believe a full vacuum would correct this, but that would require unacceptable levels of downtime.
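For reference, the BRIN attempt mentioned above would look something like this (hypothetical; the PR moved away from this approach):

```sql
-- BRIN indexes store min/max summaries per block range; they only prune
-- effectively when the physical row order correlates with the indexed
-- column — which table bloat gradually breaks without a full vacuum.
CREATE INDEX version_downloads_date_brin
    ON version_downloads USING brin (date);
```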
Typically when partitioning you'd target a partition size much larger than what we're currently targeting. This is because I've optimized our partition range based on what our slowest query (refreshing recent downloads) will access, rather than targeting a row count. For older data I grouped things into larger partitions because the row count was just so low.
As of PG 12, partitions do need to be created manually. I've done this by adding a new background job which will check if a partition for 1 year in the future exists, and creates it if not. To ensure we don't start losing download data if somehow this goes a full year without running, I've also created a default partition. If any data ever ends up in this partition, it will page.
Since this changes the physical layout of the data, we can't just do this to an existing table. We do the same process we would do with any destructive action that would lock the table for an unreasonable amount of time: create the new table, mirror writes to it with a trigger, backfill the old data, and finally swap the tables.
We can't partition on a boolean, which means we need to add a date range to any queries filtering on the `processed` column to ensure we get a sufficiently small data set. The only reason this column seemed to exist was so `update-downloads` could go an infinite amount of time without running and then update however far back it needs to. This just isn't realistic in practice anymore, so I've set it to only look 1 week back. This made the `processed` column pointless, so I just removed it from the new table.

All of these partitions do break some workflows, however. For one, they're included in `schema.rs` by default. I've manually excluded the ones included in this migration, but this puts us in a situation where the production schema won't be fully reflected by `schema.rs` (and we probably don't want it to be). I'm not sure what we want to do here, since some folks might run the job to create new partitions to test it. I could change the migration to only create the default partition in any env other than production, and have the job also exit early; folks who want to test those locally would be responsible for cleaning up after themselves, since we only need the default partition in dev. I could also add functionality to Diesel to either exclude a regex or exclude all partitions (though the latter would require switching to raw SQL in the one place we directly access a partition).

Additionally, the db dump is currently broken. It wants permissions for all the partitions individually (which we don't want to dump), and the method it's using for copy doesn't work with partitioned tables. I haven't looked into either of these yet; they felt tangential to what is already a beefy PR. Since we did say these are experimental, and we were willing to disable them if they broke, technically we can move forward with this before fixing it, but I don't think this is urgent enough to do that.
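The kind of date filter described above — replacing the old `processed` check — might look like this (a sketch with assumed column names, not the PR's actual query):

```sql
-- Restrict to the last week so the planner only touches recent partitions,
-- instead of scanning the whole table to filter on a boolean column.
SELECT version_id, date, downloads
FROM version_downloads
WHERE date > CURRENT_DATE - INTERVAL '7 days';
```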