CouchDB data removal #3458
-
We have a CouchDB setup that stores sensor data, which we view as reports on a dashboard. Currently one DB is over 2 TB in size and we need to get rid of old data. We couldn't find a way to delete data and free up the space. We thought of doing a filtered replication to a new DB and then deleting the old one. We have a requirement to keep the past 6 months' worth of data for viewing at a later date. Option 1: Option 2: What would be the best option, given your experience with similar approaches? Is there a better approach than this? What are the pros and cons? TIA.
Replies: 2 comments 2 replies
-
Just to understand your situation better: did you already delete the old data and just need to get rid of the tombstones, or is all the data still live in the DB?
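One way to answer that question is to look at the database info returned by `GET /{db}`, which reports `doc_count`, `doc_del_count` and a `sizes` object. Below is a minimal sketch, assuming the info JSON has already been fetched and parsed into a dict; the function name `summarize_db_info` and the sample numbers are hypothetical, and the "reclaimable" figure is only a rough estimate of what compaction might free.

```python
def summarize_db_info(info):
    """Summarize live docs vs. tombstones from a parsed GET /{db} response."""
    live = info["doc_count"]            # documents that are not deleted
    tombstones = info["doc_del_count"]  # deleted documents still tracked as tombstones
    file_size = info["sizes"]["file"]   # bytes used on disk
    active_size = info["sizes"]["active"]  # bytes of live data inside the file
    total = live + tombstones
    return {
        "live_docs": live,
        "tombstones": tombstones,
        "tombstone_fraction": tombstones / total if total else 0.0,
        # Rough estimate only: the gap between on-disk size and active size
        "approx_reclaimable_bytes": max(file_size - active_size, 0),
    }


# Hypothetical sample values, not taken from the database in question
sample = {
    "doc_count": 75,
    "doc_del_count": 25,
    "sizes": {"file": 1000, "active": 600, "external": 900},
}
summary = summarize_db_info(sample)
print(summary)
```

A high tombstone fraction would suggest the deletes already happened and the remaining problem is the tombstones; a low one would mean the old data is still live.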
-
Would you be able to query data across multiple databases, or do you need a single primary database for the clients to talk to?
If you can query multiple dbs, you could create a new db every month. Your query would then have to find the range of dbs it needs (this assumes time is part of every query), query them, and aggregate the results in the application layer. To get rid of old data, you could simply delete dbs that are older than 6 months. To handle the switchover, if some clients might keep updating the old dbs for a bit, you could start a filtered replication from the previous dbs to bring any of those updates into the new db if they belong there. Depending on exactly how dbs get created and where the timestamps get inserted, you may not need this last part.
If you need to have only a single db holding at least 6 months of data, and querying multiple dbs is not going to work, then it seems
After the 7th month and before the 12th (say on the 11th), create a new db (second row), then start a filtered replication with
Some time before the 6th month in 2022 (say on the 5th), create the
For filtered replications, you could run a continuous one first and, after you switch dbs and the old one is no longer the primary, do another top-off one-shot replication, or maybe monitor that there are no pending changes left. But a one-shot replication is best then, I think. Once that completes you can delete the old db.
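The continuous-then-one-shot sequence described above could be expressed as two documents for the `_replicator` database. This is a minimal sketch, assuming each sensor document carries an ISO-8601 string in a field called `timestamp`; that field name, the database URLs and the cutoff value are all hypothetical. The `source`, `target`, `create_target`, `continuous` and `selector` keys are standard replication document fields.

```python
import json


def build_replication_doc(source_url, target_url, cutoff_iso, continuous):
    """Build a _replicator document that copies only docs at or after the cutoff."""
    return {
        "source": source_url,
        "target": target_url,
        "create_target": True,
        "continuous": continuous,
        # Mango selector on the assumed "timestamp" field; relies on all
        # timestamps sharing one ISO-8601 format so string order matches time order
        "selector": {"timestamp": {"$gte": cutoff_iso}},
    }


# Hypothetical URLs and cutoff, for illustration only
SOURCE = "http://localhost:5984/sensors_old"
TARGET = "http://localhost:5984/sensors_new"
CUTOFF = "2021-06-01T00:00:00Z"

# Step 1: continuous replication while the old db is still the primary
continuous_doc = build_replication_doc(SOURCE, TARGET, CUTOFF, continuous=True)

# Step 2: after the switchover, a one-shot "top-off" replication
top_off_doc = build_replication_doc(SOURCE, TARGET, CUTOFF, continuous=False)

print(json.dumps(continuous_doc, indent=2))
print(json.dumps(top_off_doc, indent=2))
```

Each document would be saved into `_replicator` in turn; the old db should only be deleted once the one-shot replication reports completion.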
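For the db-per-month option, the application layer needs two small pieces of logic: which dbs a time-ranged query must touch, and which dbs have aged out of the 6-month window. Here is a rough sketch, assuming a hypothetical naming scheme of `sensors_YYYY_MM`; the prefix and all function names are illustrative, and actually querying or deleting the dbs is left out.

```python
import re
from datetime import date


def month_db_name(year, month, prefix="sensors"):
    """Name of the db holding one month of data (hypothetical naming scheme)."""
    return f"{prefix}_{year:04d}_{month:02d}"


def dbs_for_range(start, end, prefix="sensors"):
    """List the monthly dbs a query from start to end (inclusive) must touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(month_db_name(year, month, prefix))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def expired_dbs(existing, today, keep_months=6, prefix="sensors"):
    """Return names in `existing` that fall before the retention window."""
    pattern = re.compile(rf"{re.escape(prefix)}_(\d{{4}})_(\d{{2}})")
    # Months are counted as year * 12 + zero-based month so they compare as integers
    oldest_kept = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in existing:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # not one of the monthly dbs, leave it alone
        year, month = int(match.group(1)), int(match.group(2))
        if year * 12 + (month - 1) < oldest_kept:
            expired.append(name)
    return expired


print(dbs_for_range(date(2021, 11, 15), date(2022, 2, 3)))
print(expired_dbs(["sensors_2021_06", "sensors_2021_12", "_users"], date(2022, 6, 10)))
```

With `keep_months=6`, the current month plus the six preceding months are kept, so a partially elapsed month never causes data younger than 6 months to be dropped.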