
Deleting values and running GC doesn't reclaim space #767

Closed
magik6k opened this issue Apr 12, 2019 · 12 comments · Fixed by #778

magik6k commented Apr 12, 2019

I was trying to get Badger GC in go-ipfs to reclaim space, but it didn't seem to work, so I wrote this rather basic test case to see whether it works in the simple case of keys being added, then deleted, and GC run:

https://gist.github.com/magik6k/8c379cc02b443495e4809170fb8803a9

EDIT: gist/results updated, as I discovered that I was calling Delete on the wrong data...

These are my (reproducible) results:
DiscardRatio=0.5

non-deletable put: 101 MB(101 MB); sst: 0 B; vlog: 101 MB
data put: 3.1 GB(3.1 GB); sst: 0 B; vlog: 3.1 GB
put closed: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
del-open: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
after del: 3.2 GB(3.2 GB); sst: 20 MB; vlog: 3.1 GB
gc-step: 3.2 GB(3.2 GB); sst: 20 MB; vlog: 3.1 GB
gc: 3.2 GB(3.2 GB); sst: 20 MB; vlog: 3.1 GB
close-gc: 3.2 GB(3.2 GB); sst: 36 MB; vlog: 3.1 GB // expected ~100-120M

The cases below are wrong because of a bug in my code:

DiscardRatio=0.01

non-deletable put: 101 MB(101 MB); sst: 0 B; vlog: 101 MB //put 100M, looks good
data put: 3.1 GB(3.1 GB); sst: 0 B; vlog: 3.1 GB // put 3G, looks good
put closed: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB // close DB, looks good
del-open: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB // reopen db, looks good
after del: 9.1 GB(9.1 GB); sst: 3.0 GB; vlog: 6.1 GB // delete the 3G keys put before, sst explodes, vlog grows by 3G
gc-step: 9.1 GB(9.1 GB); sst: 3.0 GB; vlog: 6.1 GB // call RunValueLogGC, nothing changes
gc: 9.1 GB(9.1 GB); sst: 3.0 GB; vlog: 6.1 GB // ErrNoRewrite, even with 3 mostly rewritable vlogs
close-gc: 9.2 GB(9.2 GB); sst: 3.0 GB; vlog: 6.1 GB // closing doesn't change anything

DiscardRatio=0.5

non-deletable put: 101 MB(101 MB); sst: 0 B; vlog: 101 MB
data put: 3.1 GB(3.1 GB); sst: 0 B; vlog: 3.1 GB
put closed: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
del-open: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
after del: 9.1 GB(9.1 GB); sst: 3.0 GB; vlog: 6.1 GB
gc-step: 8.0 GB(8.0 GB); sst: 3.0 GB; vlog: 5.0 GB
gc-step: 7.0 GB(7.0 GB); sst: 3.0 GB; vlog: 4.0 GB
gc-step: 7.0 GB(7.0 GB); sst: 3.0 GB; vlog: 4.0 GB
gc: 7.0 GB(7.0 GB); sst: 3.0 GB; vlog: 4.0 GB
close-gc: 7.0 GB(7.0 GB); sst: 3.0 GB; vlog: 4.0 GB // definitely more than 100M

DiscardRatio=0.9

non-deletable put: 101 MB(101 MB); sst: 0 B; vlog: 101 MB
data put: 3.1 GB(3.1 GB); sst: 0 B; vlog: 3.1 GB
put closed: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
del-open: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
after del: 9.1 GB(9.1 GB); sst: 3.0 GB; vlog: 6.1 GB
gc-step: 8.0 GB(8.0 GB); sst: 3.0 GB; vlog: 5.0 GB
gc-step: 8.0 GB(8.0 GB); sst: 3.0 GB; vlog: 5.0 GB
gc: 8.0 GB(8.0 GB); sst: 3.0 GB; vlog: 5.0 GB
close-gc: 8.1 GB(8.1 GB); sst: 3.0 GB; vlog: 5.0 GB

@jarifibrahim
Contributor

@magik6k Thanks for reporting this. I ran your test script and it looks like the GC didn't work (even with 0.01 discard ratio). Let me dig deeper and get back.

@jarifibrahim
Contributor

@magik6k Looks like we have a bug in the WriteBatch API because of which the GC cannot reclaim space. However, if you insert entries using the txn API, the GC should work. I made the following changes to your script, and garbage collection was able to reclaim 2 GB of space.

diff --git a/main_test.go b/main_test.go
index 0d4099b..8759f9b 100644
--- a/main_test.go
+++ b/main_test.go
@@ -13,6 +13,7 @@ import (
 
 	"github.com/dustin/go-humanize"
 	ds "github.com/ipfs/go-datastore"
+	"github.com/stretchr/testify/require"
 
 	"github.com/dgraph-io/badger"
 )
@@ -44,25 +45,29 @@ func TestGc(t *testing.T) {
 
 	r := rand.New(rand.NewSource(555))
 
-	wb := db.NewWriteBatch()
+	txn := db.NewTransaction(true)
 
 	for i := 0; i < preC; i++ { // put non-deletable entries
 		b, err := ioutil.ReadAll(io.LimitReader(r, entryS))
 		if err != nil {
 			t.Fatal(err)
 		}
-		if err := wb.Set(ds.RandomKey().Bytes(), b, 0); err != nil {
+		if err := txn.Set(ds.RandomKey().Bytes(), b); err != nil {
 			t.Fatal(err)
 		}
+		if int64(i)%1000 == 0 {
+			require.NoError(t, txn.Commit())
+			txn = db.NewTransaction(true)
+		}
 	}
 
-	if err := wb.Flush(); err != nil {
+	if err := txn.Commit(); err != nil {
 		t.Fatal(err)
 	}
 
 	pds(t, "non-deletable put")
 
-	wb = db.NewWriteBatch()
+	txn = db.NewTransaction(true)
 	es := make([][]byte, entryC)
 	for i := 0; i < entryC; i++ { // put deletable entries
 		b, err := ioutil.ReadAll(io.LimitReader(r, entryS))
@@ -70,12 +75,19 @@ func TestGc(t *testing.T) {
 			t.Fatal(err)
 		}
 		es[i] = ds.RandomKey().Bytes()
-		if err := wb.Set(es[i], b, 0); err != nil {
+		if err := txn.Set(es[i], b); err != nil {
 			t.Fatal(err)
 		}
+
+		if int64(i)%1000 == 0 {
+			if err := txn.Commit(); err != nil {
+				t.Fatal(err)
+			}
+			txn = db.NewTransaction(true)
+		}
 	}
 
-	if err := wb.Flush(); err != nil {
+	if err := txn.Commit(); err != nil {
 		t.Fatal(err)
 	}
 
@@ -94,13 +106,24 @@ func TestGc(t *testing.T) {
 
 	pds(t, "del-open")
 
-	wb = db.NewWriteBatch()
-	for _, e := range es {
-		if err := wb.Delete(e); err != nil {
+	txn = db.NewTransaction(true)
+	for i, e := range es {
+		if err := txn.Delete(e); err != nil {
 			t.Fatal(err)
 		}
+		if int64(i)%1000 == 0 {
+			if err := txn.Commit(); err != nil {
+				t.Fatal(err)
+			}
+			txn = db.NewTransaction(true)
+		}
+	}
+	if err := txn.Commit(); err != nil {
+		t.Fatal(err)
 	}
-	if err := wb.Flush(); err != nil {
+	db.Close()
+	db, err = badger.Open(opts)
+	if err != nil {
 		t.Fatal(err)
 	}
 

NOTE - It is important that the DB is closed and reopened. We perform compaction when the DB is closed; without compaction, the GC wouldn't be able to free up the space. Compaction happens automatically, but in this case, since there isn't enough data for compaction to be triggered on its own, we force it by closing the DB.

This is what I get on running the script above

$ go test -v github.com/jarifibrahim/foo -run TestGc
=== RUN   TestGc
non-deletable put: 101 MB(101 MB); sst: 0 B; vlog: 101 MB
data put: 3.1 GB(3.1 GB); sst: 0 B; vlog: 3.1 GB
put closed: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
del-open: 3.1 GB(3.1 GB); sst: 20 MB; vlog: 3.1 GB
after del: 3.2 GB(3.2 GB); sst: 17 MB; vlog: 3.1 GB
gc-step: 2.2 GB(2.2 GB); sst: 17 MB; vlog: 2.2 GB
gc-step: 1.1 GB(1.1 GB); sst: 17 MB; vlog: 1.1 GB
gc-step: 1.1 GB(1.1 GB); sst: 17 MB; vlog: 1.1 GB
gc: 1.1 GB(1.1 GB); sst: 17 MB; vlog: 1.1 GB
close-gc: 1.1 GB(1.1 GB); sst: 17 MB; vlog: 1.1 GB
--- PASS: TestGc (38.71s)
PASS


$ du -lh /tmp/badger/*
1.1G	/tmp/badger/000002.vlog
7.6M	/tmp/badger/000003.vlog
17M	/tmp/badger/000006.sst
4.0K	/tmp/badger/MANIFEST

The 000002.vlog file wasn't removed because it might not have enough discardable entries (we set the discard ratio to 0.5). The other file, 000003.vlog, was not removed because it is the latest in-use vlog file (it holds the head marker); we never remove the vlog file with the latest head marker.
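The pattern described in this comment (close and reopen the DB to force compaction, then invoke value log GC until it reports nothing left to rewrite) can be sketched as a small loop. This is a sketch, not badger's API: `errNoRewrite` and the `gc` callback are stand-ins for `badger.ErrNoRewrite` and `db.RunValueLogGC(discardRatio)`, so the snippet runs without the badger dependency.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRewrite is a stand-in for badger.ErrNoRewrite, used here so the
// sketch runs without the badger dependency.
var errNoRewrite = errors.New("value log GC attempt didn't result in any cleanup")

// runGCUntilDone calls gc repeatedly. Badger's RunValueLogGC rewrites at
// most one value-log file per successful call, which is why the test
// script in this thread sees space shrink one "gc-step" at a time:
// callers are expected to loop until ErrNoRewrite.
func runGCUntilDone(gc func() error) (rewrites int, err error) {
	for {
		if err := gc(); err != nil {
			if errors.Is(err, errNoRewrite) {
				return rewrites, nil // nothing left worth rewriting
			}
			return rewrites, err // a real failure
		}
		rewrites++
	}
}

func main() {
	// Fake GC: pretend two vlog files are rewritable, then nothing is left.
	remaining := 2
	gc := func() error {
		if remaining == 0 {
			return errNoRewrite
		}
		remaining--
		return nil
	}
	n, err := runGCUntilDone(gc)
	fmt.Println(n, err) // 2 <nil>
}
```

In a real program the loop body would call `db.RunValueLogGC(0.5)` on an open `*badger.DB` after the close/reopen cycle described above.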

@magik6k
Author

magik6k commented Apr 23, 2019

Is there a way to trigger compaction without closing the database?

@jarifibrahim
Contributor

Try running Flatten

badger/db.go

Lines 1185 to 1191 in 1fcc96e

// Flatten can be used to force compactions on the LSM tree so all the tables fall on the same
// level. This ensures that all the versions of keys are colocated and not split across multiple
// levels, which is necessary after a restore from backup. During Flatten, live compactions are
// stopped. Ideally, no writes are going on during Flatten. Otherwise, it would create competition
// between flattening the tree and new tables being created at level zero.
func (db *DB) Flatten(workers int) error {

jarifibrahim pushed a commit to jarifibrahim/badger that referenced this issue Apr 23, 2019
Every Transaction stores the latest value of readTs it is aware of. When the transaction is discarded (which happens even when we commit), the global value of readTs is updated.
Previously, the readTs of transaction inside the write batch struct was set to 0. So the global value of readTs would also be set to 0 (unless someone ran a transaction after using write batch).
Due to the 0 value of the global readTs, the compaction algorithm would skip all the values. With this commit, the compaction algorithm works fine with key-values inserted via Transaction API or
via the Write Batch API.

See https://github.com/dgraph-io/badger/blob/1fcc96ecdb66d221df85cddec186b6ac7b6dab4b/levels.go#L480-L484 and dgraph-io#767
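The watermark behaviour described in this commit message can be modelled in a few lines. `dropObsolete` below is a toy illustration, not badger's actual compaction code: versions of a key at or below the read watermark (`readMark.DoneUntil()`) collapse to just the newest one, so a watermark stuck at 0 forces every version to be kept, which is why the sst size exploded after the batched deletes.

```go
package main

import "fmt"

// dropObsolete models the compaction rule described above: versions at or
// below the read watermark (doneUntil) can be collapsed to the newest one;
// versions above it must be kept because an open transaction might still
// read them. The input slice is sorted newest-first.
func dropObsolete(versions []uint64, doneUntil uint64) []uint64 {
	kept := []uint64{}
	seenBelow := false
	for _, v := range versions {
		if v > doneUntil {
			kept = append(kept, v) // possibly still visible to a reader
			continue
		}
		if !seenBelow {
			kept = append(kept, v) // newest version at the watermark
			seenBelow = true
		}
		// older versions at/below the watermark are discarded
	}
	return kept
}

func main() {
	versions := []uint64{30, 20, 10} // newest first
	// Healthy watermark: older versions 20 and 10 are dropped.
	fmt.Println(dropObsolete(versions, 30)) // [30]
	// Buggy WriteBatch left the watermark at 0: nothing can be dropped.
	fmt.Println(dropObsolete(versions, 0)) // [30 20 10]
}
```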
jarifibrahim pushed the same commit (with minor message revisions) to jarifibrahim/badger several more times between Apr 23 and Apr 30, 2019, and it was pushed to this repository on May 2, 2019.
@jarifibrahim
Copy link
Contributor

@magik6k With #778, the script in https://gist.github.com/magik6k/8c379cc02b443495e4809170fb8803a9 would still produce similar results, but the values would be removed eventually; the GC would reclaim the space after some time.

@linxGnu

linxGnu commented May 29, 2019

Hello @jarifibrahim

I have the same problem: disk space leaks when deleting keys/values using a write batch.

The commit d98dd68#diff-42ea5667b327bb207485077410d5f499 reverts your previous fix, and the problem remains.

How about reopening this issue?

@jarifibrahim
Contributor

jarifibrahim commented May 29, 2019

@linxGnu The value log GC isn't supposed to reclaim space immediately. The change in #778 was reverted because we had issues with it.

The issue here isn't with the GC; it's with the Write Batch API. You need not worry about the GC: it will eventually clean up the space. There are multiple factors involved when it tries to find a vlog file to clean.

Take a look at the following script, which works perfectly fine: https://gist.github.com/jarifibrahim/78621293e68dffbc30be860f3c9df549#file-main_test-go, and its output: https://gist.github.com/jarifibrahim/78621293e68dffbc30be860f3c9df549#file-output-txt.

I am not sure if this is an actual bug. I mean the GC did reclaim space. It just didn't do it immediately.

@linxGnu

linxGnu commented May 30, 2019

@jarifibrahim Thank you very much for the details. I'll take another look and report back if I still see disk space not being reclaimed when using the Write Batch API 👍

@jarifibrahim
Contributor

@linxGnu Just to help you understand how GC works --

  1. We store discard stats, which track how much data can be discarded in each vlog file. The discard stats are built when compactions happen. (You can force compaction by closing and reopening the DB.)
  2. If the discard stats have no information about a specific file, we sample the file and, based on the sample, decide whether it should be cleaned up.

Value Log GC is supposed to clean up space eventually. There might be cases when GC doesn't clean up the data, but it will be cleaned up eventually.
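As a rough model of step 2 above, the sampling decision comes down to comparing the discardable fraction of the sampled bytes against the discard ratio passed to `RunValueLogGC`. `shouldRewrite` below is a simplified illustration, not badger's exact heuristic, but it matches the behaviour seen in the thread: a higher discard ratio makes the GC more reluctant to rewrite a file.

```go
package main

import "fmt"

// shouldRewrite is a simplified model of the GC decision: after sampling
// a vlog file, rewrite it only if the discardable fraction of the sampled
// bytes meets the discard ratio passed to RunValueLogGC.
func shouldRewrite(discardedBytes, sampledBytes int64, discardRatio float64) bool {
	if sampledBytes == 0 {
		return false // nothing sampled, nothing to decide on
	}
	return float64(discardedBytes)/float64(sampledBytes) >= discardRatio
}

func main() {
	fmt.Println(shouldRewrite(60, 100, 0.5)) // true: 60% discardable meets ratio 0.5
	fmt.Println(shouldRewrite(10, 100, 0.5)) // false: only 10% discardable
	fmt.Println(shouldRewrite(60, 100, 0.9)) // false: ratio 0.9 demands 90%
}
```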

evgmik added a commit to sahib/brig that referenced this issue Feb 16, 2021
Reading the badger DB issues list yielded the following: RunValueLogGC() does clean up online, but a small database (~150 MB) is not big enough to trigger compaction, so the only way to update the stats GC needs is to close the DB. See the note in dgraph-io/badger#767 (comment).

Based on this information, the logic is redone to call Close only if RunValueLogGC did not succeed.
manishrjain pushed the same fix commit to outcaste-io/outserv on Jul 6, 2022.
@ashish314

Hello @jarifibrahim,

I have a couple of questions regarding BadgerDB's garbage collection and file deletion process:

Does BadgerDB delete files automatically, or do users need to call RunValueLogGC at intervals to delete discarded files after compaction?

Is there any way we can obtain information about which files will be deleted during the process of RunValueLogGC before they are actually deleted? This would be helpful if we want to store this data to a cheaper storage solution for backup or archiving purposes.

I would appreciate any insights or guidance you can provide on these topics. Thank you!

@mangalaman93
Contributor

Hi @ashish314,

If you want to take a backup of data in badger, you should use the Backup APIs in badger. You can see more details here https://dgraph.io/docs/badger/get-started/#database-backup/

I am not sure whether you need to call RunValueLogGC periodically, but it seems to me that you should.

@jarifibrahim
Contributor

Hi @ashish314!

Does BadgerDB delete files automatically, or do users need to call RunValueLogGC at intervals to delete discarded files after compaction?

Badger will perform cleanup automatically. This means it will delete old data and files automatically.

Is there any way we can obtain information about which files will be deleted during the process of RunValueLogGC before they are actually deleted? This would be helpful if we want to store this data to a cheaper storage solution for backup or archiving purposes.

We don't expose this information but you shouldn't need to worry about this data. GC removes only deleted/expired/duplicate/stale data. All the useful data is kept as it is.

You can take periodic backups of your data if you'd like using the backup API.
