
Database disk usage increases indefinitely #1609

Closed
thaarok opened this issue Mar 30, 2022 · 4 comments


thaarok commented Mar 30, 2022

In our application we use Pebble to store around 3 TB of data. When the process runs for a long time, the database consumes more and more disk space (in the directory holding the Pebble database) until it exhausts all available disk space (around 6 TB).
However, when we restart the application and it reopens the Pebble database, the database shrinks back to 3 TB within a minute.

We are able to keep the Pebble database at a reasonable size if we restart the application regularly, but its size grows indefinitely otherwise.

Is there some clean-up that runs when the database is opened?
Could there be a reason why it cannot run while the database is in use?

Thanks!

We use a version from the master branch, commit f9d4a33.

Here is the output of db.Metrics() after 10 days of running, when the database directory consumes 5.8 TB of disk space (and keeps growing):

__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
    WAL         1   2.0 M       -   209 G       -       -       -       -   209 G       -       -       -     1.0
      0         2   1.3 M    0.50   209 G     0 B       0     0 B       0   185 G   252 K     0 B       1     0.9
      1        20    61 M    0.95   185 G     0 B       0     0 B       0   2.1 T   685 K   2.1 T       1    11.7
      2        85   547 M    1.00   179 G     0 B       0   473 M   1.1 K   1.1 T   192 K   1.1 T       1     6.4
      3       373   4.6 G    1.00   177 G     0 B       0   219 M     347   1.1 T    93 K   1.1 T       1     6.3
      4      1608    39 G    1.00   175 G     0 B       0   344 M     179   1.1 T    45 K   1.1 T       1     6.1
      5      7235   337 G    1.00   172 G     0 B       0   1.4 G     168   1.0 T    22 K   1.0 T       1     6.1
      6     25787   2.8 T       -   158 G     0 B       0     0 B       0   887 G   8.3 K   888 G       1     5.6
  total     35110   3.1 T       -   209 G     0 B       0   2.4 G   1.8 K   7.7 T   1.3 M   7.3 T       7    37.5
  flush     64002
compact    216803    62 M   200 K       0          (size == estimated-debt, score = in-progress-bytes, in = num-in-progress)
  ctype    210190       0       0    1811    4802       0  (default, delete, elision, move, read, rewrite)
 memtbl         1   4.0 M
zmemtbl      1134   4.4 G
   ztbl    152164   2.5 T
 bcache         0     0 B    2.9%  (score == hit-rate)
 tcache     1.9 K   1.2 M   96.2%  (score == hit-rate)
  snaps         1       - 15665606397  (score == earliest seq num)
 titers      8346
 filter         -       -    0.0%  (score == utility)

The Pebble configuration should match the defaults:

db, err := pebble.Open(path, &pebble.Options{
	BytesPerSync:                512 << 10, // SSTable syncs (512 KB)
	Cache:                       pebble.NewCache(8 << 20), // 8 MB
	L0CompactionThreshold:       4, // default: 4
	L0StopWritesThreshold:       12, // default: 12
	LBaseMaxBytes:               64 << 20, // default: 64 MB
	MaxManifestFileSize:         128 << 20, // default: 128 MB
	MaxOpenFiles:                1000,
	MemTableSize:                4 << 20, // default: 4 MB
	MemTableStopWritesThreshold: 2, // writes are stopped when the sum of the queued memtable sizes exceeds 2x MemTableSize
	MaxConcurrentCompactions:    1,
	NumPrevManifest:             1, // keep one old manifest
	WALBytesPerSync:             0, // default 0 (matches RocksDB)
})
@petermattis
Collaborator

ztbl 152164 2.5 T

This line points to the root cause. ztbl refers to "zombie tables": sstables that are no longer used by the LSM but are prevented from being deleted by open iterators that have not been closed. Audit your code to see if there are unclosed iterators. I believe that if you close your DB, an error will be generated if there are unclosed iterators. There is also some debugging code for locating these iterators; I don't recall off-hand how to enable it.
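
As an illustration of the advice above, a minimal sketch of the pattern it implies (assuming the iterator API of the Pebble version referenced in this issue, where NewIter returns a single *pebble.Iterator; scanRange and its bounds are hypothetical names, not code from this application):

package example

import "github.com/cockroachdb/pebble"

// scanRange shows the pattern: every iterator is closed (here via defer)
// before the function returns, so it cannot pin obsolete sstables as
// zombie tables after compactions replace them.
func scanRange(db *pebble.DB, lower, upper []byte) error {
	iter := db.NewIter(&pebble.IterOptions{
		LowerBound: lower,
		UpperBound: upper,
	})
	defer iter.Close() // releases the iterator's references to sstables

	for iter.First(); iter.Valid(); iter.Next() {
		// ... process iter.Key() / iter.Value() ...
	}
	return iter.Error()
}

Per the comment above, a leaked iterator should also surface as an error from db.Close(), which can help catch this in tests.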


thaarok commented Mar 30, 2022

@petermattis Thank you! I will check it.
Can an unclosed snapshot have the same effect?


jbowens commented Mar 30, 2022

@hkalina An unclosed snapshot alone would not increase the ztbl size. However, an unclosed iterator created from an open snapshot would. When using a long-lived snapshot, it's important to make sure the iterators that read from the snapshot are still short-lived, opening and closing them as necessary.

An unclosed snapshot alone will prevent disk space reclamation by deletes, but it wouldn't manifest as an increase in ztbl "zombie tables". It also only prevents the reclamation of space, whereas an open iterator in the presence of compactions can increase disk usage: Compactions produce new sstables, but the open iterator requires keeping both the old and new sstables around, doubling the disk usage for the same data.
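
A rough sketch of that pattern (the helper name and the request channel are hypothetical, and the same 2022-era NewIter signature is assumed): the snapshot stays open across requests, while each iterator created from it is opened and closed per request.

package example

import "github.com/cockroachdb/pebble"

// serveFromSnapshot is a hypothetical example: the snapshot is long-lived,
// but every iterator created from it is closed before the next request, so
// compacted sstables are not pinned as zombie tables between requests.
func serveFromSnapshot(db *pebble.DB, lowerBounds <-chan []byte) error {
	snap := db.NewSnapshot()
	// The snapshot alone only pins old sequence numbers (delaying reclamation
	// of deleted keys); it does not pin sstables the way an open iterator does.
	defer snap.Close()

	for lower := range lowerBounds {
		iter := snap.NewIter(&pebble.IterOptions{LowerBound: lower})
		for iter.First(); iter.Valid(); iter.Next() {
			// ... read iter.Key() / iter.Value() ...
		}
		if err := iter.Close(); err != nil { // short-lived: closed per request
			return err
		}
	}
	return nil
}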

thaarok commented Mar 31, 2022

I have added monitoring of open iterators to our code, and it seems the problem was caused by very time-intensive requests iterating over the database. We have added context.WithTimeout and we don't see long-lived iterators anymore.
Thank you for the help!
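
For completeness, a rough sketch of how such a timeout guard could look (the helper name and the 30-second deadline are illustrative assumptions, not the code from this application):

package example

import (
	"context"
	"time"

	"github.com/cockroachdb/pebble"
)

// scanWithDeadline bounds how long a single request can keep an iterator
// open: the context deadline cuts the scan short, and the deferred Close
// guarantees the iterator never outlives the request.
func scanWithDeadline(ctx context.Context, db *pebble.DB, upper []byte) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	iter := db.NewIter(&pebble.IterOptions{UpperBound: upper})
	defer iter.Close()

	for iter.First(); iter.Valid(); iter.Next() {
		select {
		case <-ctx.Done():
			return ctx.Err() // deadline exceeded; iterator still closed via defer
		default:
		}
		// ... process iter.Key() / iter.Value() ...
	}
	return iter.Error()
}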
