
Database disk usage increases indefinitely #1609

Closed
thaarok opened this issue Mar 30, 2022 · 4 comments


thaarok commented Mar 30, 2022

In our application we use Pebble to store around 3 TB of data. When the process runs for a long time, the database consumes more and more disk space (in the directory holding the Pebble database) until it exhausts all available disk space (around 6 TB).
However, when we restart the application and it reopens the Pebble database, the database shrinks back to 3 TB within a minute.

We are able to keep the Pebble database at a reasonable size if we restart the application regularly, but its size grows indefinitely otherwise.

Is there some clean-up that runs when the database is opened?
Could there be a reason why it cannot run while the database is in use?

Thanks!

We use a version from the master branch, commit f9d4a33.

Here is the output of db.Metrics() after 10 days of running, when the database directory consumes 5.8 TB of disk space (and keeps growing):

__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
    WAL         1   2.0 M       -   209 G       -       -       -       -   209 G       -       -       -     1.0
      0         2   1.3 M    0.50   209 G     0 B       0     0 B       0   185 G   252 K     0 B       1     0.9
      1        20    61 M    0.95   185 G     0 B       0     0 B       0   2.1 T   685 K   2.1 T       1    11.7
      2        85   547 M    1.00   179 G     0 B       0   473 M   1.1 K   1.1 T   192 K   1.1 T       1     6.4
      3       373   4.6 G    1.00   177 G     0 B       0   219 M     347   1.1 T    93 K   1.1 T       1     6.3
      4      1608    39 G    1.00   175 G     0 B       0   344 M     179   1.1 T    45 K   1.1 T       1     6.1
      5      7235   337 G    1.00   172 G     0 B       0   1.4 G     168   1.0 T    22 K   1.0 T       1     6.1
      6     25787   2.8 T       -   158 G     0 B       0     0 B       0   887 G   8.3 K   888 G       1     5.6
  total     35110   3.1 T       -   209 G     0 B       0   2.4 G   1.8 K   7.7 T   1.3 M   7.3 T       7    37.5
  flush     64002
compact    216803    62 M   200 K       0          (size == estimated-debt, score = in-progress-bytes, in = num-in-progress)
  ctype    210190       0       0    1811    4802       0  (default, delete, elision, move, read, rewrite)
 memtbl         1   4.0 M
zmemtbl      1134   4.4 G
   ztbl    152164   2.5 T
 bcache         0     0 B    2.9%  (score == hit-rate)
 tcache     1.9 K   1.2 M   96.2%  (score == hit-rate)
  snaps         1       - 15665606397  (score == earliest seq num)
 titers      8346
 filter         -       -    0.0%  (score == utility)

The Pebble configuration should match the defaults:

db, err := pebble.Open(path, &pebble.Options{
	BytesPerSync:                512 << 10, // SSTable syncs (512 KB)
	Cache:                       pebble.NewCache(8 << 20), // 8 MB
	L0CompactionThreshold:       4, // default: 4
	L0StopWritesThreshold:       12, // default: 12
	LBaseMaxBytes:               64 << 20, // default: 64 MB
	MaxManifestFileSize:         128 << 20, // default: 128 MB
	MaxOpenFiles:                1000,
	MemTableSize:                4 << 20, // default: 4 MB
	MemTableStopWritesThreshold: 2, // writes are stopped when the sum of the queued memtable sizes exceeds 2x MemTableSize
	MaxConcurrentCompactions:    1,
	NumPrevManifest:             1, // keep one old manifest
	WALBytesPerSync:             0, // default 0 (matches RocksDB)
})
@petermattis
Collaborator

ztbl 152164 2.5 T

This line points to the root cause. ztbl refers to "zombie tables": sstables that are no longer used by the LSM but are prevented from being deleted by open iterators that have not been closed. Audit your code to see if there are unclosed iterators. I believe that if you close your DB, an error will be generated if there are unclosed iterators. There is also some debugging code for locating these iterators; I don't recall off-hand how to enable it.
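
As an illustration of the advice above, a minimal sketch of the pattern it implies (assuming the iterator API of the Pebble version referenced in this issue, where NewIter returns a single *pebble.Iterator; scanRange and its bounds are hypothetical names, not code from this application):

package example

import "github.com/cockroachdb/pebble"

// scanRange shows the pattern: every iterator is closed (here via defer)
// before the function returns, so it cannot pin obsolete sstables as
// zombie tables after compactions replace them.
func scanRange(db *pebble.DB, lower, upper []byte) error {
	iter := db.NewIter(&pebble.IterOptions{
		LowerBound: lower,
		UpperBound: upper,
	})
	defer iter.Close() // releases the iterator's references to sstables

	for iter.First(); iter.Valid(); iter.Next() {
		// ... process iter.Key() / iter.Value() ...
	}
	return iter.Error()
}

Per the comment above, a leaked iterator should also surface as an error from db.Close(), which can help catch this in tests.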


thaarok commented Mar 30, 2022

@petermattis Thank you! I will check it.
Can an unclosed snapshot have the same effect?


jbowens commented Mar 30, 2022

@hkalina An unclosed snapshot alone would not increase the ztbl size. However, an unclosed iterator created from an open snapshot would. When using a long-lived snapshot, it's important to make sure the iterators that read from the snapshot are still short-lived, opening and closing them as necessary.

An unclosed snapshot alone will prevent disk space reclamation by deletes, but it wouldn't manifest as an increase in ztbl "zombie tables". It also only prevents the reclamation of space, whereas an open iterator in the presence of compactions can increase disk usage: Compactions produce new sstables, but the open iterator requires keeping both the old and new sstables around, doubling the disk usage for the same data.
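
A rough sketch of that pattern (the helper name and the request channel are hypothetical, and the same 2022-era NewIter signature is assumed): the snapshot stays open across requests, while each iterator created from it is opened and closed per request.

package example

import "github.com/cockroachdb/pebble"

// serveFromSnapshot is a hypothetical example: the snapshot is long-lived,
// but every iterator created from it is closed before the next request, so
// compacted sstables are not pinned as zombie tables between requests.
func serveFromSnapshot(db *pebble.DB, lowerBounds <-chan []byte) error {
	snap := db.NewSnapshot()
	// The snapshot alone only pins old sequence numbers (delaying reclamation
	// of deleted keys); it does not pin sstables the way an open iterator does.
	defer snap.Close()

	for lower := range lowerBounds {
		iter := snap.NewIter(&pebble.IterOptions{LowerBound: lower})
		for iter.First(); iter.Valid(); iter.Next() {
			// ... read iter.Key() / iter.Value() ...
		}
		if err := iter.Close(); err != nil { // short-lived: closed per request
			return err
		}
	}
	return nil
}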

thaarok commented Mar 31, 2022

I have added monitoring of open iterators to our code, and it seems the problem was caused by very time-intensive requests iterating over the database. We have added context.WithTimeout and we don't see long-lived iterators anymore.
Thank you for the help!
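
For completeness, a rough sketch of how such a timeout guard could look (the helper name and the 30-second deadline are illustrative assumptions, not the code from this application):

package example

import (
	"context"
	"time"

	"github.com/cockroachdb/pebble"
)

// scanWithDeadline bounds how long a single request can keep an iterator
// open: the context deadline cuts the scan short, and the deferred Close
// guarantees the iterator never outlives the request.
func scanWithDeadline(ctx context.Context, db *pebble.DB, upper []byte) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	iter := db.NewIter(&pebble.IterOptions{UpperBound: upper})
	defer iter.Close()

	for iter.First(); iter.Valid(); iter.Next() {
		select {
		case <-ctx.Done():
			return ctx.Err() // deadline exceeded; iterator still closed via defer
		default:
		}
		// ... process iter.Key() / iter.Value() ...
	}
	return iter.Error()
}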
