From 918e0bf0305bef9ce2adda55aa4d23d1f3fa5c4e Mon Sep 17 00:00:00 2001 From: Max Isom Date: Wed, 10 Jul 2024 16:13:13 -0700 Subject: [PATCH 01/10] [ENH] CIP: Write-Ahead Log Pruning & Vacuuming --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 64 +++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md new file mode 100644 index 00000000000..9bfe5011e97 --- /dev/null +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -0,0 +1,64 @@ +# CIP-07102024: Write-Ahead Log Pruning & Vacuuming + +## Status + +Current Status: `Under Discussion` + +## Motivation + +Chroma's SQLite-based write-ahead log grows infinitely over time. When ingesting large amounts of data, it's not uncommon for the SQLite database to grow to many gigabytes in size. Large databases cost more, take longer to back up, and can result in decreased query performance. + +There are two separate problems: + +- The database, specifically the `embeddings_queue` table, has unbounded growth. +- The SQLite `VACUUM` command, often recommended for such scenarios, is a blocking and potentially slow operation. Read and write operations are both blocked during a `VACUUM`.[^1] + +This CIP addresses both issues. + +## Proposed Changes + +A new configuration parameter will be added, `log:vacuum_threshold`. It defaults to 1GB. This helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. + +Two additional things will be done after write transactions: + +1. The `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. +2. 
`PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages. This is a non-blocking operation. + +Tentatively this will be done after every write transaction, assuming the added latency is minimal. If this is not the case, it will be run on some interval. + +## Public Interfaces + +In addition to the configuration parameter described above, a new `chroma vacuum` command will be added to the CLI to manually perform a full vacuum of the database. Usage: + +```bash +chroma vacuum --path ./chroma_data +``` + +This automatically runs the pruning operation described above before running `VACUUM`. + +`chroma vacuum` should be run infrequently; it may increase query performance but the degree to which it does so is currently unknown. + +## Compatibility, Deprecation, and Migration Plan + +Incremental vacuuming is not available by default in SQLite, and it's a little more complicated than just flipping a setting: + +> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^2] + +This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. + +Users should see disk space freed immediately after upgrading and running `chroma vacuum` for the first time. Subsequent runs of `chroma vacuum` will likely free up no or very little disk space as the database will be continuously auto-pruned from that point forward. + +## Test Plan + +Both auto-pruning and the vacuum command should be thoroughly tested with property-based testing. For the vacuum command, we should also test with concurrent read/write load to ensure the database is not corrupted. 
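The pruning step from Proposed Changes above is small enough to sketch end-to-end. The snippet below uses a simplified `embeddings_queue` schema and a hard-coded `min_active_seq_id`; in the real system that value would be computed from the active subscribers, so treat this as an illustration only:

```python
import sqlite3

# Minimal sketch of the pruning step. "min_active_seq_id" stands in for the
# minimum sequence ID across active subscribers, which the real
# implementation would compute; the schema here is deliberately simplified.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings_queue (seq_id INTEGER PRIMARY KEY, vector BLOB)")
conn.executemany(
    "INSERT INTO embeddings_queue (seq_id) VALUES (?)",
    [(i,) for i in range(100)],
)

min_active_seq_id = 40  # hypothetical: the slowest subscriber has consumed up to 40

# Rows below every subscriber's position are no longer needed
conn.execute("DELETE FROM embeddings_queue WHERE seq_id < ?", (min_active_seq_id,))
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM embeddings_queue").fetchone()[0]
print(remaining)  # 60 rows left (seq_id 40-99)
```

Deleting in this way frees pages back to SQLite's freelist rather than shrinking the file, which is why the vacuum step is needed as a companion.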
+ +## Rejected Alternatives + +**Only prune when running `chroma vacuum`**: instead of continuously pruning the `embeddings_queue` table, only prune it when running `chroma vacuum` or some other manual command. This alternative was rejected because Chroma should be able to automatically keep its database size in check without manual intervention. + +## Resources + +- [Excellent overview of different vacuuming strategies](https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/) + +[^1]: [SQLite Vacuum](https://sqlite.org/lang_vacuum.html) +[^2]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum From abf5e631966e6b10ae88d5d862e421eb6ec3f06a Mon Sep 17 00:00:00 2001 From: Max Isom Date: Wed, 10 Jul 2024 16:19:50 -0700 Subject: [PATCH 02/10] Clarify unit --- docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index 9bfe5011e97..c47927c7de6 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -17,7 +17,7 @@ This CIP addresses both issues. ## Proposed Changes -A new configuration parameter will be added, `log:vacuum_threshold`. It defaults to 1GB. This helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. +A new configuration parameter will be added, `log:vacuum_threshold`. It defaults to 1GB. Following Postgres' convention, the unit is megabytes. This helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. 
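The check this parameter feeds into can be sketched directly from the pragmas named above. Names here are illustrative (`vacuum_threshold_mb` is a stand-in for the `log:vacuum_threshold` setting), not the actual implementation:

```python
import sqlite3

def maybe_incremental_vacuum(conn: sqlite3.Connection, vacuum_threshold_mb: int) -> bool:
    """Run PRAGMA incremental_vacuum iff free space exceeds the threshold."""
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    if vacuum_threshold_mb and free_pages * page_size > vacuum_threshold_mb * 1024 * 1024:
        # fetchall() forces the pragma statement to run to completion
        conn.execute("PRAGMA incremental_vacuum").fetchall()
        return True
    return False

conn = sqlite3.connect(":memory:")
print(maybe_incremental_vacuum(conn, 1000))  # False: a fresh database has no free pages
```

With the 1GB default (≈1000 MB), the vacuum only fires once roughly a gigabyte of freed pages has accumulated, which is what keeps fragmentation bounded without vacuuming on every write.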
Two additional things will be done after write transactions: From 71c811a70e84cff760aed0b9e5d641bca44b713b Mon Sep 17 00:00:00 2001 From: Max Isom Date: Thu, 11 Jul 2024 09:36:11 -0700 Subject: [PATCH 03/10] Add note about free space check --- docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index c47927c7de6..11996172496 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -34,7 +34,7 @@ In addition to the configuration parameter described above, a new `chroma vacuum chroma vacuum --path ./chroma_data ``` -This automatically runs the pruning operation described above before running `VACUUM`. +This automatically runs the pruning operation described above before running `VACUUM`. Prior to any modifications, it checks that there is enough available disk space to complete the vacuum (i.e. the free space on the disk is at least twice the size of the database).[^2] `chroma vacuum` should be run infrequently; it may increase query performance but the degree to which it does so is currently unknown. @@ -42,7 +42,7 @@ This automatically runs the pruning operation described above before running `VA Incremental vacuuming is not available by default in SQLite, and it's a little more complicated than just flipping a setting: -> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^2] +> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^3] This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. 
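The migration recipe implied by the quoted documentation can be demonstrated in a few lines. This is a sketch of what `chroma vacuum` would do internally, not its actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # database is no longer "new"

# Requesting the mode change records it on this connection, but it only
# takes effect once VACUUM rewrites the database file in the new format.
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("VACUUM")

mode = conn.execute("PRAGMA auto_vacuum").fetchone()[0]
print(mode)  # 2 == incremental
```

Because the `VACUUM` rewrites the entire file, this one-time migration is exactly as expensive as a full vacuum — which is why it is bundled into the manual `chroma vacuum` command rather than done automatically.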
@@ -61,4 +61,5 @@ Both auto-pruning and the vacuum command should be thoroughly tested with proper - [Excellent overview of different vacuuming strategies](https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/) [^1]: [SQLite Vacuum](https://sqlite.org/lang_vacuum.html) -[^2]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum +[^2]: [2.9: Transient Database Used by Vacuum](https://www.sqlite.org/tempfiles.html) +[^3]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum From 82df8ee09956d10000a978eb0a95ce19476365a2 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Thu, 11 Jul 2024 10:02:06 -0700 Subject: [PATCH 04/10] Vacuum should not be run while server is alive --- docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index 11996172496..c22b4545cb9 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -22,7 +22,7 @@ A new configuration parameter will be added, `log:vacuum_threshold`. It defaults Two additional things will be done after write transactions: 1. The `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. -2. `PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages. This is a non-blocking operation. +2. `PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size (`PRAGMA page_size`) is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages. 
This is a non-blocking operation. Tentatively this will be done after every write transaction, assuming the added latency is minimal. If this is not the case, it will be run on some interval. @@ -38,6 +38,8 @@ This automatically runs the pruning operation described above before running `VA `chroma vacuum` should be run infrequently; it may increase query performance but the degree to which it does so is currently unknown. +We should clearly document that `chroma vacuum` is not intended to be run while the Chroma server is running, maybe in the form of a confirmation prompt. + ## Compatibility, Deprecation, and Migration Plan Incremental vacuuming is not available by default in SQLite, and it's a little more complicated than just flipping a setting: From ff3fd973705d21aa8f1ad74559bf7f9840008f32 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Mon, 15 Jul 2024 16:55:44 -0700 Subject: [PATCH 05/10] Update test plan --- docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index c22b4545cb9..a55ae304225 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -52,7 +52,7 @@ Users should see disk space freed immediately after upgrading and running `chrom ## Test Plan -Both auto-pruning and the vacuum command should be thoroughly tested with property-based testing. For the vacuum command, we should also test with concurrent read/write load to ensure the database is not corrupted. +Auto-pruning should be thoroughly tested with property-based testing. We should test `chroma vacuum` with concurrent write operations to confirm it behaves as expected and emits the appropriate error messages. 
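The confirmation prompt suggested earlier could be as small as the following sketch. The helper name and wording are hypothetical, not a committed CLI design:

```python
def confirm_vacuum(ask=input) -> bool:
    """Warn that vacuuming a live server's database is unsupported."""
    answer = ask(
        "chroma vacuum should not be run while a Chroma server is using this "
        "data directory. Continue? [y/N] "
    )
    # Default to "no": only an explicit yes proceeds
    return answer.strip().lower() in ("y", "yes")

print(confirm_vacuum(ask=lambda _: "y"))  # True
print(confirm_vacuum(ask=lambda _: ""))   # False
```

Taking the prompt function as a parameter (defaulting to `input`) keeps the confirmation logic trivially testable without an interactive terminal, which fits the test plan above.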
## Rejected Alternatives From d756c85cefd1609c5d1c5b8ae7aee48963147c03 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Tue, 16 Jul 2024 14:54:50 -0700 Subject: [PATCH 06/10] draft --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 167 ++++++++++++++++-- 1 file changed, 155 insertions(+), 12 deletions(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index a55ae304225..c6b0a626ec2 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -11,20 +11,30 @@ Chroma's SQLite-based write-ahead log grows infinitely over time. When ingesting There are two separate problems: - The database, specifically the `embeddings_queue` table, has unbounded growth. -- The SQLite `VACUUM` command, often recommended for such scenarios, is a blocking and potentially slow operation. Read and write operations are both blocked during a `VACUUM`.[^1] +- The SQLite `VACUUM` command, often recommended for such scenarios, takes an exclusive lock on the database and is potentially quite slow. This CIP addresses both issues. ## Proposed Changes -A new configuration parameter will be added, `log:vacuum_threshold`. It defaults to 1GB. Following Postgres' convention, the unit is megabytes. This helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. - Two additional things will be done after write transactions: -1. The `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. -2. `PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size (`PRAGMA page_size`) is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages. 
This is a non-blocking operation. +1. The `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. (As long as this is done continuously, this is a relatively cheap operation.) +2. `PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages, up to `log:vacuum_limit`. This adds latency to write transactions, but will not block read transactions. + +### New configuration parameters + +**`log:vacuum_threshold`**: -Tentatively this will be done after every write transaction, assuming the added latency is minimal. If this is not the case, it will be run on some interval. +- Default: 1GB +- Unit: megabytes (Postgres' convention) +- Usage: this helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. + +**`log:vacuum_limit`**: + +- Default: 0 +- Unit: megabytes (Postgres' convention) +- Usage: vacuuming adds latency to write transactions. This allows rough control over the added latency by only vacuuming free pages up to this limit. If set to `0`, vacuuming will always reclaim all available space. If set to a small non-zero value, it's possible that ## Public Interfaces @@ -34,7 +44,7 @@ In addition to the configuration parameter described above, a new `chroma vacuum chroma vacuum --path ./chroma_data ``` -This automatically runs the pruning operation described above before running `VACUUM`. Prior to any modifications, it checks that there is enough available disk space to complete the vacuum (i.e. the free space on the disk is at least twice the size of the database).[^2] +This automatically runs the pruning operation described above before running `VACUUM`. 
Prior to any modifications, it checks that there is enough available disk space to complete the vacuum (i.e. the free space on the disk is at least twice the size of the database).[^1] `chroma vacuum` should be run infrequently; it may increase query performance but the degree to which it does so is currently unknown. @@ -44,7 +54,7 @@ We should clearly document that `chroma vacuum` is not intended to be run while Incremental vacuuming is not available by default in SQLite, and it's a little more complicated than just flipping a setting: -> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^3] +> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^2] This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. @@ -58,10 +68,143 @@ Auto-pruning should be thoroughly tested with property-based testing. We should **Only prune when running `chroma vacuum`**: instead of continuously pruning the `embeddings_queue` table, only prune it when running `chroma vacuum` or some other manual command. This alternative was rejected because Chroma should be able to automatically keep its database size in check without manual intervention. -## Resources +## Appendix + +### Incremental vacuum experiment + +Some tests were run to determine the impact of `PRAGMA incremental_vacuum` on read and write queries. + +Observations: + +- Parallel read queries during `PRAGMA incremental_vacuum` are not blocked. +- One or more (depending on number of threads) parallel read queries will see a large latency spike, which in most cases seems to be at least the duration of the vacuum operation. 
+- `PRAGMA incremental_vacuum` and write queries cannot be run in parallel (this is true in general for any query that writes data when in journaling mode). +- As a corollary to the above: if another process/thread writes and defers its commit, it can easily block the vacuum and cause it to time out. +- On a 2023 MacBook Pro, running `PRAGMA incremental_vacuum` on a database with ~1GB worth of free pages took around 900-1000ms. + +
+Source code + +Run this script to create `test.sqlite`, adjusting `TARGET_SIZE_BYTES` if desired: + +```python +import sqlite3 +import string +import random + +TARGET_SIZE_BYTES = 1000000000 +TEXT_COLUMN_SIZE = 32 + +def random_string(len): + return ''.join(random.choices(string.ascii_uppercase + string.digits, k=len)) + +conn = sqlite3.connect("test.sqlite") +conn.execute("PRAGMA auto_vacuum = INCREMENTAL") +conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)") + +batch_size = 10000 +insert_query = "INSERT INTO test (name) VALUES (?)" +data = [(random_string(TEXT_COLUMN_SIZE),) for _ in range(batch_size)] + +num_rows = TARGET_SIZE_BYTES // (TEXT_COLUMN_SIZE + 4) # int is variable width, assume average 4 bytes + +for _ in range(num_rows // batch_size): + conn.executemany(insert_query, data) + conn.commit() + +conn.close() +``` + +Then, run this script to test vacuuming: + +```python +import multiprocessing +from multiprocessing.synchronize import Event +import sqlite3 +import time +import random +import string + +def random_string(len): + return ''.join(random.choices(string.ascii_uppercase + string.digits, k=len)) + +def print_results(timings): + if len(timings) == 0: + return + + timings.sort() + p95 = timings[int(len(timings) * 0.95)] + print(f"Ran {len(timings)} concurrent queries") + print(f"Query duration 95th percentile: {p95 * 1000}ms") + print(f"Query duration max: {timings[-1] * 1000}ms") + +def query_read(ready_event: Event, shutdown_event: Event): + conn = sqlite3.connect("test.sqlite") + + ready_event.set() + timings = [] + while not shutdown_event.is_set(): + started_at = time.time() + conn.execute("SELECT COUNT(*) FROM test") + duration = (time.time() - started_at) + timings.append(duration) + + conn.close() + print_results(timings) + +def query_write(ready_event: Event, shutdown_event: Event): + conn = sqlite3.connect("test.sqlite", check_same_thread=False) + cur = conn.cursor() + + ready_event.set() + timings = [] + while not 
shutdown_event.is_set(): + started_at = time.time() + cur.execute("INSERT INTO test (name) VALUES (?)", (random_string(32),)) + duration = (time.time() - started_at) + timings.append(duration) + + conn.close() + print_results(timings) + + +def increment_vacuum(): + conn = sqlite3.connect("test.sqlite", check_same_thread=False) + + conn.execute("DELETE FROM test") + conn.commit() + + ctx = multiprocessing.get_context("spawn") + ready_event = ctx.Event() + shutdown_event = ctx.Event() + # can switch between concurrent read and writes + process = ctx.Process(target=query_read, args=(ready_event, shutdown_event,), daemon=True) + # process = ctx.Process(target=query_write, args=(ready_event, shutdown_event,), daemon=True) + process.start() + ready_event.wait() + + started_at = time.time() + r = conn.execute("PRAGMA incremental_vacuum") + # https://stackoverflow.com/a/56412002 + r.fetchall() + finished_at = time.time() + + print(f"Vacuum took {(finished_at - started_at) * 1000}ms") + conn.close() + + shutdown_event.set() + process.join() + +if __name__ == '__main__': + increment_vacuum() +``` + +
+ +### Resources +- [SQLite Vacuum](https://sqlite.org/lang_vacuum.html) - [Excellent overview of different vacuuming strategies](https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/) -[^1]: [SQLite Vacuum](https://sqlite.org/lang_vacuum.html) -[^2]: [2.9: Transient Database Used by Vacuum](https://www.sqlite.org/tempfiles.html) -[^3]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum +[^1]: [2.9: Transient Database Used by Vacuum](https://www.sqlite.org/tempfiles.html) +[^2]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum From 97d1811ddb56ddaa408d159af2e50b139d471d3a Mon Sep 17 00:00:00 2001 From: Max Isom Date: Tue, 16 Jul 2024 15:12:32 -0700 Subject: [PATCH 07/10] Goodbye incremental vacuum --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 85 ++++++++++--------- 1 file changed, 43 insertions(+), 42 deletions(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index c6b0a626ec2..9a0db45147f 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -17,28 +17,20 @@ This CIP addresses both issues. ## Proposed Changes -Two additional things will be done after write transactions: +After every write transaction, if `log:prune` is enabled, the `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. (As long as this is done continuously, this is a relatively cheap operation.) -1. The `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. (As long as this is done continuously, this is a relatively cheap operation.) -2. 
`PRAGMA freelist_count` will be checked to see if the number of free pages multiplied by the page size is greater than the `log:vacuum_threshold` parameter. If so, `PRAGMA incremental_vacuum` is run to free up pages, up to `log:vacuum_limit`. This adds latency to write transactions, but will not block read transactions. +This does not directly reduce the disk size of the database, but allows SQLite to reuse the space occupied by the deleted rows—thus effectively bounding the disk usage of the `embeddings_queue` table by `hnsw:sync_threshold`. -### New configuration parameters - -**`log:vacuum_threshold`**: +## Public Interfaces -- Default: 1GB -- Unit: megabytes (Postgres' convention) -- Usage: this helps avoid excessive fragmentation—without this parameter, or if it's set to `0`, it is effectively the same as SQLite's full vacuum mode. +### New collection configuration parameters -**`log:vacuum_limit`**: +**`log:prune`**: -- Default: 0 -- Unit: megabytes (Postgres' convention) -- Usage: vacuuming adds latency to write transactions. This allows rough control over the added latency by only vacuuming free pages up to this limit. If set to `0`, vacuuming will always reclaim all available space. If set to a small non-zero value, it's possible that +- Default: `true` +- Usage: this exists mainly to ease migration. The only reason to set this to `false` is if your application is extremely latency-sensitive. -## Public Interfaces - -In addition to the configuration parameter described above, a new `chroma vacuum` command will be added to the CLI to manually perform a full vacuum of the database. 
Usage: +### New CLI command ```bash chroma vacuum --path ./chroma_data @@ -52,11 +44,12 @@ We should clearly document that `chroma vacuum` is not intended to be run while ## Compatibility, Deprecation, and Migration Plan -Incremental vacuuming is not available by default in SQLite, and it's a little more complicated than just flipping a setting: +The new `log:prune` parameter defaults to `false` on existing collections, because: -> However, changing from "none" to "full" or "incremental" can only occur when the database is new (no tables have yet been created) or by running the VACUUM command.[^2] +- The first pruning operation for an existing collection can be very slow. +- Some users may be relying on the WAL as a full backup. -This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. +This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. During the vacuum, `log:prune` will automatically be set to `true` on all collections. Users should see disk space freed immediately after upgrading and running `chroma vacuum` for the first time. Subsequent runs of `chroma vacuum` will likely free up no or very little disk space as the database will be continuously auto-pruned from that point forward. @@ -72,6 +65,8 @@ Auto-pruning should be thoroughly tested with property-based testing. We should ### Incremental vacuum experiment +(This is kept for posterity, but is no longer relevant to the current proposal.) + Some tests were run to determine the impact of `PRAGMA incremental_vacuum` on read and write queries. Observations: @@ -80,7 +75,7 @@ Observations: - One or more (depending on number of threads) parallel read queries will see a large latency spike, which in most cases seems to be at least the duration of the vacuum operation. - `PRAGMA incremental_vacuum` and write queries cannot be run in parallel (this is true in general for any query that writes data when in journaling mode). 
- As a corollary to the above: if another process/thread writes and defers its commit, it can easily block the vacuum and cause it to time out. -- On a 2023 MacBook Pro, running `PRAGMA incremental_vacuum` on a database with ~1GB worth of free pages took around 900-1000ms. +- On a 2023 MacBook Pro M3 Pro, running `PRAGMA incremental_vacuum` on a database with ~1GB worth of free pages took around 900-1000ms.
Source code @@ -128,17 +123,22 @@ import string def random_string(len): return ''.join(random.choices(string.ascii_uppercase + string.digits, k=len)) -def print_results(timings): +def print_results(timings, vacuum_start, vacuum_end): if len(timings) == 0: return - timings.sort() - p95 = timings[int(len(timings) * 0.95)] - print(f"Ran {len(timings)} concurrent queries") + durations = [end - start for (start, end) in timings] + + durations.sort() + p95 = durations[int(len(durations) * 0.95)] + print(f"Ran {len(durations)} concurrent queries") print(f"Query duration 95th percentile: {p95 * 1000}ms") - print(f"Query duration max: {timings[-1] * 1000}ms") + print(f"Query duration max: {durations[-1] * 1000}ms") + + num_queries_during_vacuum = sum(1 for (start, end) in timings if start >= vacuum_start and end <= vacuum_end) + print(f"Number of queries during vacuum: {num_queries_during_vacuum}") -def query_read(ready_event: Event, shutdown_event: Event): +def query_read(ready_event: Event, shutdown_event: Event, timings_tx): conn = sqlite3.connect("test.sqlite") ready_event.set() @@ -146,30 +146,28 @@ def query_read(ready_event: Event, shutdown_event: Event): while not shutdown_event.is_set(): started_at = time.time() conn.execute("SELECT COUNT(*) FROM test") - duration = (time.time() - started_at) - timings.append(duration) + timings.append((started_at, time.time())) conn.close() - print_results(timings) + timings_tx.send(timings) -def query_write(ready_event: Event, shutdown_event: Event): +def query_write(ready_event: Event, shutdown_event: Event, timings_tx): conn = sqlite3.connect("test.sqlite", check_same_thread=False) - cur = conn.cursor() ready_event.set() timings = [] while not shutdown_event.is_set(): started_at = time.time() - cur.execute("INSERT INTO test (name) VALUES (?)", (random_string(32),)) - duration = (time.time() - started_at) - timings.append(duration) + conn.execute("INSERT INTO test (name) VALUES (?)", (random_string(32),)) + conn.commit() + 
timings.append((started_at, time.time())) conn.close() - print_results(timings) + timings_tx.send(timings) def increment_vacuum(): - conn = sqlite3.connect("test.sqlite", check_same_thread=False) + conn = sqlite3.connect("test.sqlite", timeout=0, check_same_thread=False) conn.execute("DELETE FROM test") conn.commit() @@ -177,24 +175,28 @@ def increment_vacuum(): ctx = multiprocessing.get_context("spawn") ready_event = ctx.Event() shutdown_event = ctx.Event() + (timings_tx, timings_rx) = ctx.Pipe() # can switch between concurrent read and writes - process = ctx.Process(target=query_read, args=(ready_event, shutdown_event,), daemon=True) - # process = ctx.Process(target=query_write, args=(ready_event, shutdown_event,), daemon=True) + # process = ctx.Process(target=query_read, args=(ready_event, shutdown_event, timings_tx), daemon=True) + process = ctx.Process(target=query_write, args=(ready_event, shutdown_event, timings_tx), daemon=True) process.start() ready_event.wait() - started_at = time.time() + vacuum_started_at = time.time() r = conn.execute("PRAGMA incremental_vacuum") # https://stackoverflow.com/a/56412002 r.fetchall() - finished_at = time.time() + vacuum_finished_at = time.time() + print(f"Vacuum took {(vacuum_finished_at - vacuum_started_at) * 1000}ms") - print(f"Vacuum took {(finished_at - started_at) * 1000}ms") conn.close() shutdown_event.set() process.join() + timings = timings_rx.recv() + print_results(timings, vacuum_started_at, vacuum_finished_at) + if __name__ == '__main__': increment_vacuum() ``` @@ -207,4 +209,3 @@ if __name__ == '__main__': - [Excellent overview of different vacuuming strategies](https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/) [^1]: [2.9: Transient Database Used by Vacuum](https://www.sqlite.org/tempfiles.html) -[^2]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum From 937af7f5b34fc2f33f8e4a3eb7fe84830e1771d4 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Wed, 17 Jul 2024 11:57:49 -0700 Subject: 
[PATCH 08/10] Add deletion experiment --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 84 +++++++++++++++++++ 1 file changed, 84 insertions(+) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index 9a0db45147f..e5b9bc9abcf 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -63,6 +63,90 @@ Auto-pruning should be thoroughly tested with property-based testing. We should ## Appendix +### WAL deletion experiment + +Some tests were run to determine the impact on latency of deleting rows from `embeddings_queue`. Added latency scales with `hnsw:sync_threshold`. + +Observations from running on a 2023 MacBook Pro M3 Pro, using 1024 dimension embeddings: + +- `hnsw:sync_threshold` of 1000 (the default) adds ~1ms of latency (p50 of 1.01ms, p99 of 1.26ms). +- `hnsw:sync_threshold` of 10_000 adds 9-56ms of latency (p50 of 9ms, p90 of 10ms, p99 of 56ms). + +
+Source code + +```python +import sqlite3 +import time +import numpy as np +import os + +DEFAULT_SYNC_THRESHOLD = 1000 +EMBEDDING_DIMENSION = 1024 + +def measure(conn, sync_threshold, repeat): + timings = [] + for _ in range(repeat): + # Create + for i in range(sync_threshold): + encoded_embedding = np.random.rand(EMBEDDING_DIMENSION).astype(np.float32).tobytes() + + conn.execute(""" + INSERT INTO embeddings_queue (operation, topic, id, vector, encoding, metadata) + VALUES (?, ?, ?, ?, ?, ?) + """, (0, "test", i, encoded_embedding, "test", "test")) + conn.commit() + + # Delete + started_at = time.time() + conn.execute("DELETE FROM embeddings_queue WHERE seq_id <= ?", (sync_threshold,)) + conn.commit() + timings.append(time.time() - started_at) + + return timings + +def print_timings(timings, batch_size): + print(f"Ran {len(timings)} delete queries deleting {batch_size} rows each") + print(f"p50: {np.percentile(timings, 50) * 1000}ms") + print(f"p90: {np.percentile(timings, 90) * 1000}ms") + print(f"p99: {np.percentile(timings, 99) * 1000}ms") + + +def main(): + os.remove("test.sqlite") + conn = sqlite3.connect("test.sqlite") + conn.execute(""" + CREATE TABLE embeddings_queue ( + seq_id INTEGER PRIMARY KEY, + created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, + operation INTEGER NOT NULL, + topic TEXT NOT NULL, + id TEXT NOT NULL, + vector BLOB, + encoding TEXT, + metadata TEXT + ) + """) + + num_rows = DEFAULT_SYNC_THRESHOLD * 16 + + print(f"hnsw:sync_threshold = {DEFAULT_SYNC_THRESHOLD}:") + timings = measure(conn, DEFAULT_SYNC_THRESHOLD, 50) + print_timings(timings, DEFAULT_SYNC_THRESHOLD) + + conn.execute("DELETE FROM embeddings_queue") + conn.commit() + + sync_threshold = DEFAULT_SYNC_THRESHOLD * 10 + print(f"hnsw:sync_threshold = {sync_threshold}:") + timings = measure(conn, sync_threshold, 50) + print_timings(timings, sync_threshold) + +main() +``` + +
+ ### Incremental vacuum experiment (This is kept for posterity, but is no longer relevant to the current proposal.) From a0d737841bbcfceba6ee8f60457fab48414dd421 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Wed, 17 Jul 2024 12:12:18 -0700 Subject: [PATCH 09/10] Add baseline latency measurement script --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index e5b9bc9abcf..1a9313a2429 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -147,6 +147,45 @@ main()
+Additionally, the baseline latency of `collection.add()` on calls that trigger a persist was measured to be 49-62ms.
+
+Source code for baseline latency measurement
+
+```python
+import time
+
+import chromadb
+import numpy as np
+
+SYNC_THRESHOLD = 1000
+
+client = chromadb.PersistentClient("./bench-baseline")
+collection = client.get_or_create_collection("test")
+
+timings = []
+
+for batch_i in range(10):
+    ids = [f"test-{i}" for i in range(SYNC_THRESHOLD)]
+    embeddings = np.random.rand(SYNC_THRESHOLD, 1024).astype(np.float32)
+
+    # Add all except the last id
+    collection.add(ids=ids[:-1], embeddings=embeddings[:-1])
+    print("added all except last id")
+
+    # Adding the final id should trigger the persist
+    started_at = time.time()
+    collection.add(ids=[ids[-1]], embeddings=[embeddings[-1].tolist()])
+    timings.append(time.time() - started_at)
+
+    collection.delete(ids=ids)
+
+print(f"p50: {np.percentile(timings, 50) * 1000:.2f}ms")
+print(f"p90: {np.percentile(timings, 90) * 1000:.2f}ms")
+print(f"p99: {np.percentile(timings, 99) * 1000:.2f}ms")
+```
+
+ ### Incremental vacuum experiment (This is kept for posterity, but is no longer relevant to the current proposal.) From 13ac0f6eef7816222aa637225d8fa792e35e24d4 Mon Sep 17 00:00:00 2001 From: Max Isom Date: Mon, 22 Jul 2024 11:15:13 -0700 Subject: [PATCH 10/10] Update configuration plan, don't expose param to users --- ...02024_Write_Ahead_Log_Pruning_Vacuuming.md | 21 ++++++++----------- 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md index 1a9313a2429..e3fd763a30e 100644 --- a/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md +++ b/docs/cip/CIP-07102024_Write_Ahead_Log_Pruning_Vacuuming.md @@ -17,18 +17,20 @@ This CIP addresses both issues. ## Proposed Changes -After every write transaction, if `log:prune` is enabled, the `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. (As long as this is done continuously, this is a relatively cheap operation.) +After every write transaction, if log pruning is enabled, the `embeddings_queue` table will be pruned to remove rows that are no longer needed. Specifically, rows with a sequence ID less than the minimum sequence ID of any active subscriber will be deleted. (As long as this is done continuously, this is a relatively cheap operation.) This does not directly reduce the disk size of the database, but allows SQLite to reuse the space occupied by the deleted rows—thus effectively bounding the disk usage of the `embeddings_queue` table by `hnsw:sync_threshold`. -## Public Interfaces +To control log pruning, `SqlEmbeddingsQueue` will get a new configuration object with a single parameter: `automatically_prune`. 
This will default to `false` for systems with a non-empty embeddings queue, because:
+
+- The first pruning operation for a large embeddings queue can be very slow.
+- Some users may be relying on the WAL as a full backup.
-
-### New collection configuration parameters
+If the system's embeddings queue is empty (a fresh system), `automatically_prune` will default to `true`.
 
-**`log:prune`**:
+This configuration object will be stored in a new table, `embeddings_queue_config`.
 
-- Default: `true`
-- Usage: this exists mainly to ease migration. The only reason to set this to `false` is if your application is extremely latency-sensitive.
+## Public Interfaces
 
 ### New CLI command
 
@@ -44,12 +46,7 @@ We should clearly document that `chroma vacuum` is not intended to be run while
 
 ## Compatibility, Deprecation, and Migration Plan
 
-The new `log:prune` parameter defaults to `false` on existing collections, because:
-
-- The first pruning operation for an existing collection can be very slow.
-- Some users may be relying on the WAL as a full backup.
-
-This means existing installations will not benefit from auto-pruning until they run `chroma vacuum`. During the vacuum, `log:prune` will automatically be set to `true` on all collections.
+Existing installations will not benefit from auto-pruning until they run `chroma vacuum`. During the vacuum, `automatically_prune` will be set to `true`.
 
 Users should see disk space freed immediately after upgrading and running `chroma vacuum` for the first time. Subsequent runs of `chroma vacuum` will likely free little or no disk space, as the database will be continuously auto-pruned from that point forward.
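For illustration, the pruning rule these patches describe (delete WAL rows below the minimum sequence ID of any active subscriber) can be sketched as follows. This is a minimal sketch against a bare SQLite table, not the actual `SqlEmbeddingsQueue` implementation; the function name `prune_embeddings_queue` and the `min_active_seq_id` parameter are illustrative, standing in for the value the real queue would compute from its subscribers.

```python
import sqlite3

def prune_embeddings_queue(conn: sqlite3.Connection, min_active_seq_id: int) -> int:
    """Delete WAL rows already consumed by every active subscriber.

    Rows with seq_id below the slowest subscriber's position can never be
    replayed, so they are safe to delete. Returns the number of rows pruned.
    """
    cursor = conn.execute(
        "DELETE FROM embeddings_queue WHERE seq_id < ?", (min_active_seq_id,)
    )
    conn.commit()
    return cursor.rowcount

# Tiny demonstration against an in-memory table with the same key column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings_queue (seq_id INTEGER PRIMARY KEY, id TEXT)")
conn.executemany(
    "INSERT INTO embeddings_queue (seq_id, id) VALUES (?, ?)",
    [(i, f"test-{i}") for i in range(1, 11)],
)
conn.commit()

# Suppose the slowest active subscriber has consumed up through seq_id 6
deleted = prune_embeddings_queue(conn, min_active_seq_id=7)
remaining = conn.execute("SELECT COUNT(*) FROM embeddings_queue").fetchone()[0]
print(deleted, remaining)  # prunes seq_ids 1-6, leaving 7-10
```

Note that, as the patches point out, the `DELETE` does not shrink the database file by itself; it frees pages for SQLite to reuse, which is what bounds the `embeddings_queue` table's effective disk usage.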