Skip to content

Commit

Permalink
[BUG]: FTS delete (#1664)
Browse files Browse the repository at this point in the history
Without FTS clean-up, the SQLite file grows over time as collections are
added and removed:

No clean-up:


![image](https://github.com/chroma-core/chroma/assets/1157440/fdea6f36-8a7f-4d5d-8810-40feea9c7c91)


With clean-up:


![image](https://github.com/chroma-core/chroma/assets/1157440/c1ea757f-3cb7-4eca-a090-ebfe0c45c067)

## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Enabled secure-delete option in FTS
         - Removing entries for deleted segments from FTS

## Test plan
*How are these changes tested?*

- [ ] Tests pass locally with `pytest` for python, `yarn test` for js

## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
  • Loading branch information
tazarov authored Feb 2, 2024
1 parent e5751fd commit 9b8cbe6
Show file tree
Hide file tree
Showing 2 changed files with 37 additions and 0 deletions.
17 changes: 17 additions & 0 deletions chromadb/segment/impl/metadata/sqlite.py
Original file line number Diff line number Diff line change
Expand Up @@ -573,6 +573,7 @@ def _where_doc_criterion(
def delete(self) -> None:
t = Table("embeddings")
t1 = Table("embedding_metadata")
t2 = Table("embedding_fulltext_search")
q0 = (
self._db.querybuilder()
.from_(t1)
Expand Down Expand Up @@ -603,7 +604,23 @@ def delete(self) -> None:
)
)
)
q_fts = (
self._db.querybuilder()
.from_(t2)
.delete()
.where(
t2.rowid.isin(
self._db.querybuilder()
.from_(t)
.select(t.id)
.where(
t.segment_id == ParameterValue(self._db.uuid_to_db(self._id))
)
)
)
)
with self._db.tx() as cur:
cur.execute(*get_sql(q_fts))
cur.execute(*get_sql(q0))
cur.execute(*get_sql(q))

Expand Down
20 changes: 20 additions & 0 deletions chromadb/test/segment/test_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -657,3 +657,23 @@ def test_delete_segment(
res = cur.execute(sql, params)
# assert that the segment is gone
assert len(res.fetchall()) == 0

fts_t = Table("embedding_fulltext_search")
q_fts = (
_db.querybuilder()
.from_(fts_t)
.select()
.where(
fts_t.rowid.isin(
_db.querybuilder()
.from_(t)
.select(t.id)
.where(t.segment_id == ParameterValue(_db.uuid_to_db(_id)))
)
)
)
sql, params = get_sql(q_fts)
with _db.tx() as cur:
res = cur.execute(sql, params)
# assert that all FTS rows are gone
assert len(res.fetchall()) == 0

0 comments on commit 9b8cbe6

Please sign in to comment.