Created a faster ingestion mode - pipeline #1750
Published docs preview URL: https://privategpt-preview-a9ccd9ef-0949-4327-a8f2-1b1f816f1bc5.docs.buildwithfern.com
Published docs preview URL: https://privategpt-preview-a10c7df0-2c41-46cb-958c-151f09400f61.docs.buildwithfern.com
Published docs preview URL: https://privategpt-preview-2f8403c1-7839-4be4-a483-04a72a181e7c.docs.buildwithfern.com
All I can say is 🙌
Impressive contribution and execution
It is great to have you as a contributor of the project 👏
Configuration
Comparison (mm:ss): ingesting 434 documents (144 MB). All stores are in Postgres, using Ollama for embeddings.
Using the `local` profile
In the `parallel` ingest design, the blocking mutex for the index write causes a bottleneck, stalling the embedding computations until the write operation completes. This is particularly problematic because the index is updated per file, which exacerbates the slowdown. In contrast, the `pipeline` design adopts a non-blocking approach: all worker data is fed into a single queue, where it accumulates before being written less frequently. This allows for smoother and more efficient processing, as it minimizes the impact of filesystem operations on the overall workflow.
Add an ETA logger so you can get an idea of how far ingestion has progressed and when it is going to finish.
The first log will appear after 30s of ingestion, and then every 60s thereafter.
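To make the queue-based design and the ETA timing concrete, here is a minimal sketch, not the code from this PR. It assumes worker threads that only compute embeddings, and a single writer thread that drains one shared queue and writes in batches. `run_pipeline`, `EtaLogger`, `embed`, `write_batch`, `batch_size` and the log message format are all hypothetical names chosen for illustration. Only the 30s/60s schedule comes from the description above.

```python
import queue
import threading
import time

_DONE = object()  # sentinel a worker enqueues when it has no more files


class EtaLogger:
    """Hypothetical helper: first report after `first_after` s, then every `every` s."""

    def __init__(self, total, first_after=30.0, every=60.0, clock=time.monotonic):
        self.total, self.done = total, 0
        self.every, self.clock = every, clock
        self.start = clock()
        self.next_log = self.start + first_after

    def update(self, n=1):
        """Record n finished items; return a progress message if one is due, else None."""
        self.done += n
        now = self.clock()
        if now < self.next_log or self.done == 0:
            return None
        self.next_log = now + self.every
        # Linear extrapolation from the average time per item so far.
        remaining = (now - self.start) / self.done * (self.total - self.done)
        return (f"{self.done}/{self.total} done, "
                f"ETA {int(remaining // 60):02d}:{int(remaining % 60):02d}")


def run_pipeline(files, embed, write_batch, workers=4, batch_size=32,
                 eta=None, log=print):
    """Embed files in parallel; one writer thread flushes results in batches."""
    todo, results = queue.Queue(), queue.Queue()
    for f in files:
        todo.put(f)

    def worker():
        try:
            while True:
                try:
                    f = todo.get_nowait()
                except queue.Empty:
                    return
                # Unbounded queue: put() returns immediately, so embedding
                # never waits for an index write to finish.
                results.put(embed(f))
        finally:
            results.put(_DONE)  # always signal, even if embed() raised

    def flush(batch):
        write_batch(batch)
        message = eta.update(len(batch)) if eta else None
        if message:
            log(message)

    def writer():
        batch, finished = [], 0
        while finished < workers:
            item = results.get()
            if item is _DONE:
                finished += 1
                continue
            batch.append(item)
            if len(batch) >= batch_size:
                flush(batch)
                batch = []
        if batch:
            flush(batch)  # write whatever is left at the end

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because only the writer thread touches the store, no lock is needed around the write, and the write frequency is controlled by `batch_size` rather than by the number of files. The `clock` parameter exists only so the timing logic can be exercised without waiting in real time.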