Ondisk merge consuming RAM #1262
Comments
Just a silly question: are you sure /tmp is not a RAM disk?
Thanks for the response. It is an M.2 NVMe SSD.
To clarify, no, it is not a RAM disk: $ df -h
I tried indexing on /tmp, as well as in a folder in my home directory. Not sure if it matters, but to give a complete picture: this is running in a VirtualBox VM.
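For anyone wanting to check the RAM-disk question programmatically rather than by reading `df -h`, here is a minimal sketch. It assumes a Linux system where `/proc/mounts` is available. The helper names (`parse_mounts`, `fs_type_for_path`, `is_ram_backed`) are hypothetical and not part of Faiss. It treats `tmpfs` and `ramfs` as RAM-backed, and it does not handle mount points containing escaped spaces.

```python
import os

# Filesystem types treated as RAM-backed in this sketch (an assumption).
RAM_FS_TYPES = {"tmpfs", "ramfs"}

def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields: device, mount point, filesystem type, options, ...
            entries.append((fields[1], fields[2]))
    return entries

def fs_type_for_path(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for point, fs_type in parse_mounts(mounts_text):
        prefix = point.rstrip("/") + "/"
        if path == point or (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(point) >= len(best_point):
                best_point, best_type = point, fs_type
    return best_type

def is_ram_backed(path, mounts_text):
    """True if path lives on a filesystem listed in RAM_FS_TYPES."""
    return fs_type_for_path(path, mounts_text) in RAM_FS_TYPES

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = f.read()
    for candidate in ("/tmp", os.path.expanduser("~")):
        print(candidate, fs_type_for_path(candidate, mounts))
```

If `/tmp` reports `tmpfs`, files written there live in memory (and swap), so an "on-disk" index stored there would still consume RAM.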
Here are some more observations that may help with debugging. The total combined size of all partitions is ~22.5 GB. I had observed that merging consumes about twice that amount of memory, so I gave the VM 53 GB of RAM to try processing in memory. Even with this setup, the system crashed during the merge once memory usage reached ~37 GB. I tried the in-memory merge multiple times and observed the crash at the ~37 GB mark every time. As a separate exercise, I tried merging subsets of the partitions, e.g., only the first half or only the second half. Those merges work, which means the partitioned indexes themselves are fine. Am I hitting some bug?
Merging does consume twice the amount of disk (not RAM). Could you try the following code: issue_1262.ipynb
The script crashed my VM :) $ python
That's the last printout before the crash. Does this imply memmap isn't working correctly on the VM? Some configuration issue? Thanks for the help so far.
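A Faiss-independent way to probe the memmap question is sketched below. This is not the content of issue_1262.ipynb. It is a hypothetical standard-library check (the function name `walk_mapped_file` is made up for illustration) that maps a file read-only and touches one byte per page. The assumption is that on a Linux system behaving normally, pages of a read-only file mapping can be evicted under memory pressure, so walking a file larger than RAM should complete rather than crash the machine.

```python
import mmap
import os

def walk_mapped_file(path, step=mmap.PAGESIZE):
    """Map `path` read-only and touch one byte per `step`; return touches made."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, size, step):
                _ = mm[offset]  # reading a byte faults the page in
                touched += 1
    return touched

if __name__ == "__main__":
    import sys
    # Usage: python walk_mmap.py /path/to/large_file
    print("pages touched:", walk_mapped_file(sys.argv[1]))
```

Running it against a file larger than the machine's RAM (for example, one of the index partitions concatenated several times) and watching whether the process survives gives a rough signal about whether the environment handles file-backed mappings as expected.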
The VM does not emulate the Linux functionality properly.
I spun up a VM on AWS EC2. I then copied the same index partitions and ran the same C++ code for merging the partitions. The VM had only a few GB of RAM. The program successfully merged the indexes! So the issue was that my original VM wasn't emulating Linux functionality correctly. Thanks again for the prompt responses and help figuring this out. I'm closing the bug.
Summary
I'm following the demo_ondisk_ivf.py demo to merge several indexes (partitions) into one. The total combined size of all partitions exceeds my RAM capacity. The documentation says this should be fine, but the RAM consumption keeps growing until the system crashes.
Platform
OS: Ubuntu 18
Running on:
Interface:
Reproduction instructions
The partitions (indexes to merge) were created with 'IVF512,Flat'. The code for merging is as follows:
When ondisk->merge_from() runs, I can see the merge progress with output lines like:
merged 437 lists in 135.845 s
And I can see the RAM usage growing. It keeps growing until the system runs out of memory and crashes. According to the demo, using faiss::IO_FLAG_MMAP should have prevented the partitioned indexes from all being loaded into RAM. What am I missing? I tried a lot of variations (e.g., merging only two partitions at a time), but no luck. Any guidance will be appreciated. Thanks.