faiss crash when doing the search #596

Closed
GitHubProgress3 opened this issue Sep 19, 2018 · 10 comments

GitHubProgress3 commented Sep 19, 2018

I have 8 Tesla P4 cards in my machine. Each GPU holds three faiss::gpu::GpuIndexIVFPQ objects, one per database, and each database is 6250000 (number of features) * 128 (dimensions per feature) * sizeof(float) in size.
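For scale, a back-of-the-envelope sketch of the per-database footprint. The 64-byte code size follows from SubM = 64 listed below, assuming the faiss default of 8 bits per sub-quantizer code; everything else is plain arithmetic:

    #include <cstdio>

    int main() {
        // One database: 6,250,000 vectors of 128 float32 dimensions.
        const double n = 6250000.0;
        const double rawBytes = n * 128 * sizeof(float); // uncompressed vectors
        const double pqBytes  = n * 64;                  // 64 sub-quantizers x 8 bits = 64 bytes/code
        std::printf("raw: %.1f GB, PQ codes: %.1f GB per shard\n",
                    rawBytes / 1e9, pqBytes / 1e9);      // ~3.2 GB raw vs ~0.4 GB of codes
        return 0;
    }

So three shards of PQ codes fit comfortably on a 7.6 GB P4, whereas the raw float vectors would not.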

  1. The training code is:

     m_vec_GpuIndexIVFPQ.at(idxId).get()->train(feat_num[idxId], vec_feats[idxId]);
     m_vec_GpuIndexIVFPQ.at(idxId).get()->reset();
     m_vec_GpuIndexIVFPQ.at(idxId).get()->add(feat_num[idxId], vec_feats[idxId]);

During training, the parameters are:

 feat_num = 6250000;
 FEATURE_DIM = 128;
 ShardCount = 3;   // 3 faiss::gpu::GpuIndexIVFPQ objects
 Cl_Centroid = 2000;
 SubM = 64;
 nProbe = 500;
 TempMemoryFraction = 0.18 in StandardGpuResources
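As a point of reference, a minimal sketch of how one such shard could be set up with these parameters. The header paths, the gpuId argument, setTempMemoryFraction, and setNumProbes reflect the faiss GPU API of roughly this era and are assumptions; newer releases use setTempMemory(bytes) and may differ elsewhere:

    #include <faiss/gpu/StandardGpuResources.h>
    #include <faiss/gpu/GpuIndexIVFPQ.h>
    #include <memory>

    // Builds one IVFPQ shard with the parameters reported above. gpuId,
    // feat_num and vec_feats stand in for the caller's own data.
    std::unique_ptr<faiss::gpu::GpuIndexIVFPQ>
    buildShard(faiss::gpu::StandardGpuResources* res, int gpuId,
               size_t feat_num, const float* vec_feats) {
        res->setTempMemoryFraction(0.18f);         // TempMemoryFraction above; older API

        faiss::gpu::GpuIndexIVFPQConfig config;
        config.device = gpuId;                     // pin this shard to one GPU

        // 128-dim vectors, 2000 coarse centroids, 64 sub-quantizers at 8 bits
        // each, i.e. 2 dims per sub-quantizer and 64 bytes per encoded vector.
        std::unique_ptr<faiss::gpu::GpuIndexIVFPQ> index(
            new faiss::gpu::GpuIndexIVFPQ(res, 128 /* FEATURE_DIM */,
                                          2000 /* Cl_Centroid */, 64 /* SubM */,
                                          8 /* bits per code */,
                                          faiss::METRIC_L2, config));
        index->setNumProbes(500);                  // nProbe
        index->train(feat_num, vec_feats);         // feat_num = 6,250,000 per shard
        index->add(feat_num, vec_feats);
        return index;
    }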

2. The searching code is:

     for (int i = 0; i < m_vec_GpuIndexIVFPQ.size(); i++) {
         m_vec_GpuIndexIVFPQ[i].get()->search((size_t)feat_num, query_feats, k,
                                              res_dists + i * k * feat_num,
                                              res_nns + i * k * feat_num);
     }

During searching, the parameters are:

feat_num = 240; k = 1000;

During searching the code crashes; it crashes while the second faiss::gpu::GpuIndexIVFPQ object is doing its search.
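The assertion fires on the device-to-host copy of the result distances (240 queries x 1000 results = 240,000 floats per shard), so one thing worth ruling out is an undersized host output buffer. A minimal sketch of the allocation the loop above assumes; the helper and its names are hypothetical, only the sizing mirrors the snippet:

    #include <faiss/gpu/GpuIndexIVFPQ.h>
    #include <vector>

    // Hypothetical helper: searches every shard and writes each shard's
    // results into its own k*nq slice of the host output buffers.
    void searchAllShards(std::vector<faiss::gpu::GpuIndexIVFPQ*>& shards,
                         size_t nq, const float* query_feats, size_t k,
                         std::vector<float>& res_dists,
                         std::vector<faiss::Index::idx_t>& res_nns) {
        // Each shard writes k*nq entries, so the buffers must hold
        // shards.size() * k * nq elements in total; an undersized buffer
        // shows up exactly as a failed device-to-host copy like the one below.
        res_dists.resize(shards.size() * k * nq);
        res_nns.resize(shards.size() * k * nq);

        for (size_t i = 0; i < shards.size(); ++i) {
            shards[i]->search(nq, query_feats, k,
                              res_dists.data() + i * k * nq,
                              res_nns.data() + i * k * nq);
        }
    }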

3. The gdb information is:

Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::fromDevice(T*, T*, size_t, cudaStream_t) [with T = float; size_t = long unsigned int; cudaStream_t = CUstream_st*] at utils/CopyUtils.cuh:69; details: CUDA error 77

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ff31c3ff700 (LWP 7991)]
0x00007ffff5849c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5849c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff584d028 in __GI_abort () at abort.c:89
#2  0x00007ffff6482257 in faiss::gpu::fromDevice<float> (src=0x7ff38201e000, dst=0x7ff3018ae410, num=240000, stream=0x5adc81b0) at utils/CopyUtils.cuh:65
#3  0x00007ffff6480ff9 in faiss::gpu::fromDevice<float, 2> (src=..., dst=0x7ff3018ae410, stream=0x5adc81b0) at utils/CopyUtils.cuh:100
#4  0x00007ffff648dd7b in faiss::gpu::GpuIndexIVFPQ::searchImpl_ (this=0x5adc3ab0, n=240, x=0x5b6cbed0, k=1000, distances=0x7ff3018ae410, labels=0x7ff2cbde9810)
    at GpuIndexIVFPQ.cu:425
#5  0x00007ffff650823f in faiss::gpu::GpuIndex::search (this=0x5adc3ab0, n=240, x=0x5b6cbed0, k=1000, distances=0x7ff3018ae410, labels=0x7ff2cbde9810) at GpuIndex.cu:142
#6  0x00007ffff647254d in GpuCwKnnShardImpl::Search (this=0x4f7f76d0, query_feats=0x5b6cbed0, feat_num=240, feat_dim=<optimized out>, k=1000, res_dists=0x7ff3017c3e10,
    res_nns=0x7ff2cbc14c10) at GpuCwKnnShardImpl.cpp:246
#7  0x00007ffff646efc9 in Run (arg=0x4f7f7670) at GpuCwMultiKnnImpl.cpp:149
#8  0x00007ffff5be0184 in start_thread (arg=0x7ff31c3ff700) at pthread_create.c:312
#9  0x00007ffff590d37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
  4. Other information:
    a) The program does not crash when I use only 1 GPU or 2 GPUs.
    b) I have checked the memory on the CPU side when doing the cudaMemcpy (device to host); the memory on the CPU side does not have any error, and I can read each memory address on the CPU side.
    c) Sometimes the program runs successfully; if the first search succeeds, it keeps running through the following loop of searches. Sometimes it fails on the first search, and the debug information is shown above.
    d) If I use only two faiss::gpu::GpuIndexIVFPQ objects, each database holding 9375000 features, the program never crashes.
    e) I checked the GPU memory during search: each GPU has about 1 GB left while searching.

Platform

OS: Ubuntu 14.04
CUDA 8.0
GCC 4.8.4

Running on:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:04:00.0 Off |                    0 |
| N/A   55C    P0    39W /  75W |   4859MiB /  7611MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:05:00.0 Off |                    0 |
| N/A   57C    P0    46W /  75W |   4857MiB /  7611MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P4            Off  | 00000000:08:00.0 Off |                    0 |
| N/A   57C    P0    45W /  75W |   4859MiB /  7611MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P4            Off  | 00000000:09:00.0 Off |                    0 |
| N/A   54C    P0    45W /  75W |   4859MiB /  7611MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P4            Off  | 00000000:85:00.0 Off |                    0 |
| N/A   57C    P0    42W /  75W |   4863MiB /  7611MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   54C    P0    45W /  75W |   4865MiB /  7611MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P4            Off  | 00000000:89:00.0 Off |                    0 |
| N/A   53C    P0    47W /  75W |   4865MiB /  7611MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P4            Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   52C    P0    45W /  75W |   4859MiB /  7611MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
  5. Any help on fixing this?
@wickedfoo (Contributor)

Can you rerun this using the cuda-memcheck program to see what errors it reports?

@GitHubProgress3 (Author)

  1. I tried cuda-memcheck, but the remote machine became unresponsive while it ran. The run took a really long time (more than 5 hours) and produced tons of reports in one function (getDeviceForAddress, if I remember correctly): whenever the address is on the CPU side, cuda-memcheck logs an error for it. The program had not even finished training the database before it got to the search.
  2. However, I tried another approach: I made exactly the same cudaMemcpy (device to host) call (just like what is done in the copyFrom functions) before and after certain functions, and I now know that the error happens in the call to runPQScanMultiPassNoPrecomputed inside PQScanMultiPassNoPrecomputed.cu (see the sketch below).
  3. Is there any method to speed up finding out what goes wrong?
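For reference, this is the kind of synchronous check one can place around the suspect call; a generic sketch, not faiss code, and the helper name is made up:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical helper: force the GPU to finish all queued work and report
    // the first pending error, so an asynchronous fault (like CUDA error 77)
    // is pinned to the call site that produced it instead of a later copy.
    static void checkCudaSync(const char* label) {
        cudaError_t err = cudaDeviceSynchronize();
        if (err == cudaSuccess) {
            err = cudaGetLastError();
        }
        if (err != cudaSuccess) {
            std::fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
            std::abort();
        }
    }

    // Usage around the suspect region, e.g.:
    //   checkCudaSync("before runPQScanMultiPassNoPrecomputed");
    //   ... the call under suspicion ...
    //   checkCudaSync("after runPQScanMultiPassNoPrecomputed");

Exporting CUDA_LAUNCH_BLOCKING=1 gives a similar localization without code changes, at a large speed cost, and is usually faster than a full cuda-memcheck run.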

@GitHubProgress3 (Author)

I have found a workaround; closing this issue.

@GitHubProgress3 (Author) commented Oct 15, 2018 via email

@wickedfoo (Contributor)

k or nprobe > 1024 will not be supported any time soon, if ever, for the GPU.
What dimension are your vectors? What PQ size do you want? The max PQ size may not change either, but this is easier to implement. I'm wondering if it is in fact useful for your case.
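For reference, a defensive sketch of staying inside those limits on the caller's side; 1024 is the limit quoted above (check your faiss version), and the helper name is made up:

    #include <algorithm>

    // The GPU indexes only support selection up to a fixed k / nprobe
    // (1024 per the comment above), so clamp before configuring the search.
    const int kGpuMaxSelection = 1024;

    inline int clampForGpu(int requested) {
        return std::min(requested, kGpuMaxSelection);
    }

    // e.g.
    //   index.setNumProbes(clampForGpu(nProbe));
    //   index.search(nq, queries, clampForGpu(k), dists, labels);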

@GitHubProgress3 (Author) commented Oct 18, 2018 via email

@wickedfoo (Contributor)

PCA reduction to, say, 128 or 256 dimensions might be a better strategy than PQ on such a high dimensional vector. It is likely that the variation across the dimensions is very non-uniform anyways.
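A rough sketch of that idea using the CPU-side PCAMatrix to reduce the vectors before they ever reach the GPU index. The header path, the 512-to-128 figures, and the helper are illustrative (512 is the dimensionality mentioned in the next comment), and the exact include layout depends on your faiss version:

    #include <faiss/VectorTransform.h>
    #include <vector>

    // Learn a PCA projection on training vectors, then store and search only
    // the reduced vectors in the GPU index. Queries must later be projected
    // with the same trained PCAMatrix before searching.
    std::vector<float> reduceWithPCA(size_t n, const float* x,
                                     int dimIn /* e.g. 512 */,
                                     int dimOut /* e.g. 128 */) {
        faiss::PCAMatrix pca(dimIn, dimOut);
        pca.train(n, x);                           // fit the projection
        std::vector<float> reduced(n * (size_t)dimOut);
        pca.apply_noalloc(n, x, reduced.data());   // project the database
        return reduced;
    }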

@wickedfoo (Contributor)

float16 IVF flat would also be more efficient, faster and take the same amount of memory as PQ 256 on a 512 dimensional vector.

PQ is a form of lossy compression of the vectors anyways.
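A sketch of that alternative. The useFloat16IVFStorage field existed in GPU faiss builds of roughly this vintage; newer releases configure float16 storage differently, so treat the field name and header paths as assumptions:

    #include <faiss/gpu/StandardGpuResources.h>
    #include <faiss/gpu/GpuIndexIVFFlat.h>

    // IVF flat with float16-encoded inverted lists: no PQ sub-quantizer
    // training, and roughly half the memory of float32 list storage.
    void buildIvfFlatF16(faiss::gpu::StandardGpuResources* res,
                         size_t n, const float* xb) {
        faiss::gpu::GpuIndexIVFFlatConfig config;
        config.device = 0;                      // GPU to place the index on
        config.useFloat16IVFStorage = true;     // assumed field name; see note above

        faiss::gpu::GpuIndexIVFFlat index(res, 512 /* dim */, 2000 /* nlist */,
                                          faiss::METRIC_L2, config);
        index.train(n, xb);
        index.add(n, xb);
        // index.search(nq, xq, k, distances, labels);
    }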

@vincentLk


Hi, I have the same problem. Have you fixed it? How? I need your help.

@thebirdgr

Hi @GitHubProgress3, how did you solve the issue in question? Thanks.
