[BUG] Semi-supervised UMAP reduction errors when input exceeds a certain size #1604

Closed
DavidEverlaw opened this issue Jan 27, 2020 · 3 comments

Describe the bug
Supervised UMAP reduction (fit_transform with target labels) fails with a CUDA illegal memory access once the input grows past a certain number of samples.
Using RAPIDS v0.11

Steps/Code to reproduce bug
This errors:

from sklearn.datasets import make_classification
from cuml import UMAP

X, Y = make_classification(n_samples=200000, n_features=10, n_redundant=0,
                           n_informative=10, n_clusters_per_class=1, n_classes=1000)
reducer = UMAP(n_neighbors=20, init="spectral")
X2 = reducer.fit_transform(X, Y)

Error output:
/var/lib/opt/miniconda/envs/clustering-gpu/bin/ipython:1: UserWarning: Parameter should_downcast is deprecated, use convert_dtype in fit, fit_transform and transform  methods instead. 
  #!/var/lib/opt/miniconda/envs/clustering-gpu/bin/python
An exception occurred freeing COO memory
An exception occurred freeing COO memory
An exception occurred freeing COO memory
An exception occurred freeing COO memory
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-b82724dcd13b> in <module>
----> 1 X2 = reducer.fit_transform(X, Y)

cuml/manifold/umap.pyx in cuml.manifold.umap.UMAP.fit_transform()

cuml/manifold/umap.pyx in cuml.manifold.umap.UMAP.fit()

RuntimeError: Exception occured! file=/conda/conda-bld/libcuml_1566588242169/work/cpp/src_prims/utils.h line=152: FAIL: call='cudaMemcpyAsync(dst, src, len * sizeof(Type), cudaMemcpyDefault, stream)'. Reason:an illegal memory access was encountered

Obtained 61 stack frames
#0 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8MLCommon9Exception16collectCallStackEv+0x3e) [0x7f3afd90c56e]
#1 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8MLCommon9ExceptionC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x80) [0x7f3afd90d080]
#2 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8MLCommon4copyIiEEvPT_PKS1_mP11CUstream_st+0xf9) [0x7f3afd920909]
#3 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8MLCommon6Sparse17csr_add_calc_indsIfLi32EEEmPiS2_PT_iS2_S2_S4_iiS2_P11CUstream_st+0x12b) [0x7f3afdb5550b]
#4 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8UMAPAlgo10Supervised35general_simplicial_set_intersectionIfLi32EEEvPiPN8MLCommon6Sparse3COOIT_EES2_S8_S8_fP11CUstream_st+0x8b) [0x7f3afdb5b11b]
#5 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8UMAPAlgo10Supervised28perform_general_intersectionILi32EfEEvRKN2ML10cumlHandleEPT0_PN8MLCommon6Sparse3COOIS6_EESC_PNS2_10UMAPParamsEP11CUstream_st+0x4d0) [0x7f3afdb5c150]
#6 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/common/../../../../libcuml++.so(_ZN8UMAPAlgo4_fitIfLi32EEEvRKN2ML10cumlHandleEPT_S6_iiPNS1_10UMAPParamsES6_+0x27b) [0x7f3afdb5cd9b]
#7 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/manifold/umap.cpython-37m-x86_64-linux-gnu.so(+0x11e8c) [0x7f3abdc37e8c]
#8 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyObject_FastCallDict+0x9f) [0x55804e687a3f]
#9 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyObject_Call_Prepend+0x63) [0x55804e6a5e53]
#10 in /var/lib/opt/miniconda/envs/clustering-gpu/lib/python3.7/site-packages/cuml/manifold/umap.cpython-37m-x86_64-linux-gnu.so(+0x12eaf) [0x7f3abdc38eaf]
#11 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyObject_FastCallKeywords+0x49b) [0x55804e6de8fb]
#12 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x52f8) [0x55804e7426e8]
#13 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#14 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyEval_EvalCodeEx+0x44) [0x55804e687424]
#15 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyEval_EvalCode+0x1c) [0x55804e68744c]
#16 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(+0x1daf8d) [0x55804e74cf8d]
#17 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyMethodDef_RawFastCallKeywords+0xe9) [0x55804e6d65d9]
#18 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyCFunction_FastCallKeywords+0x21) [0x55804e6d6861]
#19 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x47a4) [0x55804e741b94]
#20 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyGen_Send+0x2a2) [0x55804e6df592]
#21 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x1a79) [0x55804e73ee69]
#22 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyGen_Send+0x2a2) [0x55804e6df592]
#23 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x1a79) [0x55804e73ee69]
#24 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyGen_Send+0x2a2) [0x55804e6df592]
#25 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyMethodDef_RawFastCallKeywords+0x8d) [0x55804e6d657d]
#26 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyMethodDescr_FastCallKeywords+0x4f) [0x55804e6de3cf]
#27 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x4c8c) [0x55804e74207c]
#28 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0xfb) [0x55804e6d5ccb]
#29 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x416) [0x55804e73d806]
#30 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0xfb) [0x55804e6d5ccb]
#31 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x6a3) [0x55804e73da93]
#32 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#33 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0x387) [0x55804e6d5f57]
#34 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x14dc) [0x55804e73e8cc]
#35 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#36 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0x325) [0x55804e6d5ef5]
#37 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x6a3) [0x55804e73da93]
#38 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#39 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0x325) [0x55804e6d5ef5]
#40 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x6a3) [0x55804e73da93]
#41 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0xfb) [0x55804e6d5ccb]
#42 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x6a3) [0x55804e73da93]
#43 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#44 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallDict+0x400) [0x55804e687860]
#45 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyObject_Call_Prepend+0x63) [0x55804e6a5e53]
#46 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyObject_Call+0x6e) [0x55804e698dbe]
#47 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x1e42) [0x55804e73f232]
#48 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#49 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyFunction_FastCallKeywords+0x387) [0x55804e6d5f57]
#50 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalFrameDefault+0x416) [0x55804e73d806]
#51 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_PyEval_EvalCodeWithName+0x2f9) [0x55804e686539]
#52 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyEval_EvalCodeEx+0x44) [0x55804e687424]
#53 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyEval_EvalCode+0x1c) [0x55804e68744c]
#54 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(+0x22ab74) [0x55804e79cb74]
#55 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyRun_FileExFlags+0xa1) [0x55804e7a6eb1]
#56 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(PyRun_SimpleFileExFlags+0x1c3) [0x55804e7a70a3]
#57 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(+0x236195) [0x55804e7a8195]
#58 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(_Py_UnixMain+0x3c) [0x55804e7a82bc]
#59 in /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f3b20fcbb97]
#60 in /var/lib/opt/miniconda/envs/clustering-gpu/bin/python(+0x1db062) [0x55804e74d062]

This works:

from sklearn.datasets import make_classification
from cuml import UMAP

X, Y = make_classification(n_samples=100000, n_features=10, n_redundant=0,
                           n_informative=10, n_clusters_per_class=1, n_classes=1000)
reducer = UMAP(n_neighbors=20, init="spectral")
X2 = reducer.fit_transform(X, Y)

Expected behavior
UMAP should perform the semi-supervised reduction on the dataset without error.

Environment details (please complete the following information):

NVIDIA-SMI Driver: 410.104, CUDA: 10.0 Tesla V100
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    42W / 300W |   1439MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1940      C   python                                      1429MiB |
+-----------------------------------------------------------------------------+

OS: Ubuntu 18.04.3 LTS (AWS EC2)
Python: 3.7.4
RAPIDS installed via: conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.11 python=3.7

Additional context
With n_samples = 125000, the error occurs intermittently.
With n_samples > 126000, the error occurs every time.
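
A minimal sketch for narrowing down the threshold (illustrative, not from the original report; it reuses the repro settings above and walks up through sample counts until the first failure):

from sklearn.datasets import make_classification
from cuml import UMAP

# Walk up through sample counts; the first size that raises marks the threshold.
for n_samples in (100_000, 120_000, 125_000, 126_000, 150_000, 200_000):
    X, Y = make_classification(n_samples=n_samples, n_features=10, n_redundant=0,
                               n_informative=10, n_clusters_per_class=1,
                               n_classes=1000)
    reducer = UMAP(n_neighbors=20, init="spectral")
    print(n_samples, flush=True)
    reducer.fit_transform(X, Y)  # supervised fit with target labels
    print(n_samples, "ok", flush=True)

Because n_samples = 125000 fails only intermittently, each size may need several runs, and running each size in a fresh process is safer since an illegal memory access can leave the CUDA context in a bad state.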

DavidEverlaw added the '? - Needs Triage' and 'bug' labels on Jan 27, 2020
JohnZed removed the '? - Needs Triage' label on Jan 29, 2020
cjnolet (Member) commented on Feb 16, 2020

@DavidEverlaw, any chance you would be able to try our current nightlies? I will also try with 0.11 to verify, but I am running your code example with n_samples=400000 on the current nightly and it's running successfully for me.
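
A minimal sketch of this check (illustrative, not from the thread; it reuses the repro settings with n_samples=400000 and prints cuml.__version__ to confirm which build is loaded):

import cuml
from sklearn.datasets import make_classification
from cuml import UMAP

print(cuml.__version__)  # e.g. the 0.11 release above vs. a current nightly

X, Y = make_classification(n_samples=400_000, n_features=10, n_redundant=0,
                           n_informative=10, n_clusters_per_class=1,
                           n_classes=1000)
reducer = UMAP(n_neighbors=20, init="spectral")
X2 = reducer.fit_transform(X, Y)
print(X2.shape)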

cjnolet added the '4 - Waiting on Author' label on Feb 16, 2020
DavidEverlaw (Author) commented
@cjnolet, on 0.12 it looks to be working with n_samples = 2_000_000.

cjnolet (Member) commented on Feb 17, 2020

@DavidEverlaw, great! I'm taking your response to mean that the error could not be reproduced in 0.12, even up to 2M samples.

I'm going to close this for now, but please reopen it if you are still encountering errors.
