[BUG] PredictSparse16FilTest error #3206

Closed
tfeher opened this issue Nov 30, 2020 · 2 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@tfeher
Contributor

tfeher commented Nov 30, 2020

Describe the bug

One of the FIL tests produces errors in CI. Here is an example:

https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuml/job/prb/job/cuml-gpu-test/CUDA=10.2,OS=centos7,PYTHON=3.8/111/console

The failure is not deterministic. I have seen it three times on CI. I was able to reproduce it locally once, but several other attempts did not trigger it.

Steps/Code to reproduce bug

git clone https://github.com/tfeher/cuml.git 
nvidia-docker run --privileged -e HOST_USER_ID=0 -v$PWD:/mydata -w/mydata  --rm -it  rapidsai/rapidsai-dev-nightly:0.17-cuda10.2-devel-centos7-py3.8
mkdir cuml/cpp/build
cd cuml/cpp/build
cmake .. -DGPU_ARCHS=70 -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX 
make -j
test/ml --gtest_filter='Fil*'

This produces the following failure (non-deterministically):

09:38:27 /opt/conda/envs/rapids/conda-bld/libcuml_1606459702665/work/cpp/test/sg/fil_test.cu:363: Failure
09:38:27 Value of: raft::devArrMatch(want_preds_d, preds_d, ps.num_rows, raft::CompareApprox<float>(tolerance), stream)
09:38:27   Actual: false (actual=2 != expected=0 @19182)
09:38:27 Expected: true
09:38:27 [  FAILED  ] FilTests/PredictSparse16FilTest.Predict/15, where GetParam() = num_rows = 20000, num_cols = 50, nan_prob = 0.05, depth = 8, num_trees = 60, leaf_prob = 0.05, output = RAW, threshold = 0, blocks_per_sm = 0, algo = 1, seed = 42, tolerance = 0.002, op = <, global_bias = 0.5, leaf_algo = 2, num_classes = 6 (239 ms)

Tests FilTests/PredictSparse16FilTest.Predict/15 and FilTests/PredictSparse16FilTest.Predict/17 were reported as failing.
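Since the failure is intermittent, one way to chase it locally is to re-run the test binary until a run fails and keep the log of that run. The helper below is only a sketch: `run_until_failure` and the `LOG_FILE` variable are hypothetical names introduced here, not part of cuML or GoogleTest.

```shell
# Re-run a command until it fails or the attempt budget is used up.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as one run fails (its output is kept in $LOG_FILE),
# and 0 if every run passed.
# Note: run_until_failure and LOG_FILE are illustrative names.
run_until_failure() {
  ruf_max=$1
  shift
  ruf_log=${LOG_FILE:-flaky_run.log}
  ruf_attempt=1
  while [ "$ruf_attempt" -le "$ruf_max" ]; do
    # Capture stdout and stderr of this run; overwrite the previous log.
    if ! "$@" > "$ruf_log" 2>&1; then
      echo "failed on attempt ${ruf_attempt}; see ${ruf_log}"
      return 1
    fi
    ruf_attempt=$((ruf_attempt + 1))
  done
  echo "no failure in ${ruf_max} attempts"
  return 0
}

# Example, run from cuml/cpp/build after the build steps above:
# run_until_failure 50 test/ml --gtest_filter='FilTests/PredictSparse16FilTest.*'
```

GoogleTest's own `--gtest_repeat=N` flag is an alternative that repeats the tests inside a single process. Whether in-process repetition triggers this particular failure as readily as fresh processes is not something the report establishes.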

Additional information

FIL test temporarily disabled here: 5b64e25

@tfeher added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels Nov 30, 2020
@canonizer
Contributor

A similar problem is described in #3205. I'm closing this bug, and will track further work on this problem in #3205.

@tfeher
Contributor Author

tfeher commented Dec 1, 2020

Closing this as a duplicate of #3205.

@tfeher tfeher closed this as completed Dec 1, 2020
rapids-bot bot pushed a commit that referenced this issue Dec 1, 2020
Added a missing `__syncthreads()`.

- also re-enabled Sparse16 FIL tests
- this should fix #3205 and #3206

Authors:
  - Andy Adinets
  - John Zedlewski
  - Dante Gama Dessavre

Approvers:
  - Thejaswi Rao
  - null

URL: #3215