[BUG] Slowdown in constructing a cudf dataframe from a numba device array #16434

isVoid · 2024-07-30T14:39:52Z

Describe the bug
Constructing a cudf DataFrame from a large numba device array is currently slow.

Steps/Code to reproduce bug

import cudf
import cupy
import numba.cuda
cupy_array = cupy.ones((10_000, 100))
cudf.DataFrame(cupy_array)  # fast
cudf.DataFrame(numba.cuda.to_device(cupy_array))  # slow

Expected behavior
At one point, constructing from a numba device array was fast. It should be almost as fast as constructing from a cupy array, since both support the CUDA Array Interface (CAI).

Environment overview (please complete the following information)

Environment location: [Bare-metal]
Method of cuDF install: [conda]

wence- · 2024-07-30T14:45:45Z

Concretely:

import cudf
import cupy
import numba.cuda

N = 10_000

ones = cupy.ones((N, 100))
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);

CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 10.4 ms
CPU times: user 837 ms, sys: 0 ns, total: 837 ms
Wall time: 837 ms

If we increase N to 100_000:

import cudf
import cupy
import numba.cuda

N = 100_000

ones = cupy.ones((N, 100))
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);
CPU times: user 15.7 ms, sys: 0 ns, total: 15.7 ms
Wall time: 15.7 ms
CPU times: user 7.2 s, sys: 240 ms, total: 7.44 s
Wall time: 7.44 s

It looks like slicing a numba device array, when the result is neither C- nor F-contiguous, takes time that is linear in the length of the non-sliced axis.

If we are F-contiguous then things are fine:

import cudf
import cupy
import numba.cuda

N = 10_000

ones = cupy.ones((100, N)).T
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);
CPU times: user 3.27 ms, sys: 0 ns, total: 3.27 ms
Wall time: 3.28 ms
CPU times: user 11.6 ms, sys: 0 ns, total: 11.6 ms
Wall time: 11.6 ms
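The layout difference between the slow and fast cases can be sketched on the host without a GPU. This is a minimal illustration, assuming the frame is built from column slices of the form `a[:, j]` and that device arrays follow the usual dense stride conventions. The helpers `strides_for` and `column_is_contiguous` are hypothetical names used only here; they are not part of cudf, cupy, or numba.

```python
# Hypothetical helpers for illustration only; not part of cudf, cupy, or numba.

def strides_for(shape, itemsize, order):
    """Return the byte strides of a dense 2-D array stored in C or F order."""
    rows, cols = shape
    if order == "C":
        # Row-major: moving one row skips a whole row of elements.
        return (cols * itemsize, itemsize)
    if order == "F":
        # Column-major: moving one column skips a whole column of elements.
        return (itemsize, rows * itemsize)
    raise ValueError(f"unknown order: {order!r}")


def column_is_contiguous(shape, itemsize, order):
    """A column slice a[:, j] is 1-D and inherits the parent's row stride.

    It is contiguous only if consecutive elements are itemsize bytes apart.
    """
    row_stride, _ = strides_for(shape, itemsize, order)
    return shape[0] <= 1 or row_stride == itemsize


N = 10_000
ITEMSIZE = 8  # float64, the default dtype of cupy.ones

# cupy.ones((N, 100)) is C-ordered: each column slice is strided.
c_col = column_is_contiguous((N, 100), ITEMSIZE, "C")

# cupy.ones((100, N)).T has shape (N, 100) in F order: each column is dense.
f_col = column_is_contiguous((N, 100), ITEMSIZE, "F")

print("C-order column contiguous:", c_col)
print("F-order column contiguous:", f_col)
```

Under these assumptions, the C-ordered `(N, 100)` input yields 100 strided column slices, which is the case that is slow above, while the transposed input yields dense columns.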

isVoid added the bug Something isn't working label Jul 30, 2024

github-project-automation bot added this to cuDF/Dask/Numba/UCX Jul 30, 2024

github-project-automation bot moved this to In Progress in cuDF/Dask/Numba/UCX Jul 30, 2024

mroeschke mentioned this issue Jul 30, 2024

Ensure objects with __interface__ are converted to cupy/numpy arrays #16436

Merged

GPUtester closed this as completed in 445a75f Aug 1, 2024

github-project-automation bot moved this from In Progress to Done in cuDF/Dask/Numba/UCX Aug 1, 2024
