new feature: Explore if OpenDAL can support KvikIO (aka Nvidia GPUDirect Storage) #5090

Open
1 task
Xuanwo opened this issue Sep 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Xuanwo
Member

Xuanwo commented Sep 3, 2024

Feature Description

GPUDirect Storage

Nvidia GPUDirect Storage is a technology introduced by Nvidia that enables GPUs to read files directly.

With GPUDirect Storage:

  • Higher Bandwidth: GDS offers 2x-8x higher bandwidth by enabling direct data transfers between storage and GPU, bypassing CPU involvement.
  • Lower Latency: Explicit and direct transfers reduce latency by 3.8x and ensure stable latency even as GPU concurrency increases.
  • Reduced CPU Load: DMA engines near storage reduce CPU load, maintaining higher bandwidth with minimal CPU utilization, and GPU utilization remains near zero during data transfers.
  • Superior IO Bandwidth: GPUs achieve higher IO bandwidth (215 GB/s) compared to CPUs (50 GB/s).
  • Scalable Storage Access: GDS allows fast access to large data sets, whether stored locally or remotely, nearly saturating GPU bandwidth by combining data transfers from various sources (CPU memory, local storage, remote storage).
  • Efficient Data Caching: Large distributed data sets can be cached in local storage, and working tables can be cached in CPU memory for collaborative use with the CPU, enhancing overall system performance.

cuFile

Nvidia exposes GDS through the cuFile API, which is included in the CUDA Toolkit.

KvikIO

KvikIO is a high-performance file IO library.

KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python bindings to cuFile, which enables GPUDirect Storage (GDS). KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly. The C++ library is header-only making it easy to include in existing projects.
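
To make the description above concrete, here is a minimal sketch of a write/read round trip through `kvikio.CuFile`. It uses a NumPy host array so that it does not require a GPU, and it falls back to ordinary file IO when the `kvikio` package is not installed. The helper name `write_then_read` and the fallback branch are assumptions made for illustration; they are not part of KvikIO or OpenDAL.

```python
import os
import tempfile

try:
    import kvikio  # optional dependency; provides the cuFile bindings
    import numpy as np
except (ImportError, OSError):
    kvikio = None  # fall back to plain file IO below


def write_then_read(path: str, payload: bytes) -> bytes:
    """Write payload to path, then read it back into a fresh buffer.

    Hypothetical helper for illustration only.
    """
    if kvikio is not None:
        # Host arrays work here; passing a CuPy array instead would
        # target device memory.
        src = np.frombuffer(payload, dtype=np.uint8).copy()
        dst = np.empty_like(src)
        with kvikio.CuFile(path, "w") as f:
            f.write(src)
        with kvikio.CuFile(path, "r") as f:
            f.read(dst)
        return dst.tobytes()

    # Stand-in path when kvikio is unavailable: ordinary buffered file IO.
    dst = bytearray(len(payload))
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        f.readinto(dst)
    return bytes(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = bytes(range(256))
        assert write_then_read(os.path.join(tmp, "demo.bin"), data) == data
```

Swapping the NumPy arrays for CuPy arrays is what would move the transfer target into GPU memory; whether GDS is actually used then depends on the driver, filesystem, and hardware.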

Problem and Solution

Perhaps OpenDAL could implement a service based on KvikIO, allowing our users to interact with GDS using the familiar OpenDAL API.

KvikIO currently offers only C++ and Python APIs, so anyone interested would first need to create a Rust binding for it. Exploring the performance of GDS also requires the CUDA Toolkit and a compatible Nvidia GPU.

Additional Context

No response

Are you willing to contribute to the development of this feature?

  • Yes, I am willing to contribute to the development of this feature.
@Xuanwo Xuanwo added the enhancement New feature or request label Sep 3, 2024
@yuchanns
Member

yuchanns commented Sep 3, 2024

Interesting! What a shame I don't have a Nvidia GPU.

@Xuanwo
Member Author

Xuanwo commented Sep 3, 2024

> Interesting! What a shame I don't have a Nvidia GPU.

AMD supports a similar technology called DirectGMA, but I'm not sure if it can read files.

@Xuanwo Xuanwo changed the title new feature: Expore if OpenDAL can support KvikIO (aka Nvidia GPUDirect Storage) new feature: Explore if OpenDAL can support KvikIO (aka Nvidia GPUDirect Storage) Sep 3, 2024
@morristai
Member

This is definitely an ambitious idea, and I would like to explore it further. However, after some research, I found that even if we create a Rust binding for KvikIO, it might not be sufficient.

The reason is that to utilize GDS, we need to use the CUDA API to manage GPU memory. Unfortunately, there is not yet a mature Rust library for CUDA. In KvikIO’s examples, to create a GPU memory buffer for KvikIO to read/write, they use APIs from cuda_runtime_api.h in C++ and CuPy in Python. These APIs are fully supported by Nvidia, and creating a Rust binding for them would be challenging.

If we use KvikIO without CUDA, as demonstrated in the example in the KvikIO repo (cpp/examples/basic_no_cuda.cpp), it won't use GPU memory at all. See the discussion in the issue for more details.
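
The difference between the two cases comes down to where the destination buffer lives. As a rough Python sketch of that point (the thread itself is about Rust, where the equivalent allocation is the open question), the hypothetical helper `make_read_buffer` below returns a CuPy device array when CuPy and a CUDA device are available, and a plain host `bytearray` otherwise. The helper name and the `"device"`/`"host"` labels are assumptions for illustration.

```python
def make_read_buffer(nbytes: int):
    """Return (buffer, kind) where kind is "device" or "host".

    Hypothetical helper: the device branch needs CuPy and a working CUDA
    device; otherwise a host bytearray is returned.
    """
    try:
        import cupy  # optional dependency; manages GPU memory from Python

        if cupy.cuda.is_available():
            # Device memory: a read into this buffer can target the GPU.
            return cupy.empty(nbytes, dtype=cupy.uint8), "device"
    except ImportError:
        pass
    # Host memory: no GPU allocation is involved.
    return bytearray(nbytes), "host"


if __name__ == "__main__":
    buf, kind = make_read_buffer(1024)
    print(kind, len(buf))
```

Either buffer could then be handed to `kvikio.CuFile.read`; only the device-backed one gives GDS a GPU destination to transfer into.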

We could still try some third-party libraries like cudarc; maybe it will be sufficient for our use case. Perhaps others might have better ideas?
