Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make some computations in DFTK GPU-compatible #712

Merged
merged 98 commits into from
Nov 22, 2022

Conversation

GVigne
Copy link
Contributor

@GVigne GVigne commented Aug 24, 2022

This PR is a followup of this one, which implements GPU compatibility for LOBPCG. If you have any questions/remarks as to how LOBPCG works, please refer to this other PR.

The goal of the following PR is to implement GPU compatibility for some computations made by DFTK. This mainly means modifying the PlaneWaveBasis so it can store GPUArrays, and extending the apply! functions to allow the Hamiltonian and its operators to be applied to GPUArrays.

From an end user perspective, the only thing that changes is when he builds the basis. There is now an optional argument array_type which tells the code which type of array structure should be used. For example :

basis = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1)) # Computations will happen on CPU
basis_gpu = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1), array_type = CuArray) #Computations will happen on GPU using CUDA

The end-user can then call the SCF with either basis or basis_gpu.

I used CUDA since I have an NVIDIA GPU, but this part of the code should also work with other GPUs, since I did not use any CUDA-specific function.

Things that I already know could be greatly improved:

  • The preconditionners. For now they work, but two things could be done. The first one is offload mean_kin to the GPU. That would require some work, as it means we would have to rewrite ldiv! and mul! (which right now does a lot of scalar indexing). The other thing would be to build kin directly on the GPU instead of building it on CPU then offloading it (which is currently being done). In order to do this, we would need Gplusk_vectors_cart to return a GPUArray: this means that G_vectors(basis, kpoint) should return a GPUArray, ie that kpt.G_vectors should be on GPU. And this is going to be quite hard, as it means that we would have to rewrite every function calling G_vectors(basis, kpoint) to be GPU-compatible.
  • The solvers. I didn't manage to make the NLSolve solvers work, so I had to use the ones implementend in DFTK. scf_damping_solver works fine, but not scf_anderson_solver as I didn't manage to write it in a GPU-compatible way.
  • The terms. I have implemented the "easy" terms (Kinetic, AtomicLocal, AtomicNonlocal and Hartree), but we could add the Magnetic, PairWisePotential, Anyonic and XC terms. These terms will be much harder to implement, as they either vastly use scalar indexing or rely on other librairies (libxc).

Edit: Two big changes:

  • The G_vectors for each kpoint and the occupation have been offloaded to the GPU. However, to do this, some functions (like compute_density) have to bring those arrays back on the CPU, as they do scalar indexing. This could be improved if it is performance-critical.
  • The Anderson solver will work in CUDA once this bug has been solved. We will only have to allow scalar indexing on βs, which should really be fine as it isn't a big vector.

GVigne and others added 29 commits July 6, 2022 09:30
…h no SCF solver (solver=scf_damping_solver(1.0)) and just one Kinetic term.
@GVigne GVigne marked this pull request as ready for review September 7, 2022 10:21
Copy link
Member

@mfherbst mfherbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits

examples/gpu.jl Outdated Show resolved Hide resolved
src/common/norm.jl Outdated Show resolved Hide resolved
test/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/densities.jl Outdated Show resolved Hide resolved
src/occupation.jl Outdated Show resolved Hide resolved
src/scf/nbands_algorithm.jl Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/Model.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/PlaneWaveBasis.jl Outdated Show resolved Hide resolved
src/terms/hartree.jl Outdated Show resolved Hide resolved
src/terms/hartree.jl Outdated Show resolved Hide resolved
src/terms/xc.jl Outdated Show resolved Hide resolved
src/terms/xc.jl Outdated Show resolved Hide resolved
src/terms/xc.jl Outdated Show resolved Hide resolved
Copy link
Member

@antoine-levitt antoine-levitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nits and good to merge! I like the to_cpu and to_device!

@mfherbst mfherbst enabled auto-merge (squash) November 22, 2022 13:29
@mfherbst mfherbst enabled auto-merge (squash) November 22, 2022 13:45
@mfherbst mfherbst merged commit 9259c96 into JuliaMolSim:master Nov 22, 2022
@vchuravy
Copy link
Collaborator

Congrats! This is great :)

@GVigne
Copy link
Contributor Author

GVigne commented Nov 23, 2022

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants