Make some computations in DFTK GPU-compatible #712

GVigne · 2022-08-24T07:39:00Z

This PR is a follow-up of this one, which implements GPU compatibility for LOBPCG. If you have any questions or remarks about how LOBPCG works, please refer to that other PR.

The goal of this PR is to make some of the computations done by DFTK GPU-compatible. This mainly means modifying the PlaneWaveBasis so it can store GPUArrays, and extending the apply! functions so that the Hamiltonian and its operators can be applied to GPUArrays.

From an end-user perspective, the only change is in how the basis is built. There is now an optional argument array_type which tells the code which array type should be used to store data and run computations. For example:

```julia
using DFTK, CUDA  # CUDA.jl provides CuArray; `model` is assumed to be already built
basis     = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1))                      # computations happen on the CPU
basis_gpu = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1), array_type=CuArray) # computations happen on the GPU using CUDA
```

The end user can then run the SCF with either basis or basis_gpu.

I used CUDA because I have an NVIDIA GPU, but this part of the code should also work on other GPUs, since I did not use any CUDA-specific functions.
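To make the idea of array-type-agnostic code concrete, here is a minimal, self-contained Julia sketch. The names `ToyBasis`, `build_toy_basis` and `apply_kinetic` are hypothetical and are not DFTK's API; the sketch only shows that storing data in a caller-chosen array type and operating on it with broadcasts (rather than element-by-element loops) requires no CUDA-specific calls.

```julia
# Hypothetical toy type, NOT DFTK's PlaneWaveBasis: it is parametrised on the
# array type AT used to store its data (Vector, CuArray, ROCArray, ...).
struct ToyBasis{T <: Real, AT <: AbstractVector{T}}
    kinetic::AT  # diagonal of a kinetic-like operator, one entry per plane wave
end

# Build the data on the CPU, then convert it once to the requested array type.
# Calling the array type as a constructor (Array(x), CuArray(x)) is a generic idiom.
function build_toy_basis(kinetic_cpu::Vector{T}; array_type=Array) where {T <: Real}
    data = array_type(kinetic_cpu)
    ToyBasis{T, typeof(data)}(data)
end

# Apply the diagonal operator with a broadcast: no scalar indexing, so the same
# method runs on any array type that supports broadcasting.
apply_kinetic(basis::ToyBasis, ψ::AbstractVector) = basis.kinetic .* ψ

basis = build_toy_basis([0.0, 0.5, 2.0])
Hψ = apply_kinetic(basis, [1.0, 1.0, 1.0])
```

With a GPU package loaded, passing for instance `array_type=CuArray` together with a `ψ` of the same array type would exercise the same code path on the device.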

Things that I already know could be greatly improved:

The preconditioners. They currently work, but two things could be done. The first is to offload mean_kin to the GPU. That would require some work, as we would have to rewrite ldiv! and mul! (which currently do a lot of scalar indexing). The second is to build kin directly on the GPU instead of building it on the CPU and then offloading it (which is what is currently done). To do this, Gplusk_vectors_cart would need to return a GPUArray: this means that G_vectors(basis, kpoint) should return a GPUArray, i.e. that kpt.G_vectors should be on the GPU. This is going to be quite hard, as every function calling G_vectors(basis, kpoint) would have to be made GPU-compatible.
The solvers. I didn't manage to make the NLsolve solvers work, so I had to use the ones implemented in DFTK. scf_damping_solver works fine, but scf_anderson_solver does not, as I didn't manage to write it in a GPU-compatible way.
The terms. I have implemented the "easy" terms (Kinetic, AtomicLocal, AtomicNonlocal and Hartree), but we could add the Magnetic, PairWisePotential, Anyonic and XC terms. These terms will be much harder to port, as they either rely heavily on scalar indexing or depend on other libraries (libxc).
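As an illustration of the scalar-indexing issue mentioned for the preconditioners, the sketch below contrasts an element-by-element `ldiv!` with a broadcast one. `ToyPreconditioner` and `ldiv_scalar!` are hypothetical, and the diagonal form (a kinetic value per plane wave shifted by one mean_kin value per band) is an assumption made for the example, not DFTK's exact preconditioner. On GPU arrays each scalar access such as `Y[i, j]` is a separate host-device operation, which is why GPU array packages such as CUDA.jl warn about or refuse it, whereas a fused broadcast is compiled into a single kernel.

```julia
using LinearAlgebra

# Hypothetical diagonal preconditioner: entry (i, j) is kin[i] + mean_kin[j],
# i.e. one kinetic value per plane wave, shifted by one value per band.
struct ToyPreconditioner{AT <: AbstractVector}
    kin::AT
    mean_kin::AT
end

# Scalar-indexing version: correct, but every Y[i, j], R[i, j], kin[i] access
# is a separate element access, which is very slow (or disallowed) on a GPU.
function ldiv_scalar!(Y::AbstractMatrix, P::ToyPreconditioner, R::AbstractMatrix)
    for j in axes(R, 2), i in axes(R, 1)
        Y[i, j] = R[i, j] / (P.kin[i] + P.mean_kin[j])
    end
    Y
end

# Broadcast version: the column vector `kin` and the row `transpose(mean_kin)`
# broadcast to the full (n_planewaves × n_bands) denominator, with no indexing.
function LinearAlgebra.ldiv!(Y::AbstractMatrix, P::ToyPreconditioner, R::AbstractMatrix)
    Y .= R ./ (P.kin .+ transpose(P.mean_kin))
end

P = ToyPreconditioner([1.0, 2.0, 3.0], [0.0, 1.0])
R = [1.0 2.0; 2.0 3.0; 3.0 4.0]
Y_loop  = ldiv_scalar!(similar(R), P, R)
Y_bcast = ldiv!(similar(R), P, R)
```

Both versions give the same result on the CPU; only the broadcast one is suitable as-is for GPU arrays.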

Edit: Two big changes:

The G_vectors for each kpoint and the occupation have been offloaded to the GPU. However, some functions (like compute_density) now have to bring those arrays back to the CPU, as they rely on scalar indexing. This could be improved if it turns out to be performance-critical.
The Anderson solver will work with CUDA once this bug has been fixed. We will only have to allow scalar indexing on βs, which should be fine since it is a small vector.
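The compute_density workaround can be sketched as follows: a small array that is read element by element (here the occupations) is copied to the host, while the large arrays stay on the device and are updated with broadcasts. `to_cpu` is defined here as a hypothetical stand-in for the helper of the same name discussed in the review, and `toy_density` is a simplified, hypothetical density computation that assumes the orbitals are already given in real space. For a tiny array such as βs, the alternative is to allow scalar indexing locally, e.g. with CUDA.jl's `@allowscalar` macro.

```julia
# Hypothetical host-transfer helper (a stand-in, not necessarily DFTK's definition).
# Array(x) copies a GPU array to the host; for an Array it returns a copy.
to_cpu(x::AbstractArray) = Array(x)

# Simplified density: ρ(r) = Σₙ occupation[n] * |ψₙ(r)|², with the orbitals ψₙ
# stored as the columns of ψ (already in real space, for simplicity).
function toy_density(ψ::AbstractMatrix, occupation::AbstractVector)
    occ = to_cpu(occupation)   # small vector: reading occ[n] is now a host access
    ρ = fill!(similar(ψ, real(eltype(ψ)), size(ψ, 1)), 0)  # allocated like ψ (CPU or GPU)
    for n in axes(ψ, 2)
        ρ .+= occ[n] .* abs2.(view(ψ, :, n))  # broadcast: stays on ψ's device
    end
    ρ
end

ρ = toy_density(ComplexF64[1 1im; 0 1], [2.0, 1.0])
```

Only the occupation vector crosses the host-device boundary here; the orbitals and the density keep the array type of `ψ`.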


mfherbst

Small nits

examples/gpu.jl

src/common/norm.jl

test/PlaneWaveBasis.jl

src/densities.jl

src/occupation.jl

src/scf/nbands_algorithm.jl

src/PlaneWaveBasis.jl

src/Model.jl

src/PlaneWaveBasis.jl

src/terms/hartree.jl

src/terms/xc.jl

antoine-levitt

LGTM, minor nits and good to merge! I like the to_cpu and to_device!

src/scf/mixing.jl

src/PlaneWaveBasis.jl

vchuravy · 2022-11-22T15:13:46Z

Congrats! This is great :)

GVigne · 2022-11-23T08:05:48Z

Thanks a lot!

GVigne and others added 29 commits July 6, 2022 09:30

LOBPCG with GPU support (CUDA). Does not yet support preconditioning

e17fb59

Merge branch 'master' into gpu_hpc

e80f5b6

MWE for self_consistent_field with GPU support (CUDA). Only works with no SCF solver (solver=scf_damping_solver(1.0)) and just one Kinetic term.

ed15b32

Fix package version conflicts while merging

19bfa69

Stop using BlockArrays and use a custom BlockVector for GPU compatibility in LOBPCG

f4748ac

GPU support for AtomicLocal term

94f1d2a

First GPU implementation of the non local term + LOBPCG enhancement

60d8041

Merge branch 'master' into gpu_hpc

fb6484a

add timed examples

cf1dc3c

Change some code organisation after PR's feedback

11b85f0

Code organisation and performance optimisation after PR's feedback

abb99f4

Code refactoring following PR's feedback

a89171a

PWB is now parametric on the array type: this also fixes type issues

44bcb61

Update workarounds: remove iszero and isone, add eigen

646b44c

Rename block_mul into * + build e on GPU

76c697d

Modify the change of basis functions to be GPU compatible

bd684d7

Merge branch 'master' into gpu_hpc

f02c954

Keep this branch synced with LOBPCG_GPU

15d1324

Add the Hartree term

62d9f79

Remove CUDA dependency from ortho_qr

19100cf

Bugfix when plotting bandstructure + typo fixes

1184ec1

Make all mixings except Chi0 mixing GPU compatible

833928b

Prettier way to overload eigen for CuArrays

e12f35b

Update comments + remove unnecessary code

a0c4066

Update the GPU example

9cdff93

Put the Gvectors for each kpoint on the GPU

20b7b10

Bugfix after launching the tests

a2d811b

Put the occupation on GPU

8ee55a4

Merge branch 'master' into gpu_hpc

7909720

GVigne marked this pull request as ready for review September 7, 2022 10:21

GVigne and others added 4 commits November 17, 2022 14:58

Remove convert_like and add to_device

349b8f1

Update comments and docstring

778712d

Enforce the use of to_cpu instead of Array when doing GPU-CPU transfers

c39ee19

Small polishing

4501bd9

mfherbst reviewed Nov 18, 2022

GVigne added 3 commits November 21, 2022 09:51

Formatting, updating comments and small nits

60506b9

Remove architecture argument from G_vectors + infer type in kinetic_energy

147c81d

Merge branch 'gpu_hpc' of github.com:GVigne/DFTK.jl into gpu_hpc

da1ac09

mfherbst reviewed Nov 21, 2022

src/Model.jl (outdated)

GVigne and others added 3 commits November 21, 2022 14:31

Inline _closure_matmatmul

8a40af5

Rename closures and small changes to ortho_qr and build_kpoints

5ee546b

Hard-code CPU architecture

b7b611a

antoine-levitt reviewed Nov 22, 2022

src/terms/xc.jl (outdated)

antoine-levitt reviewed Nov 22, 2022

mfherbst reviewed Nov 22, 2022

src/scf/mixing.jl

antoine-levitt reviewed Nov 22, 2022

src/PlaneWaveBasis.jl (outdated)

Merge branch 'master' into gpu_hpc

fbe675d

mfherbst enabled auto-merge (squash) November 22, 2022 13:29

mfherbst disabled auto-merge November 22, 2022 13:29

mfherbst added 2 commits November 22, 2022 14:42

Reformat basis

d7be3a6

Docs concurrency

482d3b2

mfherbst enabled auto-merge (squash) November 22, 2022 13:45

mfherbst and others added 4 commits November 22, 2022 14:47

Value type instead of real type.

7070487

Minor nits: broadcast norm2, more type inference

cf3fbf8

Merge branch 'gpu_hpc' of github.com:GVigne/DFTK.jl into gpu_hpc

dec0653

Update xc.jl

ff0d9b0

mfherbst merged commit 9259c96 into JuliaMolSim:master Nov 22, 2022
