Slow SCF in comparison with QE on CALYPSO generated random structures. #3447

Open
9 tasks
Carrotkingdom opened this issue Jan 5, 2024 · 13 comments
Labels
Diago Issues related to diagonalization methods

Comments

@Carrotkingdom

Details

I generated 12 structures, each with a different chemical composition, using CALYPSO (a structure-search package).
Averaged over all 12 cases, ABACUS is nearly 60% slower than QE.
image

Please check the test results and files here (author: Xingliang Peng):
https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.74-7cf445
Machine: 64-core CPU with 256 GB of memory, on the Paratera platform.
Number of Atoms: 20 (or 21 for PtAuAg)

Note that these structures are randomly generated, so they are not entirely physically reasonable and may be under very large pressures. All tasks nevertheless converge successfully. These cases also use a large number of k-points.

Task list for Issue attackers (only for developers)

  • Reproduce the performance issue on a similar system or environment.
  • Identify the specific section of the code causing the performance issue.
  • Investigate the issue and determine the root cause.
  • Research best practices and potential solutions for the identified performance issue.
  • Implement the chosen solution to address the performance issue.
  • Test the implemented solution to ensure it improves performance without introducing new issues.
  • Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • Review and incorporate any relevant feedback from users or developers.
  • Merge the improved solution into the main codebase and notify the issue reporter.
@Carrotkingdom Carrotkingdom added the Performance Issues related to fail running ABACUS label Jan 5, 2024
@WHUweiqingzhou
Collaborator

WHUweiqingzhou commented Feb 4, 2024

@haozhihan,
could you share an update on this issue here? I believe many developers and users are curious about the progress.

@WHUweiqingzhou
Collaborator

Update:

  1. This issue originates from the ABACUS dav method being slower than QE's Davidson implementation. We are working on it.
  2. Based on the test at https://deepmodeling-activity.github.io/abacus-test.github.io/index.html?pname=dpa/dcu, we find that the DCU version is much cheaper than the CPU version, so we suggest that users run the DCU version for now.

@haozhihan
Collaborator

The code for the new Davidson method has been merged into the main branch of the ABACUS repository.

The new method is called dav_subspace, and it is significantly faster than the previous Davidson method.

In many cases it is even faster than QE's Davidson method.
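For anyone who wants to compare the two solvers, the sketch below renders a pair of INPUT files that differ only in the solver. It assumes the `ks_solver` and `pw_diag_ndim` keywords as they are used in this thread and in the ABACUS documentation; the helper name `render_input` is hypothetical, and a real run also needs the usual cutoff, pseudopotential and k-point settings, which are omitted here.

```python
def render_input(ks_solver, pw_diag_ndim):
    """Render a minimal ABACUS INPUT file as text (plane-wave SCF only)."""
    params = {
        "calculation": "scf",
        "basis_type": "pw",
        "ks_solver": ks_solver,          # "dav" or "dav_subspace"
        "pw_diag_ndim": pw_diag_ndim,    # Davidson workspace dimension
    }
    lines = ["INPUT_PARAMETERS"]
    lines += [f"{key:<16}{value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"

# One file per solver, so that the two runs differ only in ks_solver.
for solver in ("dav", "dav_subspace"):
    print(f"# INPUT for {solver}")
    print(render_input(solver, 4))
```

Keeping every other parameter identical is what makes the timing comparison meaningful.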

@Carrotkingdom

@haozhihan
Collaborator

https://xmywuqhxb0.feishu.cn/docx/GufhdWAm2oVGrzxYvsMco5LenTe

@WHUweiqingzhou
Collaborator

@pxlxingliang could you run some tests with the dav_subspace method to check its efficiency?

@pxlxingliang
Collaborator

I have tested one alloy case, and dav_subspace appears to perform worse than the old dav method.

                      scf_steps  scf_time  scf_time/step
dav/LiLaH_8                  11    227.39      20.671818
dav-subspace/LiLaH_8         16    587.91      36.744375

SCF details:
dav

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -1.325515e+03  0.000000e+00   6.759e-01  6.460e+01
 DA2    -1.326614e+03  -1.098380e+00  4.145e-01  2.133e+01
 DA3    -1.327599e+03  -9.847214e-01  3.253e-01  1.348e+01
 DA4    -1.328066e+03  -4.677166e-01  5.503e-03  1.235e+01
 DA5    -1.328087e+03  -2.095905e-02  9.647e-05  2.294e+01
 DA6    -1.328088e+03  -3.460400e-04  7.558e-05  1.963e+01
 DA7    -1.328088e+03  -1.104796e-04  1.044e-06  1.221e+01
 DA8    -1.328088e+03  -3.610688e-06  1.839e-07  1.960e+01
 DA9    -1.328088e+03  -4.773487e-07  6.780e-08  1.424e+01
 DA10   -1.328088e+03  -1.130312e-07  1.025e-08  1.179e+01
 DA11   -1.328088e+03  -1.942902e-08  3.093e-10  1.522e+01

dav-subspace

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DS1    -1.319801e+03  0.000000e+00   6.081e-01  1.296e+02
 DS2    -1.321970e+03  -2.168314e+00  2.709e-01  1.146e+01
 DS3    -1.324139e+03  -2.169000e+00  2.677e-01  1.278e+01
 DS4    -1.325194e+03  -1.054989e+00  2.845e-03  1.149e+01
 DS5    -1.328076e+03  -2.882602e+00  4.846e-03  2.595e+02
 DS6    -1.328064e+03  1.193250e-02   5.176e-03  2.206e+01
 DS7    -1.328068e+03  -3.891617e-03  8.619e-04  1.171e+01
 DS8    -1.328076e+03  -7.617757e-03  3.308e-05  2.322e+01
 DS9    -1.328080e+03  -4.526490e-03  2.527e-05  2.424e+01
 DS10   -1.328082e+03  -1.501581e-03  1.944e-05  1.244e+01
 DS11   -1.328083e+03  -9.081024e-04  2.174e-05  1.167e+01
 DS12   -1.328084e+03  -8.042899e-04  1.180e-06  1.157e+01
 DS13   -1.328084e+03  -6.529929e-04  8.501e-07  1.162e+01
 DS14   -1.328085e+03  -5.592638e-04  5.921e-08  1.159e+01
 DS15   -1.328085e+03  -4.761198e-04  2.183e-08  1.147e+01
 DS16   -1.328086e+03  -4.011135e-04  9.338e-09  1.149e+01
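The summary numbers follow directly from the iteration logs. Here is a minimal sketch of that bookkeeping, assuming log rows in the `ITER ETOT EDIFF DRHO TIME` layout shown above; `summarize_scf` is a hypothetical helper, not part of ABACUS or abacustest.

```python
import re

# Matches SCF iteration rows such as
# " DA1    -1.325515e+03  0.000000e+00   6.759e-01  6.460e+01".
# The two-letter prefix (DA, DS, ...) is assumed to identify the solver.
SCF_ROW = re.compile(r"^\s*([A-Z]{2})(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")

def summarize_scf(log_text):
    """Return (steps, total_time, time_per_step, last_ediff) for one SCF log."""
    times, ediffs = [], []
    for line in log_text.splitlines():
        match = SCF_ROW.match(line)
        if match is None:
            continue  # skip the header and any non-iteration lines
        ediffs.append(float(match.group(4)))
        times.append(float(match.group(6)))
    if not times:
        raise ValueError("no SCF iteration rows found")
    total = sum(times)
    return len(times), total, total / len(times), ediffs[-1]

# First three rows of the dav log above, used as sample input.
sample = """
 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -1.325515e+03  0.000000e+00   6.759e-01  6.460e+01
 DA2    -1.326614e+03  -1.098380e+00  4.145e-01  2.133e+01
 DA3    -1.327599e+03  -9.847214e-01  3.253e-01  1.348e+01
"""
steps, total, per_step, last_ediff = summarize_scf(sample)
print(steps, round(total, 2), round(per_step, 2), last_ediff)
```

Summing the TIME(s) column of the full logs gives the totals in the table: 227.39 s over 11 steps for dav and 587.91 s over 16 steps for dav-subspace. The last EDIFF is also worth tracking: it is -1.942902e-08 eV for dav but -4.011135e-04 eV for dav-subspace.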

@pxlxingliang
Collaborator

pxlxingliang commented Apr 16, 2024

ndav.zip
Here are the outputs of the test.

@haozhihan
Collaborator

These are my test results, with the pw_diag_ndim parameter set to the same value as for the old dav.

dav_subspace:

                                                                                     
                              ABACUS v3.6.1

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: 43cde6d9e (Sun Apr 14 14:53:29 2024 +0800)

 Tue Apr 16 19:18:26 2024
 MAKE THE DIR         : OUT.ABACUS/
 RUNNING WITH DEVICE  : CPU / Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 1 for Li: [He] 2s1
 Warning: the number of valence electrons in pseudopotential > 3 for La: [Xe] 5d1 6s2
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM        : 45 * 45 * 96
 UNIFORM GRID DIM(BIG)   : 45 * 45 * 96
 DONE(0.0414046  SEC) : SETUP UNITCELL
 DONE(0.0863787  SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  
 1       592             16          
 ---------------------------------------------------------
 Use plane wave basis
 ---------------------------------------------------------
 ELEMENT NATOM       XC          
 Li      2           
 La      1           
 H       5           
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.141307   SEC) : INIT PLANEWAVE
 MEMORY FOR PSI (MB)  : 127.477
 DONE(0.263062   SEC) : LOCAL POTENTIAL
 DONE(0.420346   SEC) : NON-LOCAL POTENTIAL
 DONE(0.420392   SEC) : INIT BASIS
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(0.658796   SEC) : INIT SCF
 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 DS1    -1.325148e+03  0.000000e+00   6.714e-01  1.269e+02  
 DS2    -1.326307e+03  -1.159468e+00  3.099e-01  1.834e+01  
 DS3    -1.327350e+03  -1.042559e+00  2.857e-01  2.049e+01  
 DS4    -1.327867e+03  -5.175793e-01  8.317e-03  1.842e+01  
 DS5    -1.328083e+03  -2.152396e-01  2.184e-03  6.982e+01  
 DS6    -1.328083e+03  -6.010596e-04  7.426e-04  2.428e+01  
 DS7    -1.328086e+03  -3.213118e-03  2.674e-05  2.745e+01  
 DS8    -1.328087e+03  -8.680951e-04  8.392e-06  2.283e+01  
 DS9    -1.328087e+03  -2.311163e-04  1.420e-06  1.808e+01  
 DS10   -1.328088e+03  -1.017140e-04  2.372e-07  1.807e+01  
 DS11   -1.328088e+03  -5.239065e-05  2.240e-08  1.823e+01  
 DS12   -1.328088e+03  -3.283545e-05  5.809e-09  1.836e+01  
----------------------------------------------------------------
TOTAL-STRESS (KBAR)                                           
----------------------------------------------------------------
      265.1634021231       -31.0062037473        60.8166564208
      -31.0062037473       229.3627558415       105.3388149814
       60.8166564208       105.3388149814       195.5904956706
----------------------------------------------------------------
 TOTAL-PRESSURE: 230.038885 KBAR

TIME STATISTICS
-------------------------------------------------------------------------------------
     CLASS_NAME                 NAME             TIME(Sec)  CALLS   AVG(Sec) PER(%)
-------------------------------------------------------------------------------------
                     total                       421.24          15  28.08   100.00
Driver               reading                       0.01           1   0.01     0.00
Input                Init                          0.00           1   0.00     0.00
Input_Conv           Convert                       0.00           1   0.00     0.00
Driver               driver_line                 421.22           1 421.22   100.00
UnitCell             check_tau                     0.00           1   0.00     0.00
PW_Basis_Sup         setuptransform                0.01           1   0.01     0.00
PW_Basis_Sup         distributeg                   0.00           1   0.00     0.00
mymath               heapsort                      0.00           3   0.00     0.00
PW_Basis_K           setuptransform                0.02           1   0.02     0.01
PW_Basis_K           distributeg                   0.00           1   0.00     0.00
PW_Basis             setup_struc_factor            0.00           1   0.00     0.00
ppcell_vnl           init                          0.01           1   0.01     0.00
ppcell_vl            init_vloc                     0.11           1   0.11     0.03
ppcell_vnl           init_vnl                      0.16           1   0.16     0.04
WF_atomic            init_at_1                     0.00           1   0.00     0.00
wavefunc             wfcinit                       0.00           1   0.00     0.00
Ions                 opt_ions                    420.76           1 420.76    99.89
ESolver_KS_PW        run                         401.55           1 401.55    95.33
H_Ewald_pw           compute_ewald                 0.05           1   0.05     0.01
Charge               set_rho_core                  0.00           1   0.00     0.00
Charge               atomic_rho                    0.17           1   0.17     0.04
PW_Basis_Sup         recip2real                    0.10          81   0.00     0.02
PW_Basis_Sup         gathers_scatterp              0.08          81   0.00     0.02
Potential            init_pot                      0.01           1   0.01     0.00
Potential            update_from_charge            0.15          13   0.01     0.04
Potential            cal_fixed_v                   0.00           1   0.00     0.00
PotLocal             cal_fixed_v                   0.00           1   0.00     0.00
Potential            cal_v_eff                     0.15          13   0.01     0.04
H_Hartree_pw         v_hartree                     0.01          13   0.00     0.00
PW_Basis_Sup         real2recip                    0.06         108   0.00     0.01
PW_Basis_Sup         gatherp_scatters              0.04         108   0.00     0.01
PotXC                cal_v_eff                     0.13          13   0.01     0.03
XC_Functional        v_xc                          0.13          13   0.01     0.03
Potential            interpolate_vrs               0.00          13   0.00     0.00
Charge_Mixing        init_mixing                   0.00           1   0.00     0.00
ESolver_KS_PW        hamilt2density              401.09          12  33.42    95.22
HSolverPW            solve                       401.09          12  33.42    95.22
Nonlocal             getvnl                        8.73        7104   0.00     2.07
pp_cell_vnl          getvnl                       10.17        8288   0.00     2.41
Structure_Factor     get_sk                        1.97       27232   0.00     0.47
DiagoIterAssist      diagH_subspace                6.36         592   0.01     1.51
Operator             hPsi                        250.68       40963   0.01    59.51
Operator             EkineticPW                    2.25       40963   0.00     0.53
Operator             VeffPW                      218.10       40963   0.01    51.78
PW_Basis_K           recip2real                  125.91      640657   0.00    29.89
PW_Basis_K           gathers_scatterp             82.86      640657   0.00    19.67
PW_Basis_K           real2recip                  112.66      491473   0.00    26.75
PW_Basis_K           gatherp_scatters             82.21      491473   0.00    19.52
Operator             NonlocalPW                   30.19       40963   0.00     7.17
Nonlocal             add_nonlocal_pp               8.97       40963   0.00     2.13
DiagoIterAssist      diagH_LAPACK                  0.16         592   0.00     0.04
Diago_DavSubspace    diag_once                   330.99        7104   0.05    78.57
Diago_DavSubspace    first                        83.42        7104   0.01    19.80
Diago_DavSubspace    cal_elem                     19.90       40371   0.00     4.72
Diago_DavSubspace    diag_zhegvx                  45.74       40371   0.00    10.86
Diago_DavSubspace    cal_grad                    181.18       33267   0.01    43.01
Diago_DavSubspace    check_update                  0.03       33267   0.00     0.01
Diago_DavSubspace    last                          3.94       10338   0.00     0.93
Diago_DavSubspace    refresh                       1.38        3234   0.00     0.33
ElecStatePW          psiToRho                     33.01          12   2.75     7.84
Charge_Mixing        get_drho                      0.01          12   0.00     0.00
Charge_Mixing        inner_product_recip_rho       0.00          12   0.00     0.00
Charge               mix_rho                       0.02          11   0.00     0.00
Charge               Broyden_mixing                0.01          11   0.00     0.00
Charge_Mixing        inner_product_recip_hartree   0.01         104   0.00     0.00
Forces               cal_force_loc                 0.00           1   0.00     0.00
Forces               cal_force_ew                  0.00           1   0.00     0.00
Forces               cal_force_nl                  3.02           1   3.02     0.72
Forces               cal_force_cc                  0.00           1   0.00     0.00
Forces               cal_force_scc                 0.18           1   0.18     0.04
Stress_PW            cal_stress                   16.01           1  16.01     3.80
Stress_Func          stress_kin                    0.21           1   0.21     0.05
Stress_Func          stress_har                    0.00           1   0.00     0.00
Stress_Func          stress_ewa                    0.00           1   0.00     0.00
Stress_Func          stress_gga                    0.01           1   0.01     0.00
Stress_Func          stress_loc                    0.46           1   0.46     0.11
Stress_Func          stress_cc                     0.00           1   0.00     0.00
Stress_Func          stress_nl                    15.32           1  15.32     3.64
ModuleIO             write_istate_info             0.05           1   0.05     0.01
-------------------------------------------------------------------------------------

 START  Time  : Tue Apr 16 19:18:26 2024
 FINISH Time  : Tue Apr 16 19:25:27 2024
 TOTAL  Time  : 421
 SEE INFORMATION IN : OUT.ABACUS/

dav:

                                                                                     
                              ABACUS v3.6.1

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: 43cde6d9e (Sun Apr 14 14:53:29 2024 +0800)

 Tue Apr 16 18:56:05 2024
 MAKE THE DIR         : OUT.ABACUS/
 RUNNING WITH DEVICE  : CPU / Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 1 for Li: [He] 2s1
 Warning: the number of valence electrons in pseudopotential > 3 for La: [Xe] 5d1 6s2
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM        : 45 * 45 * 96
 UNIFORM GRID DIM(BIG)   : 45 * 45 * 96
 DONE(0.0874296  SEC) : SETUP UNITCELL
 DONE(0.128312   SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  
 1       592             16          
 ---------------------------------------------------------
 Use plane wave basis
 ---------------------------------------------------------
 ELEMENT NATOM       XC          
 Li      2           
 La      1           
 H       5           
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.181191   SEC) : INIT PLANEWAVE
 MEMORY FOR PSI (MB)  : 127.477
 DONE(0.303691   SEC) : LOCAL POTENTIAL
 DONE(0.463623   SEC) : NON-LOCAL POTENTIAL
 DONE(0.463668   SEC) : INIT BASIS
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(0.702524   SEC) : INIT SCF
 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 DA1    -1.325515e+03  0.000000e+00   6.759e-01  1.112e+02  
 DA2    -1.326614e+03  -1.098380e+00  4.145e-01  4.754e+01  
 DA3    -1.327599e+03  -9.847214e-01  3.253e-01  2.952e+01  
 DA4    -1.328066e+03  -4.677166e-01  5.503e-03  2.752e+01  
 DA5    -1.328087e+03  -2.095905e-02  9.647e-05  5.057e+01  
 DA6    -1.328088e+03  -3.460438e-04  7.557e-05  4.132e+01  
 DA7    -1.328088e+03  -1.104754e-04  1.044e-06  2.616e+01  
 DA8    -1.328088e+03  -3.610345e-06  1.834e-07  4.222e+01  
 DA9    -1.328088e+03  -4.759293e-07  6.776e-08  3.105e+01  
 DA10   -1.328088e+03  -1.130938e-07  1.026e-08  7.592e+01  
 DA11   -1.328088e+03  -1.948084e-08  3.094e-10  3.199e+01  
----------------------------------------------------------------
TOTAL-STRESS (KBAR)                                           
----------------------------------------------------------------
      265.3484013815       -31.0439570392        60.8170545107
      -31.0439570392       229.5075403446       105.3378429642
       60.8170545107       105.3378429642       195.6297038382
----------------------------------------------------------------
 TOTAL-PRESSURE: 230.161882 KBAR

TIME STATISTICS
-------------------------------------------------------------------------------------
     CLASS_NAME                 NAME             TIME(Sec)  CALLS   AVG(Sec) PER(%)
-------------------------------------------------------------------------------------
                     total                       535.57          15  35.70   100.00
Driver               reading                       0.03           1   0.03     0.01
Input                Init                          0.02           1   0.02     0.00
Input_Conv           Convert                       0.00           1   0.00     0.00
Driver               driver_line                 535.54           1 535.54    99.99
UnitCell             check_tau                     0.00           1   0.00     0.00
PW_Basis_Sup         setuptransform                0.01           1   0.01     0.00
PW_Basis_Sup         distributeg                   0.00           1   0.00     0.00
mymath               heapsort                      0.00           3   0.00     0.00
PW_Basis_K           setuptransform                0.02           1   0.02     0.00
PW_Basis_K           distributeg                   0.00           1   0.00     0.00
PW_Basis             setup_struc_factor            0.00           1   0.00     0.00
ppcell_vnl           init                          0.01           1   0.01     0.00
ppcell_vl            init_vloc                     0.11           1   0.11     0.02
ppcell_vnl           init_vnl                      0.16           1   0.16     0.03
WF_atomic            init_at_1                     0.00           1   0.00     0.00
wavefunc             wfcinit                       0.00           1   0.00     0.00
Ions                 opt_ions                    535.01           1 535.01    99.90
ESolver_KS_PW        run                         515.29           1 515.29    96.21
H_Ewald_pw           compute_ewald                 0.05           1   0.05     0.01
Charge               set_rho_core                  0.00           1   0.00     0.00
Charge               atomic_rho                    0.17           1   0.17     0.03
PW_Basis_Sup         recip2real                    0.09          75   0.00     0.02
PW_Basis_Sup         gathers_scatterp              0.08          75   0.00     0.01
Potential            init_pot                      0.01           1   0.01     0.00
Potential            update_from_charge            0.15          12   0.01     0.03
Potential            cal_fixed_v                   0.00           1   0.00     0.00
PotLocal             cal_fixed_v                   0.00           1   0.00     0.00
Potential            cal_v_eff                     0.15          12   0.01     0.03
H_Hartree_pw         v_hartree                     0.01          12   0.00     0.00
PW_Basis_Sup         real2recip                    0.06         100   0.00     0.01
PW_Basis_Sup         gatherp_scatters              0.04         100   0.00     0.01
PotXC                cal_v_eff                     0.13          12   0.01     0.02
XC_Functional        v_xc                          0.13          12   0.01     0.02
Potential            interpolate_vrs               0.00          12   0.00     0.00
Charge_Mixing        init_mixing                   0.00           1   0.00     0.00
ESolver_KS_PW        hamilt2density              514.84          11  46.80    96.13
HSolverPW            solve                       514.84          11  46.80    96.13
Nonlocal             getvnl                        8.75        6512   0.00     1.63
pp_cell_vnl          getvnl                       10.18        7696   0.00     1.90
Structure_Factor     get_sk                        1.91       26640   0.00     0.36
DiagoDavid           diag_mock                   453.26        6512   0.07    84.63
DiagoDavid           first                       113.31        6512   0.02    21.16
DiagoDavid           SchmitOrth                   45.71      463494   0.00     8.53
Operator             hPsi                        279.52       41223   0.01    52.19
Operator             EkineticPW                    2.38       41223   0.00     0.44
Operator             VeffPW                      240.84       41223   0.01    44.97
PW_Basis_K           recip2real                  130.86      600246   0.00    24.43
PW_Basis_K           gathers_scatterp             88.20      600246   0.00    16.47
PW_Basis_K           real2recip                  128.19      463494   0.00    23.93
PW_Basis_K           gatherp_scatters             97.99      463494   0.00    18.30
Operator             NonlocalPW                   36.12       41223   0.00     6.74
Nonlocal             add_nonlocal_pp               9.63       41223   0.00     1.80
DiagoDavid           cal_elem                     50.52       41223   0.00     9.43
DiagoDavid           diag_zhegvx                  42.21       41223   0.00     7.88
DiagoDavid           cal_grad                    247.13       34711   0.01    46.14
DiagoDavid           check_update                  0.03       34711   0.00     0.01
DiagoDavid           last                          5.98        8252   0.00     1.12
DiagoDavid           refresh                       2.63        1740   0.00     0.49
ElecStatePW          psiToRho                     30.96          11   2.81     5.78
Charge_Mixing        get_drho                      0.01          11   0.00     0.00
Charge_Mixing        inner_product_recip_rho       0.00          11   0.00     0.00
Charge               mix_rho                       0.02          10   0.00     0.00
Charge               Broyden_mixing                0.01          10   0.00     0.00
Charge_Mixing        inner_product_recip_hartree   0.01          88   0.00     0.00
Forces               cal_force_loc                 0.00           1   0.00     0.00
Forces               cal_force_ew                  0.00           1   0.00     0.00
Forces               cal_force_nl                  3.02           1   3.02     0.56
Forces               cal_force_cc                  0.00           1   0.00     0.00
Forces               cal_force_scc                 0.17           1   0.17     0.03
Stress_PW            cal_stress                   16.53           1  16.53     3.09
Stress_Func          stress_kin                    0.26           1   0.26     0.05
Stress_Func          stress_har                    0.00           1   0.00     0.00
Stress_Func          stress_ewa                    0.00           1   0.00     0.00
Stress_Func          stress_gga                    0.01           1   0.01     0.00
Stress_Func          stress_loc                    0.45           1   0.45     0.08
Stress_Func          stress_cc                     0.00           1   0.00     0.00
Stress_Func          stress_nl                    15.81           1  15.81     2.95
ModuleIO             write_istate_info             0.08           1   0.08     0.01
-------------------------------------------------------------------------------------

 START  Time  : Tue Apr 16 18:56:05 2024
 FINISH Time  : Tue Apr 16 19:05:00 2024
 TOTAL  Time  : 535
 SEE INFORMATION IN : OUT.ABACUS/

The total time drops from 535 s (dav) to 421 s (dav_subspace), an efficiency improvement of approximately 20%.
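The roughly 20% figure can be checked against the two TIME STATISTICS tables. This is a small sketch with the timings copied by hand from the logs above; `relative_saving` is a hypothetical helper name.

```python
def relative_saving(t_old, t_new):
    """Fraction of time saved when going from t_old to t_new."""
    if t_old <= 0:
        raise ValueError("t_old must be positive")
    return (t_old - t_new) / t_old

# Seconds from the TIME STATISTICS tables: (dav, dav_subspace).
timings = {
    "total":    (535.57, 421.24),
    "hPsi":     (279.52, 250.68),
    "cal_grad": (247.13, 181.18),
    "cal_elem": (50.52, 19.90),
}

for name, (t_dav, t_sub) in timings.items():
    print(f"{name:9s} {relative_saving(t_dav, t_sub):6.1%}")
```

Two caveats apply when reading these ratios. The dav run took 11 SCF steps and the dav_subspace run took 12, so the totals are not per-step costs. The dav table also contains a `SchmitOrth` row (45.71 s) that has no counterpart in the dav_subspace table.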

@pxlxingliang
Collaborator

pxlxingliang commented Apr 19, 2024

I set pw_diag_ndim = 4 and ran the test on some alloy systems. In some cases the energy, force and stress calculated by the new dav differ considerably from the results of the old dav.
In most cases, the energy difference at the last SCF step is larger than with the old dav method.

https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.112-65e576

image
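A comparison of this kind can be sketched as follows. The alloy results behind the link are not reproduced here, so the sample data are the two TOTAL-STRESS blocks printed in the LiLaH_8 logs earlier in this thread; the helper names `max_abs_diff` and `pressure` are hypothetical.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def pressure(stress):
    """Mean of the diagonal stress components, as in the TOTAL-PRESSURE line."""
    return sum(stress[i][i] for i in range(3)) / 3.0

# TOTAL-STRESS (KBAR) copied from the LiLaH_8 logs in this thread.
stress_dav_subspace = [
    [265.1634021231, -31.0062037473, 60.8166564208],
    [-31.0062037473, 229.3627558415, 105.3388149814],
    [60.8166564208, 105.3388149814, 195.5904956706],
]
stress_dav = [
    [265.3484013815, -31.0439570392, 60.8170545107],
    [-31.0439570392, 229.5075403446, 105.3378429642],
    [60.8170545107, 105.3378429642, 195.6297038382],
]

print("max |d stress| (kbar):", max_abs_diff(stress_dav, stress_dav_subspace))
print("d pressure (kbar):   ", pressure(stress_dav) - pressure(stress_dav_subspace))
```

For this case the two solvers differ by under 0.2 kbar in every stress component. That gives a baseline for judging how large the reported alloy discrepancies are.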

@Cstandardlib
Collaborator

@pxlxingliang Are the abacustest results publicly accessible now? I tried to look at the examples at https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.74-7cf445
but I am denied access to the results.
The message shown is:
image
I am now working on dav (#4874) and want to test its performance, so I need some reference data. Thanks!

@pxlxingliang
Collaborator

pxlxingliang commented Aug 4, 2024 via email

@Cstandardlib
Collaborator

@pxlxingliang Thank you! Now I have access to these examples via the new link.

@haozhihan haozhihan added the Diago Issues related to diagonalization methods label Sep 29, 2024
@mohanchen mohanchen removed the Performance Issues related to fail running ABACUS label Oct 24, 2024
@Cstandardlib
Collaborator

Cstandardlib commented Oct 29, 2024

After working through the math of the Davidson method, we found that the overlap matrix scc and the orthogonalization serve the same purpose, so one of them is redundant.
#4874 removes the redundant scc from the original Davidson method.
#3903 adds a new Davidson iteration method for the pw basis, called subspace Davidson, and #5199 updates dav_subspace to a higher-performance version.

Davidson is the traditional method with orthogonalization, while Dav_Subspace skips the orthogonalization and solves a generalized projected eigenproblem instead.
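The difference between the two routes can be illustrated on a toy problem. This is a sketch only: it uses a real symmetric 3x3 matrix and a two-vector subspace, it is not the ABACUS implementation of either solver, and all function names are made up for the example. With an orthonormal basis the overlap matrix is the identity, so the generalized 2x2 solver below covers both cases.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(H, basis):
    """Return the projected matrices (V^T H V, V^T V) for a list of basis vectors."""
    Hv = [matvec(H, v) for v in basis]
    Hs = [[dot(u, hv) for hv in Hv] for u in basis]
    S = [[dot(u, v) for v in basis] for u in basis]
    return Hs, S

def ritz_values_2d(Hs, S):
    """Eigenvalues of the 2x2 problem Hs c = lam S c (symmetric Hs, positive definite S)."""
    a = S[0][0] * S[1][1] - S[0][1] ** 2
    b = -(Hs[0][0] * S[1][1] + Hs[1][1] * S[0][0] - 2.0 * Hs[0][1] * S[0][1])
    c = Hs[0][0] * Hs[1][1] - Hs[0][1] ** 2
    disc = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return sorted([(-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)])

def gram_schmidt(basis):
    """Orthonormalize the basis vectors in order."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            coeff = dot(q, w)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    return ortho

H = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
V = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # deliberately non-orthogonal

# Route 1: orthogonalize first, so the overlap matrix becomes the identity.
with_orth = ritz_values_2d(*project(H, gram_schmidt(V)))
# Route 2: keep the raw basis and carry the overlap matrix S explicitly.
without_orth = ritz_values_2d(*project(H, V))
print(with_orth, without_orth)
```

Both routes span the same subspace and therefore yield the same Ritz values. The trade-off is between the cost of orthogonalizing and the cost and conditioning of the generalized eigenproblem.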
