forked from mom-ocean/MOM6
Gpu test #3
Open
nikizadehgfdl wants to merge 5 commits into dev/gfdl from gpu_test
- Tackling this one routine at a time may not be such a good idea (as in the OMP case); nevertheless, here is an attempt to do just that with ACC to see how timings and answers change.
- Here are the results of this update on the gpubox lscgpu50-d (Tesla V100). Although the timings look promising, the answers might change by a lot. dsum1 and dsum2 are just some physical quantities (answers) that indicate how much the answers might change. It doesn't look to be just round-off! The timings show how much faster this particular calculation might get on GPUs. The speed-up kicks in at the second iteration because some of the data is already moved to the device (via present_or_copy directives).

| DEV | dsum1 | dsum2 | timings (sec) |
|-----|-------|-------|---------------|
| cpu | 16862846.89627802 | 362957721989943.6 | 0.1062018871307373 |
| gpu | 16862849.65544315 | 362957721989943.6 | 0.2714490890502930 |
| cpu | 88276966.79257554 | 362957721989943.6 | 0.1057448387145996 |
| gpu | 88276963.53479740 | 362957721989943.6 | 6.8866014480590820E-002 |
| cpu | -1223928.224251170 | 362957721989943.6 | 0.1059348583221436 |
| gpu | -1223928.228475637 | 362957721989943.6 | 6.8870782852172852E-002 |

- Note that the speed-up depends on how powerful the GPU device is. In this case it's an idle Tesla V100. In the case of my busy workstation NVIDIA GPU, there is actually a slowdown.
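To put the numbers reported above in perspective, the following is a minimal sketch (not part of the PR) that computes, for each cpu/gpu pair, the relative difference in dsum1 and the cpu/gpu timing ratio. It assumes the rows are consecutive cpu/gpu pairs, one pair per iteration; the names `rows` and `compare` are hypothetical.

```python
import sys

# (device, dsum1, dsum2, seconds), copied from the results reported above.
# Assumption: rows come in consecutive cpu/gpu pairs, one pair per iteration.
rows = [
    ("cpu", 16862846.89627802, 362957721989943.6, 0.1062018871307373),
    ("gpu", 16862849.65544315, 362957721989943.6, 0.2714490890502930),
    ("cpu", 88276966.79257554, 362957721989943.6, 0.1057448387145996),
    ("gpu", 88276963.53479740, 362957721989943.6, 6.8866014480590820e-2),
    ("cpu", -1223928.224251170, 362957721989943.6, 0.1059348583221436),
    ("gpu", -1223928.228475637, 362957721989943.6, 6.8870782852172852e-2),
]


def compare(rows):
    """Return (relative dsum1 difference, cpu/gpu time ratio) per iteration."""
    results = []
    for cpu, gpu in zip(rows[0::2], rows[1::2]):
        rel_diff = abs(gpu[1] - cpu[1]) / abs(cpu[1])
        speedup = cpu[3] / gpu[3]  # > 1 means the gpu run was faster
        results.append((rel_diff, speedup))
    return results


results = compare(rows)
for i, (rel_diff, speedup) in enumerate(results, start=1):
    print(f"iteration {i}: dsum1 relative diff {rel_diff:.2e}, "
          f"cpu/gpu time ratio {speedup:.2f}")

# Double-precision machine epsilon, printed only as a scale for comparison.
print(f"float64 epsilon: {sys.float_info.epsilon:.2e}")
```

A ratio below 1 in the first iteration and above 1 afterwards would match the remark that the speed-up only kicks in once data is already on the device. Comparing the relative differences against machine epsilon gives a rough sense of scale only; it does not by itself identify the cause of the differences.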
- Maybe a 1% speed-up for 1 MPI rank
- The PGI profiler shows MOM_hor_visc.F90:horizontal_viscosity as one of the most sampled routines in OM4, hence the choice to use OpenACC there.
- At this update there is almost no gain (nor any loss) in timings for this module from using a single GPU in addition to a single CPU (mpirun -np 1). This shows that unless we can delegate more loops to the GPU and/or get rid of the "!$ACC update self" directive, there is no point in running with GPUs!
nikizadehgfdl pushed a commit that referenced this pull request Apr 30, 2021
MOM_domain_infra: Document FMS passthroughs
nikizadehgfdl pushed a commit that referenced this pull request Jan 7, 2022
* reads in porous topography parameters from CHANNEL_LIST_FILE
* new module to compute curve fit for porous topography
* porous constraints used to modify continuity_PPM, CoriolisAdv, and Rayleigh bottom channel drag
nikizadehgfdl pushed a commit that referenced this pull request Jan 7, 2022
(+) porous topography implementation
nikizadehgfdl pushed a commit that referenced this pull request Feb 10, 2022
Use the por_face_area[UV] in the effective thickness calculations in zonal_face_thickness and merid_face_thickness, so that they are more consistent with their use elsewhere in the code for the relative weights in calculating the barotropic accelerations. Because these por_face_area arrays are still 1 in all test cases, the answers are unchanged in any test cases from before a few weeks ago, but there could be answer changes in cases that are using the very recently added capability (in PR #3) to set fractional face areas. This change was discussed with Sam Ditkovsky, and it was agreed that there is no reason to keep the ability to recover the previous answers in any cases that use the recently added partial face width option.

This commit also expanded the comments describing the h_u and h_v arguments to the btcalc(), zonal_face_thickness(), and merid_face_thickness() routines, the diag_h[uv] elements of the accel_diag_ptrs type, and the h_u and h_v elements of the BT_cont_type.

All answers and output are bitwise identical in the MOM6-examples test suite and TC tests, but answer changes are possible in cases using a very recently added code option.
No description provided.