using ladjust_bury_coeff in MARBL requires specific properties from PE layout #74

Open
mnlevy1981 opened this issue Sep 1, 2022 · 0 comments

Collaborator

mnlevy1981 commented Sep 1, 2022

Description of the issue:

A user was trying to run with ladjust_bury_coeff set in user_nl_marbl (not a very common configuration); he was also trying to get 100+ SYPD out of the gx3v7 grid (not a very common requirement), so he was running with 288 ocean tasks. gen_pop_decomp produced a layout that created 290 blocks, and he reported the model crashing in ecosys_driver.F90:513 at

    508     allocate(rmean_vals(size(marbl_instances(1)%glo_avg_rmean_interior_tendency)))
    509     lscalar = .false.
    510     call ecosys_running_mean_saved_state_get_var_vals('interior_tendency', lscalar, rmean_vals(:))
    511     do n = 1, size(rmean_vals)
    512        do iblock = 1, size(marbl_instances)
    513           marbl_instances(iblock)%glo_avg_rmean_interior_tendency(n)%rmean = rmean_vals(n)
    514        end do
    515     end do
    516     deallocate(rmean_vals)

It turns out the issue is that marbl_instances has size max_blocks_clinic (2, in his configuration), but we only want these loops to run through nblocks_clinic (1 on most tasks). As a result, ladjust_bury_coeff currently can't be true if any task has nblocks_clinic < max_blocks_clinic. Fixing that moved the error to ecosys_driver.F90:640:

    637     if ((size(glo_avg_fields_interior, dim=4) /= 0) .or. (size(glo_avg_fields_surface, dim=4) /= 0)) then
    638        allocate(glo_avg_area_masked(nx_block, ny_block, nblocks_clinic))
    639        where (land_mask(:,:,:))
    640           glo_avg_area_masked(:,:,:) = TAREA(:,:,:)
    641        else where
    642           glo_avg_area_masked(:,:,:) = c0
    643        end where

(I think the third dimension of both land_mask and TAREA is max_blocks_clinic, while the allocate() statement for glo_avg_area_masked on line 638 uses nblocks_clinic instead, so the shapes in the where construct don't match.)
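For the first crash (line 513), the loop-bound change described earlier can be sketched in a small standalone program. This is only an illustration, not the actual patch: the derived types, `nfields`, and the values are hypothetical stand-ins for the POP/MARBL definitions, and it assumes (the issue does not confirm this) that instances beyond nblocks_clinic are never set up on a task.

```fortran
program rmean_loop_sketch
  implicit none

  ! Hypothetical stand-ins for the real POP/MARBL types (assumption).
  type :: rmean_type
     real :: rmean = 0.0
  end type rmean_type

  type :: marbl_instance_type
     type(rmean_type), allocatable :: glo_avg_rmean_interior_tendency(:)
  end type marbl_instance_type

  integer, parameter :: max_blocks_clinic = 2  ! size of marbl_instances on every task
  integer, parameter :: nblocks_clinic    = 1  ! blocks this task actually owns
  integer, parameter :: nfields           = 3  ! illustrative number of running means

  type(marbl_instance_type) :: marbl_instances(max_blocks_clinic)
  real    :: rmean_vals(nfields)
  integer :: n, iblock

  ! Assumption: only the instances for owned blocks are initialized.
  do iblock = 1, nblocks_clinic
     allocate(marbl_instances(iblock)%glo_avg_rmean_interior_tendency(nfields))
  end do

  rmean_vals = [1.0, 2.0, 3.0]

  ! Loop over nblocks_clinic rather than size(marbl_instances), so the
  ! unused instance (iblock = 2 here) is never touched.
  do n = 1, size(rmean_vals)
     do iblock = 1, nblocks_clinic
        marbl_instances(iblock)%glo_avg_rmean_interior_tendency(n)%rmean = rmean_vals(n)
     end do
  end do

  ! Self-check: the owned instance holds the copied values.
  do n = 1, nfields
     if (marbl_instances(1)%glo_avg_rmean_interior_tendency(n)%rmean /= rmean_vals(n)) then
        error stop 'running mean was not copied'
     end if
  end do

  print *, marbl_instances(1)%glo_avg_rmean_interior_tendency(:)%rmean
end program rmean_loop_sketch
```

With `size(marbl_instances)` as the inner loop bound, the same program would reference the unallocated component of the second instance, which is one plausible way to end up with a crash like the one reported.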

As you can tell, I've started working on a fix for this... I think I changed the above block to explicitly use 1:nblocks_clinic for the third dimension of land_mask on line 639 and TAREA on line 640, but got yet another error elsewhere.
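A minimal sketch of that second change follows, assuming (as suspected above) that land_mask and TAREA are dimensioned with max_blocks_clinic. The block sizes, mask values, and default `real` kind are illustrative only; the real arrays in POP use the model's own kinds and data.

```fortran
program area_mask_sketch
  implicit none

  integer, parameter :: nx_block = 4, ny_block = 3  ! illustrative block size
  integer, parameter :: max_blocks_clinic = 2       ! third dimension of the grid arrays
  integer, parameter :: nblocks_clinic    = 1       ! blocks this task actually owns
  real,    parameter :: c0 = 0.0

  logical :: land_mask(nx_block, ny_block, max_blocks_clinic)
  real    :: TAREA(nx_block, ny_block, max_blocks_clinic)
  real, allocatable :: glo_avg_area_masked(:,:,:)

  ! Made-up data: one row of points masked out, uniform cell area.
  land_mask = .true.
  land_mask(1,:,:) = .false.
  TAREA = 2.0

  allocate(glo_avg_area_masked(nx_block, ny_block, nblocks_clinic))

  ! Restrict the mask and the source to 1:nblocks_clinic so all three
  ! arrays in the where construct have the same shape.
  where (land_mask(:,:,1:nblocks_clinic))
     glo_avg_area_masked(:,:,:) = TAREA(:,:,1:nblocks_clinic)
  else where
     glo_avg_area_masked(:,:,:) = c0
  end where

  ! Self-check: 9 unmasked points times an area of 2.0 each.
  if (abs(sum(glo_avg_area_masked) - 18.0) > 1.0e-6) error stop 'unexpected masked area'

  print *, shape(glo_avg_area_masked)
end program area_mask_sketch
```

Fortran requires the mask and every array assignment inside a where construct to be conformable, so using `(:,:,:)` on arrays with different third extents is invalid whenever nblocks_clinic < max_blocks_clinic.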

The original user who reported the problem was happy to be given a 252-task layout that keeps max_blocks_clinic=1, so fixing this is not urgent. I'm putting all this detail in the issue ticket because I'm going to set it aside for a few weeks while I focus on more pressing issues, but it would probably be good to eventually come back and fix the bug.

I also think it would be useful to update the test suite to try to explicitly test cases where ladjust_bury_coeff = .true. and either some tasks have more blocks than others, or some tasks have no blocks. I expect both of those tests would fail currently.

Version:

  • CESM: 2_3_beta09; I believe the first user was running CESM 2.1.x
  • POP2: cesm_pop_2_1_20220322

Machine/Environment Description:

The error was reported on cheyenne, and that's also where I reproduced the issue in the latest codebase.

Any xml/namelist changes or SourceMods:
