ocean-ice tests crash with OOM when compiled with PGI #18

Closed
nikizadehgfdl opened this issue May 2, 2014 · 3 comments
@nikizadehgfdl
Contributor

With -p ncrc2.pgi -t repro-openmp, the ocean-ice test cases crash around day 17 with the following message:

  [NID 00151] 2014-05-01 20:41:14 Apid 121706: initiated application termination
  [NID 00151] 2014-05-02 00:41:17 Apid 121706: OOM killer terminated this process.
  Application 121706 exit signals: Killed

print_memory_usage shows a steady memory increase of roughly 8 MB per coupling timestep:

       20140502 102929.609: Memuse(MB) at Main loop at coupling timestep=     1=  2.120E+02  2.236E+02  2.231E+00  2.187E+02
       20140502 103440.266: Memuse(MB) at Main loop at coupling timestep=   100=  1.007E+03  1.030E+03  6.008E+00  1.015E+03
       20140502 103944.863: Memuse(MB) at Main loop at coupling timestep=   200=  1.808E+03  1.835E+03  7.737E+00  1.818E+03
       20140502 104046.886: Memuse(MB) at Main loop at coupling timestep=   219=  1.964E+03  1.992E+03  6.961E+00  1.975E+03
      [NID 00028] 2014-05-02 10:40:50 Apid 121760: initiated application termination
      [NID 00028] 2014-05-02 14:40:53 Apid 121760: OOM killer terminated this process.

These are the tests that crashed:
/lustre/f1/Niki.Zadeh/tikal_201403_cjgUpdates_mom6_20140501_libs/MOM6_GOLD_SIS/ncrc2.pgi-repro-openmp/stdout/run/MOM6_GOLD_SIS_1x0m20d_32pe.o5032301
/lustre/f1/Niki.Zadeh/tikal_201403_cjgUpdates_mom6_20140501_libs/MOM6_GOLD_SIS_symmetric/ncrc2.pgi-repro-openmp/stdout/run/MOM6_GOLD_SIS_symmetric_1x0m20d_32pe.o5032302
/lustre/f1/Niki.Zadeh/tikal_201403_cjgUpdates_mom6_20140501_libs/MOM6_GOLD_SIS_icebergs/ncrc2.pgi-repro-openmp/stdout/run/MOM6_GOLD_SIS_icebergs_1x0m20d_32pe.o5032303
/lustre/f1/Niki.Zadeh/tikal_201403_cjgUpdates_mom6_20140501_libs/MOM6_SIS2/ncrc2.pgi-repro-openmp/stdout/run/MOM6_SIS2_1x0m20d_32pe.o5032311
/lustre/f1/Niki.Zadeh/tikal_201403_cjgUpdates_mom6_20140501_libs/MOM6_SIS2_cgrid/ncrc2.pgi-repro-openmp/stdout/run/MOM6_SIS2_cgrid_1x0m20d_32pe.o5032314

@nikizadehgfdl
Contributor Author

This does not happen without -openmp: there Memuse saturates quickly and the 20-day test runs to completion:

       20140502 105857.077: Memuse(MB) at Main loop at coupling timestep=     1=  1.999E+02  2.083E+02  1.716E+00  2.039E+02
       20140502 110418.030: Memuse(MB) at Main loop at coupling timestep=   100=  2.048E+02  2.236E+02  3.907E+00  2.104E+02
       20140502 110931.228: Memuse(MB) at Main loop at coupling timestep=   200=  2.049E+02  2.236E+02  4.048E+00  2.107E+02
       20140502 111135.274: Memuse(MB) at Main loop at coupling timestep=   240=  2.049E+02  2.236E+02  4.031E+00  2.107E+02
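
The contrast between the two runs can be quantified with a short script. The following is a minimal sketch, not part of the model or its tooling: it parses the Memuse lines quoted in this issue and fits a least-squares slope of memory use against coupling timestep. The function names `parse_memuse` and `growth_per_step` are hypothetical, and the meaning of the four numeric columns is an assumption (they look like per-rank minimum, maximum, standard deviation and mean); only the first column is used.

```python
# Memuse lines copied from this issue (OpenMP build, then non-OpenMP build).
OPENMP_LOG = """
20140502 102929.609: Memuse(MB) at Main loop at coupling timestep=     1=  2.120E+02  2.236E+02  2.231E+00  2.187E+02
20140502 103440.266: Memuse(MB) at Main loop at coupling timestep=   100=  1.007E+03  1.030E+03  6.008E+00  1.015E+03
20140502 103944.863: Memuse(MB) at Main loop at coupling timestep=   200=  1.808E+03  1.835E+03  7.737E+00  1.818E+03
20140502 104046.886: Memuse(MB) at Main loop at coupling timestep=   219=  1.964E+03  1.992E+03  6.961E+00  1.975E+03
"""

SERIAL_LOG = """
20140502 105857.077: Memuse(MB) at Main loop at coupling timestep=     1=  1.999E+02  2.083E+02  1.716E+00  2.039E+02
20140502 110418.030: Memuse(MB) at Main loop at coupling timestep=   100=  2.048E+02  2.236E+02  3.907E+00  2.104E+02
20140502 110931.228: Memuse(MB) at Main loop at coupling timestep=   200=  2.049E+02  2.236E+02  4.048E+00  2.107E+02
20140502 111135.274: Memuse(MB) at Main loop at coupling timestep=   240=  2.049E+02  2.236E+02  4.031E+00  2.107E+02
"""


def parse_memuse(log_text):
    """Return (timestep, memuse_MB) pairs, using the first numeric column."""
    samples = []
    for line in log_text.splitlines():
        if "Memuse(MB)" not in line:
            continue
        # Layout assumed from the lines above: "...timestep=<step>=<values>"
        _, step, values = line.rsplit("=", 2)
        samples.append((int(step), float(values.split()[0])))
    return samples


def growth_per_step(samples):
    """Least-squares slope of memory use versus timestep, in MB per step."""
    n = len(samples)
    mean_x = sum(step for step, _ in samples) / n
    mean_y = sum(mem for _, mem in samples) / n
    sxx = sum((step - mean_x) ** 2 for step, _ in samples)
    sxy = sum((step - mean_x) * (mem - mean_y) for step, mem in samples)
    return sxy / sxx


openmp_slope = growth_per_step(parse_memuse(OPENMP_LOG))
serial_slope = growth_per_step(parse_memuse(SERIAL_LOG))
print(f"OpenMP build:     {openmp_slope:.2f} MB per coupling step")
print(f"non-OpenMP build: {serial_slope:.2f} MB per coupling step")
```

On the numbers quoted here, the fit gives roughly 8 MB per coupling step for the OpenMP build and a slope close to zero for the non-OpenMP build, which is consistent with a leak that occurs only when OpenMP is enabled.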

@Zhi-Liang
Contributor

Hi Niki,

I will take a look. Could you pass me your xml file and experiment name?

Thanks,

Zhi


@nikizadehgfdl
Contributor Author

This issue is fixed.
