[NASA:Update] Update GT4Py and DaCe + submoduling of DaCe (#21)
* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers being advected
2. Made GEOS-specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS passes timings down to caller

* Make kernel analysis run a copy stencil to compute local bandwidth
Parametrize tool with backend, output format

* Move constant to an env var
Add saturation adjustment threshold to const

* lint

* More linting

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theoretical timings
Lint

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

This reverts commit 4fc5b4d.

* Revert "Remove previous per stencil override of default_build_folder"

This reverts commit 2245027.

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorously.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Replace all loggers with pace_log
Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Distributed compilation on orchestrated backend for NxN layouts (#14)

* Adapt orchestration distributed compile for NxN layout

* Remove debug code

* Add a more descriptive string-based postfix for cache naming
Identify the code path for all cases
Consistent reload post-compile
Create a central space for all caches generation logic
No more original layout check required

* Add a test on caches relocatability

* Verbose todo

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Missing enum

* Lint imports

* Fix unit tests

* Deactivate relocatability test due to Python crash
Logged as issue 16

* Typo

* Raise for 1,X and X,1 layouts, which require a new descriptor

* Added ak, bk for 137 levels in eta.py

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Log info GEOS bridge (#18)

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Update geos/develop to grab NOAA PR9 results (#21)

* Verbose choice of block/grid size

* added build script for c5

* updated repo to NOAA

* GEOS integration (#9)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers being advected
2. Made GEOS-specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS passes timings down to caller

* Make kernel analysis run a copy stencil to compute local bandwidth
Parametrize tool with backend, output format

* Move constant to an env var
Add saturation adjustment threshold to const

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theoretical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Revert "Remove previous per stencil override of default_build_folder"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorously.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Fix or explain inlined import

* Verbose runtime error when bad dt_atmos

* Verbose warm up

* re-initialize heat_source and diss_est each call, add do_skeb check to accumulation

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

---------

Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

* [NOAA:Update] Bring back #15 & doubly periodic domain (#25)

* Feature/dp driver (#13)

* initial commit

* adding test config

* adding the rest of driver and util code

* updating history.md

* move u_max to dycore config

* uncomment assert

* added comment explaining the copy of grid type to dycore config

* Turn main unit test  & lint on PR, logger clean up [NASA:Update]  (#15)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers being advected
2. Made GEOS-specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS passes timings down to caller

* Make kernel analysis run a copy stencil to compute local bandwidth
Parametrize tool with backend, output format

* Move constant to an env var
Add saturation adjustment threshold to const

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0

* Fix theoretical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorously.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Fix unit tests (remove dxa, dya rely on halo ex)

* Update HISTORY.md

* Adapt log_level in driver.run

* Verbose the PACE_CONSTANTS

* Doc log level hierarchical nature

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>

* Lint

---------

Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>

* Update gt4py, dace, cleanup (#19)

* Update gt4py to top of master on June 21

* Update DaCe to 0.14.2
Work around aliasing issue in FiniteVolumeTransport

* Fix to gt4py storage

* Downgrade to dace 0.14.1

* DaCe to 0.14.4
Orchestrating NonHydrostaticPressureGradient
Adapting code to newer gt4py

* Regenerate constraints.txt

* Default constants to GFS
Fix snapshot for GPU runs
Lint on ETA
Fix log level

* Remove `daint_venv` submodule

* Adding dace as a submodule
Removing buildenv as a submodule

* Update gt4py to latest master

* Skip ConstantPropagation during `Simplify`

* Remove buildenv

* Update requirements_dev.txt

* Add editable util to requirements_dev.txt

* lint

* scipy is now needed for tests

* Pin `DaCe` to pace-fixes-0 merge

* Remove logging setup in test_translate

* Make cupy import robust to device not being available

* Fix to GEOS bridge MPS detection

* Up gt4py to August 14th EOD:
  - Hip/ROCm
  - New allocators

* DaCe module: swap SSH for HTTPS (#26)

* GEOS GridTools stencils build override (#27)

* Stencil build override for GEOS

* Deactivate warnings if PACE_LOGLEVEL is > WARNING

* Better log level

* Bad merge (again)

* Update fv3core/pace/fv3core/initialization/geos_wrapper.py

Co-authored-by: Oliver Elbert <[email protected]>

* FVTP2D: somewhat better workaround

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
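Several entries above describe one small pattern: read the constant set from an env var, compare the string value of CONST_VERSION directly rather than an enum, abort cleanly on an unknown value, log the selection, and default to GFS. A minimal sketch of that pattern follows, assuming the variable is PACE_CONSTANTS and that only GFS and GEOS are valid; `ConstantVersion` and `select_constants` are hypothetical names, not pace's actual code.

```python
import logging
import os
from enum import Enum

# Hypothetical logger name used only for this sketch.
_log = logging.getLogger("pace_constants_sketch")


class ConstantVersion(Enum):
    """Hypothetical enum of constant sets; only two members for illustration."""

    GFS = "GFS"
    GEOS = "GEOS"


def select_constants(
    env_var: str = "PACE_CONSTANTS", default: str = "GFS"
) -> ConstantVersion:
    """Pick the constant set named by an environment variable.

    The raw string is matched against the known values directly; anything
    else aborts with an explicit error instead of silently falling back.
    """
    raw = os.environ.get(env_var, default)
    known = {member.value: member for member in ConstantVersion}
    if raw not in known:
        raise RuntimeError(
            f"Unknown {env_var}={raw!r}; expected one of {sorted(known)}"
        )
    _log.info("Constants selected: %s", raw)
    return known[raw]
```

Failing with the list of accepted values makes a typo in a job script surface at start-up instead of as a silently wrong set of constants.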
6 people authored Sep 15, 2023
1 parent b1ef6b5 commit 0cdba14
Showing 20 changed files with 189 additions and 331 deletions.
6 changes: 3 additions & 3 deletions .gitmodules
@@ -1,6 +1,6 @@
[submodule "external/gt4py"]
path = external/gt4py
url = https://github.com/gridtools/gt4py.git
-[submodule "buildenv"]
-path = buildenv
-url = https://github.com/ai2cm/buildenv.git
+[submodule "external/dace"]
+path = external/dace
+url = https://github.com/spcl/dace.git
1 change: 0 additions & 1 deletion buildenv
Submodule buildenv deleted from ab7966
107 changes: 39 additions & 68 deletions constraints.txt
@@ -2,7 +2,7 @@
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile --output-file=constraints.txt driver/setup.py dsl/setup.py external/gt4py/setup.cfg fv3core/setup.py physics/setup.py requirements_dev.txt requirements_docs.txt requirements_lint.txt stencils/setup.py util/requirements.txt util/setup.py
# pip-compile --output-file=constraints.txt driver/setup.py dsl/setup.py fv3core/setup.py physics/setup.py requirements_dev.txt requirements_docs.txt requirements_lint.txt stencils/setup.py util/requirements.txt util/setup.py
#
aenum==3.1.11
# via dace
@@ -21,36 +21,31 @@ asttokens==2.0.5
# devtools
# stack-data
astunparse==1.6.3
# via dace
# via
# dace
# gt4py
async-timeout==3.0.1
# via aiohttp
attrs==22.1.0
# via
# aiohttp
# gt4py
# gt4py (external/gt4py/setup.cfg)
# jsonschema
# pytest
babel==2.9.1
# via sphinx
backcall==0.2.0
# via ipython
backports.entry-points-selectable==1.1.1
backports-entry-points-selectable==1.1.1
# via virtualenv
black==22.3.0
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
boltons==21.0.0
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
bump2version==1.0.1
# via -r util/requirements.txt
cached-property==1.5.2
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
cachetools==4.2.2
# via google-auth
certifi==2021.5.30
@@ -74,20 +69,19 @@ click==8.0.1
# black
# flask
# gt4py
# gt4py (external/gt4py/setup.cfg)
cloudpickle==2.0.0
# via dask
cmake==3.26.4
# via gt4py
commonmark==0.9.1
# via recommonmark
coverage==5.5
# via
# -r util/requirements.txt
# pytest-cov
cytoolz==0.11.2
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
dace==0.14.0
cytoolz==0.12.1
# via gt4py
dace==0.14.4
# via
# -r requirements_dev.txt
# pace-dsl
@@ -109,13 +103,9 @@ decorator==5.0.9
# gcsfs
# ipython
deepdiff==6.2.1
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
devtools==0.8.0
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
dill==0.3.5.1
# via dace
distlib==0.3.2
@@ -155,9 +145,7 @@ flake8==3.8.4
flask==2.1.2
# via dace
frozendict==2.3.4
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
fsspec==2021.7.0
# via
# dask
@@ -196,10 +184,8 @@ googleapis-common-protos==1.53.0
# via google-api-core
gprof2dot==2021.2.21
# via pytest-profiling
gridtools-cpp==2.2.2
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
gridtools-cpp==2.3.0
# via gt4py
h5netcdf==0.11.0
# via -r util/requirements.txt
h5py==2.10.0
@@ -217,7 +203,9 @@ imagesize==1.2.0
importlib-metadata==4.11.3
# via flask
importlib-resources==5.10.0
# via jsonschema
# via
# gt4py
# jsonschema
iniconfig==1.1.1
# via pytest
ipykernel==6.16.2
@@ -232,7 +220,6 @@ jinja2==3.0.1
# via
# flask
# gt4py
# gt4py (external/gt4py/setup.cfg)
# sphinx
jsonschema==4.16.0
# via nbformat
@@ -244,12 +231,12 @@ jupyter-core==4.11.2
# via
# jupyter-client
# nbformat
lark==1.1.5
# via gt4py
locket==0.2.1
# via partd
mako==1.1.6
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
markupsafe==2.0.1
# via
# jinja2
@@ -260,6 +247,11 @@ matplotlib-inline==0.1.6
# ipython
mccabe==0.6.1
# via flake8
mpi4py==3.1.4
# via
# -r requirements_dev.txt
# pace-driver
# pace-driver (driver/setup.py)
mpmath==1.2.1
# via sympy
multidict==5.1.0
@@ -272,7 +264,6 @@ mypy-extensions==0.4.3
# via
# black
# mypy
# typing-inspect
nbclient==0.6.8
# via nbmake
nbformat==5.7.0
@@ -292,10 +283,9 @@ netcdf4==1.5.7
# pace-driver
# pace-driver (driver/setup.py)
networkx==2.6.3
# via
# dace
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via dace
ninja==1.11.1
# via gt4py
nodeenv==1.6.0
# via pre-commit
numcodecs==0.7.2
@@ -309,7 +299,6 @@ numpy==1.21.2
# cftime
# dace
# gt4py
# gt4py (external/gt4py/setup.cfg)
# h5py
# netcdf4
# numcodecs
@@ -322,7 +311,6 @@ numpy==1.21.2
# pace-util
# pace-util (util/setup.py)
# pandas
# scipy
# xarray
# zarr
oauthlib==3.1.1
@@ -333,7 +321,6 @@ packaging==21.0
# via
# dask
# gt4py
# gt4py (external/gt4py/setup.cfg)
# ipykernel
# pytest
# sphinx
@@ -387,9 +374,7 @@ pyasn1==0.4.8
pyasn1-modules==0.2.8
# via google-auth
pybind11==2.8.1
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
pycodestyle==2.6.0
# via flake8
pycparser==2.20
@@ -465,10 +450,6 @@ requests-oauthlib==1.3.0
# via google-auth-oauthlib
rsa==4.7.2
# via google-auth
scipy==1.7.1
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
six==1.16.0
# via
# asttokens
@@ -513,10 +494,8 @@ stack-data==0.5.1
# via ipython
sympy==1.9
# via dace
tabulate==0.8.9
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
tabulate==0.9.0
# via gt4py
toml==0.10.2
# via
# pre-commit
@@ -530,8 +509,6 @@ toolz==0.11.1
# -r util/requirements.txt
# cytoolz
# dask
# gt4py
# gt4py (external/gt4py/setup.cfg)
# partd
tornado==6.2
# via
@@ -555,15 +532,9 @@ typing-extensions==4.3.0
# aiohttp
# black
# gt4py
# gt4py (external/gt4py/setup.cfg)
# mypy
# pace-util
# pace-util (util/setup.py)
# typing-inspect
typing-inspect==0.7.1
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
urllib3==1.26.6
# via requests
virtualenv==20.7.2
@@ -590,9 +561,7 @@ xarray==0.19.0
# pace-physics
# pace-physics (physics/setup.py)
xxhash==2.0.2
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
# via gt4py
yarl==1.6.3
# via aiohttp
zarr==2.9.2
@@ -601,7 +570,9 @@ zarr==2.9.2
# pace-driver
# pace-driver (driver/setup.py)
zipp==3.8.0
# via importlib-metadata
# via
# importlib-metadata
# importlib-resources

# The following packages are considered to be unsafe in a requirements file:
# setuptools
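To check which versions a constraints file actually pins (for example `dace==0.14.4` or `gridtools-cpp==2.3.0` after this commit), the `name==version` lines that pip-compile writes can be read programmatically. A minimal sketch, assuming that line format; `read_pins` is a hypothetical helper. Names are normalized in the PEP 503 style, so `backports.entry-points-selectable` and `backports-entry-points-selectable` (both spellings appear in this diff) map to the same key.

```python
import re
from typing import Dict

# Matches pin lines such as "dace==0.14.4"; "# via ..." comment lines never match.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;#]+)")


def read_pins(text: str) -> Dict[str, str]:
    """Return {normalized package name: pinned version} from pip-compile output."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        match = _PIN.match(line.strip())
        if match:
            # PEP 503 normalization: runs of "-", "_" and "." become "-", lowercased.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins
```

For instance, `read_pins(open("constraints.txt").read())["dace"]` would return the pinned DaCe version as a string.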
2 changes: 1 addition & 1 deletion dsl/pace/dsl/dace/build.py
@@ -135,7 +135,7 @@ def set_distributed_caches(config: "DaceConfig"):
verb = "reading"

gt_config.cache_settings["dir_name"] = get_cache_directory(config.code_path)
-pace.util.pace_log.critical(
+pace.util.pace_log.info(
f"[{orchestration_mode}] Rank {config.my_rank} "
f"{verb} cache {gt_config.cache_settings['dir_name']}"
)
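This hunk demotes the per-rank cache message from `critical` to `info`, so whether it prints now depends on the configured threshold, which the commit message says `PACE_LOGLEVEL` controls from outside. A minimal sketch of such a switch with the standard `logging` module, assuming a `WARNING` default; `configure_logger` is a hypothetical helper, and silencing Python warnings above `WARNING` mirrors the "Deactivate warnings if PACE_LOGLEVEL is > WARNING" entry rather than reproducing pace's code.

```python
import logging
import os
import warnings


def configure_logger(
    name: str = "pace", env_var: str = "PACE_LOGLEVEL"
) -> logging.Logger:
    """Set a logger's threshold from an environment variable (hypothetical helper).

    The "WARNING" default is an assumption. Unknown level names raise instead
    of being ignored, and a threshold stricter than WARNING also silences
    Python warnings.
    """
    level_name = os.environ.get(env_var, "WARNING").upper()
    # getLevelName maps a registered name such as "INFO" to its number (20);
    # for an unregistered name it returns a string like "Level LOUD".
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):
        raise ValueError(f"Unknown {env_var}={level_name!r}")
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if level > logging.WARNING:
        warnings.simplefilter("ignore")
    return logger
```

Under the assumed `WARNING` default, the "reading cache" line is filtered on every rank; running with `PACE_LOGLEVEL=INFO` brings it back.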
22 changes: 15 additions & 7 deletions dsl/pace/dsl/dace/orchestration.py
@@ -10,6 +10,7 @@
from dace.frontend.python.parser import DaceProgram
from dace.transformation.auto.auto_optimize import make_transients_persistent
from dace.transformation.helpers import get_parent_map
+from dace.transformation.passes.simplify import SimplifyPass

from pace.dsl.dace.build import get_sdfg_path, write_build_info
from pace.dsl.dace.dace_config import (
@@ -65,17 +66,13 @@ def _download_results_from_dace(
gt4py_results = [
gt4py.storage.from_array(
r,
-default_origin=(0, 0, 0),
backend=config.get_backend(),
-managed_memory=True,
)
for r in dace_result
]
else:
gt4py_results = [
-gt4py.storage.from_array(
-r, default_origin=(0, 0, 0), backend=config.get_backend()
-)
+gt4py.storage.from_array(r, backend=config.get_backend())
for r in dace_result
]
return gt4py_results
@@ -111,6 +108,17 @@ def _to_gpu(sdfg: dace.SDFG):
sd.openmp_sections = False


+def _simplify(sdfg: dace.SDFG, validate=True, verbose=False):
+    """Override of sdfg.simplify to skip failing transformation
+    per https://github.com/spcl/dace/issues/1328
+    """
+    return SimplifyPass(
+        validate=validate,
+        verbose=verbose,
+        skip=["ConstantPropagation"],
+    ).apply_pass(sdfg, {})
+
+
def _build_sdfg(
daceprog: DaceProgram, sdfg: dace.SDFG, config: DaceConfig, args, kwargs
):
@@ -144,7 +152,7 @@ def _build_sdfg(
del sdfg_kwargs[k]

with DaCeProgress(config, "Simplify (1/2)"):
-sdfg.simplify(validate=False, verbose=True)
+_simplify(sdfg, validate=False, verbose=True)

# Perform pre-expansion fine tuning
with DaCeProgress(config, "Split regions"):
@@ -155,7 +163,7 @@ def _build_sdfg(
sdfg.expand_library_nodes()

with DaCeProgress(config, "Simplify (2/2)"):
-sdfg.simplify(validate=False, verbose=True)
+_simplify(sdfg, validate=False, verbose=True)

# Move all memory that can be into a pool to lower memory pressure.
# Change Persistent memory (sub-SDFG) into Scope and flag it.
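The `_simplify` override in this file keeps DaCe's simplification pipeline but drops one named pass, `ConstantPropagation`, to avoid the failure reported in spcl/dace#1328. The toy below models only that idea (an ordered pipeline with a name-based skip list) in plain Python. It is not DaCe's `SimplifyPass`: the list-of-strings "IR", the `RemoveDead` pass, and `run_passes` are invented for illustration, and rejecting unknown skip names is a design choice of this sketch, not a claim about DaCe's behavior.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A toy "pass" maps a list of op strings to a new list of op strings.
Pass = Callable[[List[str]], List[str]]


def run_passes(
    ops: List[str], passes: Dict[str, Pass], skip: Iterable[str] = ()
) -> Tuple[List[str], List[str]]:
    """Apply passes in insertion order, skipping any whose name is listed.

    Returns the transformed ops and the names of the passes actually applied.
    """
    skip_set = set(skip)
    unknown = skip_set - set(passes)
    if unknown:
        # Fail loudly on a misspelled pass name rather than skipping nothing.
        raise KeyError(f"Unknown pass name(s): {sorted(unknown)}")
    applied: List[str] = []
    for name, apply in passes.items():
        if name in skip_set:
            continue
        ops = apply(ops)
        applied.append(name)
    return ops, applied
```

Skipping by name keeps every other simplification in place, which is why the override is narrower than disabling simplification altogether.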
2 changes: 1 addition & 1 deletion dsl/pace/dsl/dace/sdfg_opt_passes.py
@@ -21,4 +21,4 @@ def splittable_region_expansion(sdfg: dace.SDFG, verbose: bool = False):
"K",
]
if verbose:
-pace_log.info(f"Reordered schedule for {node.label}")
+pace_log.debug(f"Reordered schedule for {node.label}")
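Demoting the "Reordered schedule" message to `debug` hides it unless the threshold is lowered below `INFO`. The commit log also documents the hierarchical nature of log levels: a child logger with no level of its own inherits its parent's effective level and forwards records to the parent's handlers. A minimal sketch with the standard `logging` module; the logger names `pace_sketch` and `pace_sketch.dace` and the `records_seen` helper are invented for the demo.

```python
import io
import logging


def records_seen(parent_level: int) -> str:
    """Log at INFO and DEBUG through a child logger; return what reached the handler."""
    stream = io.StringIO()
    parent = logging.getLogger("pace_sketch")
    parent.handlers.clear()
    parent.setLevel(parent_level)
    parent.propagate = False  # keep demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    parent.addHandler(handler)

    # The child has no level of its own (NOTSET), so its effective level comes
    # from "pace_sketch", and its records propagate to the parent's handler.
    child = logging.getLogger("pace_sketch.dace")
    child.info("cache message")
    child.debug("Reordered schedule for some_node")
    return stream.getvalue()
```

With the parent at `INFO`, only the first message is written; lowering the parent to `DEBUG` lets both through without touching the child.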
1 change: 1 addition & 0 deletions external/dace
Submodule dace added at 892d61