Performance improvement: recipe_easy_ipcc.yml #2300

Open
bouweandela opened this issue Jan 15, 2024 · 0 comments

This issue keeps track of the performance improvements implemented for recipe_easy_ipcc.yml (documentation) as part of the ESiWACE3 service project. Example output is available here.

Settings

To run the recipe, the following settings are used:

~/.esmvaltool/config-user.yml

max_parallel_tasks: 1

~/.esmvaltool/dask.yml

cluster:
  type: dask_jobqueue.SLURMCluster
  queue: compute
  account: bd0854
  cores: 128
  memory: 256GiB
  processes: 32
  interface: ib0
  local_directory: /scratch/b/b381141/dask-tmp
  n_workers: 32
  walltime: '8:00:00'
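
As a rough check of what these cluster settings imply per worker, the sketch below does the arithmetic. It assumes that dask_jobqueue splits the cores and memory of one job evenly across its worker processes and that n_workers counts worker processes rather than SLURM jobs. The function name per_worker_resources is made up for this illustration.

```python
import math


def per_worker_resources(cores, memory_gib, processes, n_workers):
    """Estimate per-worker resources for a dask_jobqueue-style cluster.

    Assumption: each job starts `processes` workers that share the
    job's `cores` and memory evenly.
    """
    threads_per_worker = cores // processes
    memory_per_worker_gib = memory_gib / processes
    # Number of SLURM jobs needed to reach the requested worker count.
    jobs = math.ceil(n_workers / processes)
    return threads_per_worker, memory_per_worker_gib, jobs


# Values taken from the dask.yml shown above.
threads, memory, jobs = per_worker_resources(
    cores=128, memory_gib=256, processes=32, n_workers=32
)
print(f"{threads} threads and {memory} GiB per worker, {jobs} job(s)")
```

Under these assumptions the configuration amounts to a single SLURM job running 32 workers with 4 threads and 8 GiB of memory each.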

Profiling

The baseline runtime is about 4 hours.

Conda environment used for profiling: environment.yml.

A smaller version of the recipe was used for profiling runs. It uses only the historical experiment with data between 1950 and 2000 (recipe file).

The profiles attached below were created with py-spy using the command

py-spy record \
--idle \
--rate 10 \
--subprocesses \
--format speedscope \
esmvaltool run examples/recipe_easy_ipcc.yml

and can be viewed with https://www.speedscope.app:

  1. Initial run: profile
  2. Faster coordinate comparisons for concatenate (SciTools/iris#5610, "Faster and simpler iris.util.array_equal") and faster CMOR fixes (#2264, "Faster coordinate checks and longitude fix"): profile
  3. Faster cube printing and lazy cells (SciTools/iris#5691, "Faster trivial equality checks for coordinates and arrays", and SciTools/iris#5693, "Make the Coord.cell method lazy"): profile
  4. SciTools/iris#6010 ("Do not realize cell measures and ancillary variables in concatenate"), SciTools/iris#5926 ("Parallel concatenate"), and SciTools/iris#5999 ("Faster time coordinate categorization"): first run profile, second run profile
  5. Runs with the main branches of iris and ESMValCore on 2024-09-12, including #2517 ("Load esmvalcore.dataset.Dataset objects in parallel using Dask") and #2522 ("Save all files in a task at the same time to avoid recomputing intermediate results"): profile
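
Besides the web viewer, a speedscope profile can be summarized offline. The sketch below is not part of the workflow described in this issue. It assumes the speedscope JSON layout with a shared frame table and "sampled" profiles, where each sample is a list of frame indices ordered from the root of the stack to the leaf. The function name top_self_time and the frame names in the example data are made up for illustration.

```python
from collections import Counter


def top_self_time(profile_data, n=5):
    """Return the n frames with the largest summed self weight.

    Assumption: the last index in each sample is the innermost (leaf)
    frame, so its weight counts as self time for that frame.
    """
    frames = profile_data["shared"]["frames"]
    totals = Counter()
    for profile in profile_data["profiles"]:
        if profile.get("type") != "sampled":
            continue  # skip other profile types, e.g. "evented"
        for stack, weight in zip(profile["samples"], profile["weights"]):
            if stack:
                totals[frames[stack[-1]]["name"]] += weight
    return totals.most_common(n)


# Tiny hand-made profile with hypothetical frame names.
example = {
    "shared": {
        "frames": [
            {"name": "main"},
            {"name": "concatenate"},
            {"name": "array_equal"},
        ]
    },
    "profiles": [
        {
            "type": "sampled",
            "samples": [[0, 1, 2], [0, 1, 2], [0, 1]],
            "weights": [1, 1, 1],
        }
    ],
}
print(top_self_time(example))
```

For a real profile, load the file written by py-spy with json.load and pass the resulting dictionary to the function. With --subprocesses the file is expected to hold several profiles, which this sketch simply adds together.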