significantly speed up parsing of easyconfig files by only extracting comments from an easyconfig file when they're actually needed #3498

Merged on Nov 10, 2020 (2 commits)

@boegel boegel commented Nov 8, 2020

While trying to figure out why there's a long delay before extensions actually start installing, especially for R easyconfigs that include hundreds of extensions, I noticed that 85% of the time is spent in the extract_comments method (profile for R-4.0.0-foss-2020a.eb):

         578666925 function calls (566851911 primitive calls) in 2925.380 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005 2927.606 2927.606 main.py:37(<module>)
        1    0.000    0.000 2927.245 2927.245 main.py:185(main)
        1    0.000    0.000 2918.773 2918.773 main.py:98(build_and_install_software)
        1    0.000    0.000 2918.772 2918.772 easyblock.py:3258(build_and_install_one)
        1    0.000    0.000 2918.709 2918.709 easyblock.py:3195(run_all_steps)
        4    0.000    0.000 2918.701  729.675 easyblock.py:3057(run_step)
        1    0.012    0.012 2876.028 2876.028 easyblock.py:2159(extensions_step)
      828    0.003    0.000 2867.088    3.463 rpackage.py:80(__init__)
      828    0.009    0.000 2867.085    3.463 extensioneasyblock.py:69(__init__)
      828    0.024    0.000 2867.066    3.463 extension.py:89(__init__)
      828    0.081    0.000 2865.669    3.461 easyconfig.py:559(copy)
  844/829    0.067    0.000 2857.436    3.447 easyconfig.py:410(__init__)
      844    0.007    0.000 2523.981    2.990 parser.py:83(__init__)
      844    3.057    0.004 2496.945    2.958 one.py:359(extract_comments)
  2145127 2481.543    0.001 2484.056    0.001 one.py:385(split_on_comment_hash)
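
For reference, output in this format is what Python's standard cProfile and pstats modules produce when sorted by cumulative time. Below is a minimal sketch of collecting such a report; `split_on_hash`, `parse` and `profile_report` are hypothetical stand-ins written for illustration, not EasyBuild code, and this is not necessarily how the profile above was collected.

```python
import cProfile
import io
import pstats


def split_on_hash(line):
    """Hypothetical stand-in for a per-line comment splitter."""
    head, _, tail = line.partition('#')
    return head, tail


def parse(lines):
    """Hypothetical stand-in for the work being profiled."""
    return [split_on_hash(line) for line in lines]


def profile_report(func, *args, top=10):
    """Run func under cProfile; return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # 'cumulative' sorts by the cumtime column, as in the profile above
    stats.sort_stats('cumulative').print_stats(top)
    return stream.getvalue()


report = profile_report(parse, ["name = 'R'  # comment"] * 1000)
print(report)
```

Sorting by cumulative time is what makes a single expensive call chain stand out: each caller's cumtime includes the time spent in everything it calls.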

Part of the reason is that the easyconfig file itself is re-parsed for every extension it lists, to ensure each extension installation starts with a clean slate. It may be possible to avoid that, but doing so correctly (that is, without anything leaking from the "parent" installation (e.g. R) to extensions, or between extensions themselves) is not trivial.

An easy win is to only extract comments from easyconfig files when they are needed, i.e. when self.comments is used (which basically only happens when calling the dump() method to dump a parsed easyconfig file). This makes a huge difference: total time drops from roughly 2925 to roughly 440 seconds. Here's the profile for eb getting ready to install extensions for R-4.0.0-foss-2020a.eb (the same scenario as covered by the profile above):

         535490434 function calls (523944545 primitive calls) in 439.545 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.050    0.050  441.029  441.029 main.py:37(<module>)
        1    0.000    0.000  440.069  440.069 main.py:185(main)
        1    0.000    0.000  424.523  424.523 main.py:98(build_and_install_software)
        1    0.000    0.000  424.522  424.522 easyblock.py:3258(build_and_install_one)
        1    0.000    0.000  421.484  421.484 easyblock.py:3195(run_all_steps)
        4    0.000    0.000  421.477  105.369 easyblock.py:3057(run_step)
        1    0.012    0.012  378.407  378.407 easyblock.py:2159(extensions_step)
      828    0.003    0.000  369.687    0.446 rpackage.py:80(__init__)
      828    0.009    0.000  369.684    0.446 extensioneasyblock.py:69(__init__)
      828    0.023    0.000  369.667    0.446 extension.py:89(__init__)
      828    0.079    0.000  368.294    0.445 easyconfig.py:559(copy)
  844/829    0.059    0.000  366.487    0.442 easyconfig.py:410(__init__)
  844/829    1.155    0.001  331.898    0.400 easyconfig.py:647(parse)
  844/830    0.399    0.000  287.455    0.346 easyconfig.py:1546(_finalize_dependencies)
23224/23212    2.990    0.000  258.283    0.011 easyconfig.py:2158(robot_find_subtoolchain_for_dep)
  2046232    3.160    0.000  247.176    0.000 __init__.py:1249(_log)

Without --debug, it's even faster:

         300860065 function calls (289316606 primitive calls) in 223.616 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.048    0.048  224.990  224.990 main.py:37(<module>)
        1    0.000    0.000  223.744  223.744 main.py:185(main)
        1    0.000    0.000  214.251  214.251 main.py:98(build_and_install_software)
        1    0.000    0.000  214.250  214.250 easyblock.py:3258(build_and_install_one)
        1    0.000    0.000  211.189  211.189 easyblock.py:3195(run_all_steps)
        4    0.000    0.000  211.183   52.796 easyblock.py:3057(run_step)
        1    0.010    0.010  172.864  172.864 easyblock.py:2159(extensions_step)
      828    0.003    0.000  165.117    0.199 rpackage.py:80(__init__)
      828    0.008    0.000  165.115    0.199 extensioneasyblock.py:69(__init__)
      828    0.021    0.000  165.099    0.199 extension.py:89(__init__)
      828    0.072    0.000  164.765    0.199 easyconfig.py:559(copy)
  844/829    0.056    0.000  157.458    0.190 easyconfig.py:410(__init__)
  844/829    1.178    0.001  138.856    0.167 easyconfig.py:647(parse)
  844/830    0.360    0.000  128.497    0.155 easyconfig.py:1546(_finalize_dependencies)
23224/23212    2.921    0.000  124.889    0.005 easyconfig.py:2158(robot_find_subtoolchain_for_dep)
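
The idea of extracting comments only when self.comments is actually used can be sketched with a lazily evaluated property. This is an illustration under stated assumptions, not the code in this PR: the class `EasyConfigParserSketch`, the `extract_calls` counter and the naive '#' splitting (which ignores '#' inside string values) are all hypothetical.

```python
class EasyConfigParserSketch:
    """Hypothetical stand-in for a parser; not EasyBuild's actual class."""

    def __init__(self, rawtxt):
        self.rawtxt = rawtxt
        self._comments = None   # nothing extracted at construction time
        self.extract_calls = 0  # counter kept for illustration only

    @property
    def comments(self):
        """Extract comments on first access, then reuse the cached result."""
        if self._comments is None:
            self._comments = self._extract_comments()
        return self._comments

    def _extract_comments(self):
        # Naive extraction: everything after the first '#' on a line is
        # treated as a comment (an assumption made to keep the sketch short).
        self.extract_calls += 1
        comments = {}
        for lineno, line in enumerate(self.rawtxt.splitlines(), start=1):
            _, hash_found, comment = line.partition('#')
            if hash_found:
                comments[lineno] = comment.strip()
        return comments


parser = EasyConfigParserSketch("name = 'R'  # the R language\nversion = '4.0.0'\n")
calls_before = parser.extract_calls  # construction alone triggers no extraction
first = parser.comments              # first access triggers extraction
second = parser.comments             # second access reuses the cached dict
print(calls_before, parser.extract_calls, first)
```

With this pattern, code paths that never touch the comments (such as creating a fresh parsed copy per extension) skip the extraction cost entirely, while a caller like dump() still gets the comments on first use.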

@boegel boegel added this to the 4.3.2 milestone Nov 8, 2020
@boegel boegel changed the title significantly speed up parsing of easyconfig files by only extracting comments from an easyconfig file only when they're actually needed significantly speed up parsing of easyconfig files by only extracting comments from an easyconfig file when they're actually needed Nov 8, 2020
… comments from an easyconfig file when they're actually needed
@easybuilders easybuilders deleted a comment from boegelbot Nov 10, 2020

@akesandgren akesandgren left a comment

LGTM

@akesandgren

Going in, thanks @boegel!

@akesandgren akesandgren merged commit 27b8362 into easybuilders:develop Nov 10, 2020
@boegel boegel deleted the extract_comments_on_demand branch November 10, 2020 09:08