Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java (multiple tools) -Xmx parameter #204

Merged
merged 64 commits into from
Nov 12, 2020

Conversation

tdayris
Copy link
Contributor

@tdayris tdayris commented Sep 25, 2020

Hi again,

As promised, all the memory additions in the previously discussed wrappers (#189 )

  • Please keep an eye to the mb → gb conversion for trinity: I did a dummy division by 1024 and rounded it down with the int() function. If you know a better way to do it, I'll make the changes.
  • Like the last time, there is the modification pushed on generate_docs.py still considered as 'new' despite being already accepted.

Many thanks in advance for your reviews

Edit: merge conflict solved

Copy link
Contributor

@dlaehnemann dlaehnemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdayris , this is really cool. Thanks for taking the time to update all these recipes.

Your strategy to convert from Mb to Gb seems good, the -Xmx parameter seems to work with powers of 1024.

One general thing I would suggest, is using the resources field in the snakemake rules, so that snakemake can directly use the arguments for restricting global memory usage (by setting --resources mem_mb=4096 should you only have 8G of RAM on your machine, for example) and requesting the amount of memory from a cluster system (as part of a qsub command that uses resources.mem_mb, for example). So instead of checking snakemake.params.get("extra", ""), you would check for snakemake.resources.get("mem_mb", "").
Also, to showcase that users can set this parameter, I would always add it to the respective test/Snakefile, because that is what's rendered as the example in the docs and users will usually copy-paste this. You could simply set this to something like:

    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024

Or if unsure about a good default value, you could comment the two resource specification lines out.

Another question---although I am not necessarily set on this---would be if we want to generally use mem_gb instead of mem_mb wherever the bioconda recipes allow it. For any of the tools that I have used, I usually deal in Gbs of memory, rather than Mbs. But if you disagree, feel free to keep it as is, with mem_mb. Especially, as the snakemake docs example for --resources deals in mem_mb.

And finally, with a quick scan across the wrappers, I identified some extra ones that would benefit from your update -- I checked that their bioconda scripts would respect a setting of -Xmx:

  • jannovar
  • snpeff
  • trimmomatic
  • varscan

However, this might still be an incomplete list. But the more we get done at once, the better. Also, should you need help with all the stuff I request, feel free to ping me in!

bio/trinity/wrapper.py Outdated Show resolved Hide resolved
@tdayris
Copy link
Contributor Author

tdayris commented Oct 19, 2020

Hi,

Many thanks for your review !

  • The wrappers you pointed out were also edited. Now trimmomatic, varscan, snpeff and jannovar have the automatic -Xmx parameter filled by the wrappers.
  • The information about memory is, indeed, taken from snakemake.resources, the check in snakemake.params.get("extra", "") is here to avoid to give -Xmx parameter twice if the user set one value as extra parameters. If you think I shall not handle this case, please tell me.

Edit: I forgot to cite #127 in this PR

@tdayris
Copy link
Contributor Author

tdayris commented Nov 2, 2020

Thanks a lot! What a quick answer! I'll commit the changes before the end of the week.

@tdayris
Copy link
Contributor Author

tdayris commented Nov 9, 2020

Hi, changes have been made, but I can't make through the tests since there is a very small bug in the snakemake-wrapper-utils

@dlaehnemann
Copy link
Contributor

Your PR is merged and a new release of the snakemake-wrapper-utils is up on the bioconda channel!

@tdayris
Copy link
Contributor Author

tdayris commented Nov 12, 2020

Hi, once again, thank you very much for your help with this work.

Changes since last time:

  • snakemake-wrapper-utils have replaced the copy-pasted code
  • snakemake-wrapper-utils included in the environment.yaml-files

Feel free to point out any mistake I have made

Copy link
Contributor

@dlaehnemann dlaehnemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Many thanks for the systematic update of Java memory management options and all the additions of docs links!

The only thing that could still be amended, would be adding resources: mem_mb = 1024 entries to the gatk3 wrappers:

  • realignertargetcreator
  • printreads
  • baserecalibrator
  • indelrealigner

But by now, these are kind of legacy wrappers (with gatk4 long released as the default). So it's maybe not necessary to update them. So just let me know if you want to add this (if so, note that I did one little commit adding a comment in the jannovar Snakefile, so pull you branch before changing it locally) or if I should just merge as is.

@tdayris
Copy link
Contributor Author

tdayris commented Nov 12, 2020

Thanks you, the resources information were added in GATK3. You are right.

@dlaehnemann
Copy link
Contributor

@tdayris Did you push this already? Because I can't see it here in the PR, yet.

@tdayris
Copy link
Contributor Author

tdayris commented Nov 12, 2020

Wrong branch, my fault ! here are the modifications

@dlaehnemann
Copy link
Contributor

Cheers! And thanks again for this major effort! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants