Interface to kmer size for pseudoaligners #1144

Merged 10 commits on Jan 3, 2024

pinin4fjords (Member) commented Dec 22, 2023

The default kmer length in both Salmon and Kallisto is too long for short reads (<50bp) found in older libraries and specialist RNA-seq variants.

We can allow users to set the kmer length via custom config, but I think it's a central enough parameter to expose at the workflow level.
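As a rough illustration of the workflow-level route, a custom Nextflow config might set the parameter as sketched below. The parameter name `pseudo_aligner_kmer_size` is the one used later in this thread; the file name `custom.config` and the value 23 are arbitrary assumptions chosen for the example, not recommendations.

```groovy
// custom.config (hypothetical file name), supplied to the run with: -c custom.config

params {
    // Pseudoaligner to use; 'kallisto' would be the other option discussed here
    pseudo_aligner           = 'salmon'
    // Example value only: an odd kmer size shorter than the reads being quantified
    pseudo_aligner_kmer_size = 23
}
```

The same values could presumably be given on the command line as `--pseudo_aligner salmon --pseudo_aligner_kmer_size 23`, following the usual Nextflow convention for pipeline parameters.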

There's also a fix here, since --gencode isn't actually a parameter for Kallisto indexing (copy/paste error).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @pinin4fjords,

It looks like this pull-request has been made against the nf-core/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the nf-core/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@pinin4fjords pinin4fjords changed the base branch from master to dev December 22, 2023 14:13
github-actions bot commented Dec 22, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 26a8b38

  • ✅ 146 tests passed
  • ❔ 6 tests were ignored
  • ❗ 4 tests had warnings

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in WorkflowRnaseq.groovy: Optionally add in-text citation tools to this list.

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
  • multiqc_config - multiqc_config

✅ Tests passed:

Run details

  • nf-core/tools version 2.11.1
  • Run at 2024-01-03 16:25:57

MatthiasZepper (Member) left a comment

Looks good to me.

I have one remark that may be beyond the scope of this PR:

For Kallisto 0.48, the help text states that the maximum allowed kmer size is 31. Although I suppose people are more likely to lower it than to increase it, there is currently no validation or warning in place.

For some other parameters, we have input validation functions in lib/WorkflowRnaseq.groovy, and I was wondering whether a check should be included for the new parameter. Something akin to this, maybe?

if (params.pseudo_aligner_kmer_size && !(params.salmon_index || params.kallisto_index) && params.pseudo_aligner_kmer_size % 2 == 0) {
    // Warn on even sizes only; odd sizes are the recommended case
    println "It is strongly advised to choose odd kmer sizes."
}
if (params.pseudo_aligner == "kallisto" && !params.kallisto_index && params.pseudo_aligner_kmer_size && params.pseudo_aligner_kmer_size > 31) {
    // The Kallisto 0.48 help text gives 31 as the maximum kmer size
    println "Please choose an odd kmer size smaller than or equal to 31."
}
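
The same idea can be sketched as a standalone function that returns its warnings instead of printing them, which makes the check easy to exercise outside the pipeline. This is only an illustration under stated assumptions: the function name `kmerSizeWarnings` is hypothetical and not part of lib/WorkflowRnaseq.groovy, it warns on even sizes, and the limit of 31 is taken from the Kallisto 0.48 help text mentioned above.

```groovy
// Hypothetical helper, not part of nf-core/rnaseq: collects warnings for a
// user-supplied kmer size rather than printing them directly.
List<String> kmerSizeWarnings(Map params) {
    List<String> warnings = []
    def k = params.pseudo_aligner_kmer_size
    // Nothing to check when no kmer size was supplied
    if (k == null) {
        return warnings
    }
    // The kmer size only matters when the pipeline builds the index itself,
    // i.e. when no pre-built index was provided for the chosen pseudoaligner
    boolean buildsSalmon   = params.pseudo_aligner == 'salmon' && !params.salmon_index
    boolean buildsKallisto = params.pseudo_aligner == 'kallisto' && !params.kallisto_index
    if ((buildsSalmon || buildsKallisto) && k % 2 == 0) {
        warnings << "It is strongly advised to choose odd kmer sizes."
    }
    // Upper bound taken from the Kallisto 0.48 help text (assumption: unchanged elsewhere)
    if (buildsKallisto && k > 31) {
        warnings << "Please choose an odd kmer size smaller than or equal to 31."
    }
    return warnings
}

// Example usage: print any warnings for a Kallisto run with an oversized, even kmer size
kmerSizeWarnings([pseudo_aligner: 'kallisto', pseudo_aligner_kmer_size: 32]).each { println it }
```

Returning a list keeps the decision about whether to warn or fail with the caller.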

drpatelh (Member) commented Jan 3, 2024

> For Kallisto 0.48, the help text states that the maximum allowed kmer size is 31. Although I suppose people will rather lower than increase it, there is currently no validation / warning in place.

Agree this would be a good addition. Do the tools themselves generate an error if these criteria aren't met?

@drpatelh drpatelh added this to the 3.13.3 milestone Jan 3, 2024
pinin4fjords (Member, Author)

> For Kallisto 0.48, the help text states that the maximum allowed kmer size is 31. Although I suppose people will rather lower than increase it, there is currently no validation / warning in place.

I'm not actually in favour of this sort of thing; I think there's potential for a significant maintenance burden in keeping such checks synchronised with tool updates. If larger kmer sizes are a problem, we should rely on the tools to fail with that problem highlighted.

I know that Salmon, at least, will error gracefully if you give it an even kmer size, for example.

MatthiasZepper (Member)

> Agree this would be a good addition. Do the tools themselves generate an error if these criteria aren't met?

I haven't run the tools; when reviewing this PR I only checked that the argument flags match those in the tools' help texts.

> I'm not actually in favour of this sort of thing, I think there's potential for significant maintenance burden keeping this sort of thing synchronised with tool updates. If larger kmer sizes are a problem, we should rely on the tools to fail with that problem highlighted.
>
> I know that for Salmon at least it will error gracefully if you give it an even kmer size (for example)

That is a valid point. I am convinced (and happy that Salmon is that smart).

pinin4fjords (Member, Author)

Thanks for the reviews!

@pinin4fjords pinin4fjords merged commit 9ea05fc into dev Jan 3, 2024
29 checks passed
@pinin4fjords pinin4fjords deleted the interface_to_kmer branch January 3, 2024 16:48