Sarek bcftools normalization #1682
base: dev
This could be condensed into a single configuration. Why are you publishing the normalised vcfs into two different subdirectories?
The Tabix step below is not published in the same way, so we would end up with the tbi in a different directory.
I explained it below: there are two different processes, one for normalisation and concatenation of germline vcfs and one for normalisation of all vcfs.
these ones can be combined
looking at the module, I think you can actually output the tbi in the same process as the vcf, so no need to spin up an extra process for it
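A minimal sketch of what that could look like in the modules config, assuming a recent `bcftools/norm` module that exposes an optional `tbi` output and a bcftools version that accepts `--write-index=tbi`. The process selector, the `.norm` prefix and the output subdirectory are hypothetical and not taken from this PR:

```groovy
// Hypothetical modules.config fragment; selector, prefix and paths are illustrative only.
process {
    withName: 'BCFTOOLS_NORM' {
        // Assumption: the installed bcftools supports --write-index=tbi,
        // so the index is written next to the compressed VCF by the same process.
        ext.args   = { '--output-type z --write-index=tbi' }
        // A distinct prefix avoids clashing with the input file name.
        ext.prefix = { "${meta.id}.norm" }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/normalized/${meta.id}/" },
            // Publish the VCF and its index together from one directive.
            pattern: "*.{vcf.gz,vcf.gz.tbi}"
        ]
    }
}
```

With a single `publishDir` pattern covering both files, the vcf and the tbi land in the same directory and no separate Tabix process is needed for this step.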
I've checked the bcftools_norm module and it needs both the vcf and the tbi as inputs, so I guess we can't exclude it. Or do you mean to output the tbi from the variant callers directly? What could be excluded, though, is the tabix step at the end after sorting, right? The tbi is output by the bcftools norm process and passed on to bcftools sort, so we should end up with a sorted vcf and tbi at the end.
Can you run
nf-core modules update bcftools/norm
That's an old version of the module.
We are doing the same thing in the concatenation step. Can you check what happens if someone concatenates and then normalises? I have the feeling this will end up with a bunch of redundant information. On that note, the current order is: concat, then normalise. Are we sure it shouldn't be the other way around?
In the original PR, there was a vcf_concatenate process that performed normalization first and then concatenation of germline vcfs. Since I wanted to normalize all vcfs without concatenation, I decided to keep the original process and add an additional one (vcf_normalization/main.nf) specifically for normalization. Also, Sarek already includes a process for concatenating germline vcfs, so I thought it should remain unchanged.
However, I am a bit confused because you mentioned that concatenation occurs before normalization. I initially thought that based on the boolean parameters (params.concatenate, params.normalized_vcfs), one could choose which process to run. If both processes run, I expected two different outputs: concatenated and normalized germline vcfs, and normalized vcfs (for all). It seems like I might have misunderstood the intended workflow.
Could you please explain where the order of processes comes from?