Productionize LD code for gnomAD v4 SNVs/Indels #634
base: main
Okay, I'm actually okay attaching my name to this now! Ready for review.
pop_freq = pop_mt.freq[meta_index]
pop_mt = pop_mt.annotate_rows(pop_freq=pop_freq)

pop_mt = pop_mt.filter_rows((hl.len(pop_mt.filters) == 0))
Minor comment: this line duplicates line 88
pop_freq = pop_mt.freq[meta_index]
pop_mt = pop_mt.annotate_rows(pop_freq=pop_freq)

pop_mt = pop_mt.filter_rows((hl.len(pop_mt.filters) == 0))
More of a note for future reference:
In my case, this filter line is replaced with the set of filters below:
- Entries: high quality variants (~VQSR)
- Entries: adj filters
- Rows: hl.agg.any(mt.GT.n_alt_alleles() > 0)
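To make the order of those filters concrete, here is a toy sketch in plain Python rather than Hail. The `adj` and `high_quality` flags, the variant keys, and the `apply_filters` helper are all hypothetical stand-ins, not names from the PR. It only illustrates that a row can be dropped once the entry-level filters have removed all of its non-reference calls.

```python
# Toy genotype table: each key is a variant, each entry is one sample's call.
# "n_alt" is the alt-allele count; "adj" and "high_quality" are hypothetical
# per-entry pass flags standing in for the adj and quality filters.
variants = {
    "1:1000:A:T": [
        {"n_alt": 1, "adj": True, "high_quality": True},
        {"n_alt": 0, "adj": True, "high_quality": True},
    ],
    "1:2000:G:C": [
        {"n_alt": 1, "adj": False, "high_quality": True},  # only alt call fails adj
        {"n_alt": 0, "adj": True, "high_quality": True},
    ],
    "1:3000:C:G": [
        {"n_alt": 0, "adj": True, "high_quality": True},  # no alt calls at all
        {"n_alt": 0, "adj": True, "high_quality": True},
    ],
}


def apply_filters(table):
    """Apply the entry filters first, then keep rows with any remaining alt call."""
    kept = {}
    for key, entries in table.items():
        passing = [e for e in entries if e["adj"] and e["high_quality"]]
        # Row filter, analogous in spirit to hl.agg.any(mt.GT.n_alt_alleles() > 0).
        if any(e["n_alt"] > 0 for e in passing):
            kept[key] = passing
    return kept


kept = apply_filters(variants)
print(sorted(kept))
```

The second variant is dropped even though it has a non-reference call in the raw data, because that call does not survive the entry filters.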
return pop_mt


def generate_ld_pruned_set(
Just a note to confirm: this function is not needed for computing LD scores.
),
overwrite,
)
ld = hl.ld_matrix(pop_mt.GT.n_alt_alleles(), pop_mt.locus, radius)
In my old script, I ran BlockMatrix.write_from_entry_expr(mt.GT.n_alt_alleles(), tmp_bm_path, mean_impute=True, center=False, normalize=False, overwrite=args.overwrite). I wonder how much difference this will introduce.
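For reference on what the flags change: as far as I understand the Hail implementation, hl.ld_matrix goes through hl.row_correlation, which mean-imputes missing entries and also centers and normalizes each row before taking inner products. With center=False and normalize=False, the written block matrix holds mean-imputed but otherwise raw alt-allele counts, so any standardization has to happen in a later step. The plain-Python sketch below (not Hail; `standardize` and `correlation` are illustrative helpers) shows the standardized version on toy genotype rows.

```python
import math


def standardize(row):
    """Mean-impute missing values (None), then center and scale to unit norm.

    Note: a row with no variation has zero norm and would divide by zero here.
    """
    observed = [x for x in row if x is not None]
    mean = sum(observed) / len(observed)
    centered = [(mean if x is None else x) - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def correlation(row_a, row_b):
    """Inner product of two standardized rows, i.e. their Pearson correlation."""
    a, b = standardize(row_a), standardize(row_b)
    return sum(x * y for x, y in zip(a, b))


v1 = [0, 1, 2, None, 0]  # one missing genotype, imputed to the row mean
v2 = [0, 1, 2, 1, 0]
v3 = [2, 1, 0, 1, 2]     # mirror image of v2

r_self = correlation(v1, v1)
r_anti = correlation(v2, v3)
r2_imputed = correlation(v1, v2) ** 2
print(r_self, r_anti, r2_imputed)
```

If the old script standardized the matrix after writing it, the two routes should agree up to numerical error; if it did not, the products would be raw count inner products rather than correlations.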
)
ld = hl.ld_matrix(pop_mt.GT.n_alt_alleles(), pop_mt.locus, radius)
if data_type != "genomes_snv_sv":
    ld = ld.sparsify_triangle()
Question: any thoughts on why this shouldn't be applied to all cases?
l2row = r2_adj.sum(axis=0).T
l2col = r2_adj.sum(axis=1)
l2 = l2row + l2col + 1
One more note: I had this line as
r2_diag = checkpoint_tmp(r2_adj.diagonal()).T
l2 = l2row + l2col - r2_diag
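The two formulas agree only under different assumptions about the diagonal, which the toy example below tries to show. It uses a small symmetric matrix in plain Python instead of a BlockMatrix, and plain r2 values with a unit diagonal rather than r2_adj. The helper name `row_plus_col_sums` is made up for illustration. Which case applies in the PR depends on whether the diagonal survives sparsification, which the snippet above does not show.

```python
# Symmetric toy r2 matrix for three variants; the diagonal is 1.
r2 = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
n = len(r2)


def row_plus_col_sums(m):
    """Per-variant row sum plus column sum, mirroring l2row + l2col."""
    rows = [sum(m[i]) for i in range(n)]
    cols = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [r + c for r, c in zip(rows, cols)]


# Target: the full row sums of the symmetric matrix.
full = [sum(row) for row in r2]

# Case 1: upper triangle that keeps the diagonal.
upper_with_diag = [[r2[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
# Case 2: strict upper triangle (diagonal removed).
strict_upper = [[r2[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]

# Diagonal kept: it is counted twice, so subtract it once.
minus_diag = [s - r2[i][i] for i, s in enumerate(row_plus_col_sums(upper_with_diag))]
# Diagonal removed: add the missing self-term of 1 back.
plus_one = [s + 1.0 for s in row_plus_col_sums(strict_upper)]
# Diagonal kept but "+ 1" used: every score is too large by 2.
mismatch = [s + 1.0 for s in row_plus_col_sums(upper_with_diag)]

print(full, minus_diag, plus_one, mismatch)
```

On the same triangle-with-diagonal input, `+ 1` and `- r2_diag` differ by a constant 2 per variant, so it is worth confirming which form of the matrix reaches this line.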
Ported code from gnomAD v2. Not running on SVs or for cross-pop analyses.