feat: Simphenotype and Index Repeat Support #209

Merged: 23 commits, Apr 25, 2023

mlamkin7
Collaborator

We've updated the simphenotype and index subcommands to support a new line type in the .hap file: "R", which stands for repeats.

Usage in a sorted hap file (tests/data/basic.hap.gz):

#       version 0.1.0
H       21      26928472        26941960        chr21.q.3365*1
R       21      26938353        26938400        21_26938353_STR
H       21      26938353        26938989        chr21.q.3365*11
H       21      26938989        26941960        chr21.q.3365*10
R       21      26939000        26939010        21_26938989_STR
R       21      26941880        26941900        21_26941880_STR
V       chr21.q.3365*1  26928472        26928472        21_26928472_C_A C
V       chr21.q.3365*1  26938353        26938353        21_26938353_T_C T
V       chr21.q.3365*1  26940815        26940815        21_26940815_T_C C
V       chr21.q.3365*1  26941960        26941960        21_26941960_A_G G
V       chr21.q.3365*10 26938989        26938989        21_26938989_G_A A
V       chr21.q.3365*10 26940815        26940815        21_26940815_T_C T
V       chr21.q.3365*10 26941960        26941960        21_26941960_A_G A
V       chr21.q.3365*11 26938353        26938353        21_26938353_T_C T
V       chr21.q.3365*11 26938989        26938989        21_26938989_G_A A
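
As a rough illustration of how the line types above could be told apart, here is a minimal sketch that groups the lines of a .hap file by their first field. It is not haptools' own parser: the function name `parse_hap_lines` is hypothetical, and it assumes fields are separated by whitespace and contain no spaces themselves.

```python
from collections import defaultdict


def parse_hap_lines(lines):
    """Group the fields of each .hap line by its line type (H, R, V, ...).

    Hypothetical helper for illustration only; not part of haptools.
    """
    records = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines such as "# version 0.1.0"
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # The first field is the line type; the rest are that line's columns
        records[fields[0]].append(fields[1:])
    return records


example = """\
#       version 0.1.0
H       21      26928472        26941960        chr21.q.3365*1
R       21      26938353        26938400        21_26938353_STR
V       chr21.q.3365*1  26928472        26928472        21_26928472_C_A C
"""

records = parse_hap_lines(example.splitlines())
# For H and R lines, the ID is the fourth column after the line type
repeat_ids = [fields[3] for fields in records["R"]]
haplotype_ids = [fields[3] for fields in records["H"]]
print(repeat_ids, haplotype_ids)
```

In practice, the Haplotypes class in haptools/data/haplotypes.py is the place where this parsing actually happens; the sketch only shows the shape of the data.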

Along with these changes, we've updated simphenotype's PhenoSimulator class, particularly the run() function. Instead of taking a list of haplotypes, run() now takes the full Haplotypes object along with the IDs of the haplotypes and repeats, which it uses to extract betas and genotypes.
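
To make the "extract betas and genotypes" step more concrete, here is a small sketch of combining haplotype and repeat effects for a set of IDs. It is not the PhenoSimulator implementation: the function `simulate_phenotypes`, the plain additive model, the dictionary inputs, and the example dosages and betas are all assumptions made for illustration.

```python
import random


def simulate_phenotypes(genotypes, betas, ids, noise_sd=0.0, seed=0):
    """Sum beta * genotype over the chosen IDs for each sample, plus noise.

    Hypothetical helper, not haptools' API.
    genotypes: dict mapping an ID to a list of per-sample values
    betas: dict mapping an ID to its effect size
    ids: the haplotype and/or repeat IDs to include
    """
    rng = random.Random(seed)
    num_samples = len(genotypes[ids[0]])
    phenotypes = []
    for sample in range(num_samples):
        # Haplotypes and repeats are treated identically once they are numeric
        value = sum(betas[i] * genotypes[i][sample] for i in ids)
        phenotypes.append(value + rng.gauss(0.0, noise_sd))
    return phenotypes


# Made-up values: haplotype copies (0-2) and repeat lengths per sample
genotypes = {
    "chr21.q.3365*1": [0, 1, 2],
    "21_26938353_STR": [10, 12, 11],
}
betas = {"chr21.q.3365*1": 0.5, "21_26938353_STR": 0.1}

pheno = simulate_phenotypes(genotypes, betas, list(betas))
print(pheno)
```

The point of the sketch is only that a single list of IDs can mix both kinds of effects, as long as each ID resolves to a beta and a vector of genotypes.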

To use repeats in simphenotype, use the additional --repeats option.
Example:

haptools simphenotype --repeats repeats.vcf snps.vcf snps_and_repeats.hap

Note that, as in the example, the SNPs must still be provided, so we cannot simulate based on repeats alone.

@mlamkin7 mlamkin7 requested a review from aryarm April 21, 2023 19:18
Member

@aryarm aryarm left a comment

It looks great! Thanks for taking all of this on, @mlamkin7.
I think this will be super useful for me and for others in the lab! I can't wait to use it for happler

Most of my comments are about refactoring things a bit to make it easier to add new effects to simphenotype in the future, but we don't necessarily need to do all of that now.

Also, do we have a test to check whether things will still work if you specify a mix of repeat and haplotype IDs via simphenotype's --ids parameter?
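
One way such a test could look, sketched without touching haptools itself: the helper `split_ids` below is hypothetical and simply partitions a mixed list of requested IDs into haplotype IDs and repeat IDs, raising an error for anything it does not recognize. A real test would exercise simphenotype's --ids handling instead.

```python
def split_ids(requested, haplotype_ids, repeat_ids):
    """Partition requested IDs into (haplotype IDs, repeat IDs).

    Hypothetical helper for illustration; not part of haptools.
    """
    hap = [i for i in requested if i in haplotype_ids]
    rep = [i for i in requested if i in repeat_ids]
    unknown = [
        i for i in requested
        if i not in haplotype_ids and i not in repeat_ids
    ]
    if unknown:
        raise ValueError(f"IDs not found in the .hap file: {unknown}")
    return hap, rep


def test_mixed_ids():
    # IDs taken from the example .hap file in the PR description
    haplotype_ids = {"chr21.q.3365*1", "chr21.q.3365*10", "chr21.q.3365*11"}
    repeat_ids = {"21_26938353_STR", "21_26938989_STR", "21_26941880_STR"}
    requested = ["21_26938353_STR", "chr21.q.3365*10"]

    hap, rep = split_ids(requested, haplotype_ids, repeat_ids)
    assert hap == ["chr21.q.3365*10"]
    assert rep == ["21_26938353_STR"]


test_mixed_ids()
```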
