Questions about reading variants #45

Open
HelloWorldLTY opened this issue Sep 12, 2024 · 3 comments

@HelloWorldLTY

Hi, thanks for your great work. Do you currently support loading variants from VCF files and filtering them based on VAF, DP, DQ, etc.? Thanks a lot.

@HelloWorldLTY
Author

Also, I wonder whether it is possible to read variants that are insertions rather than substitutions. It seems that the current design cannot handle alternate alleles of different lengths:

File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/data/dataset.py:599, in VariantDataset._load_alleles(self, variants)
    597 def _load_alleles(self, variants: pd.DataFrame) -> None:
    598     self.ref = strings_to_indices(variants.ref.tolist())
--> 599     self.alt = strings_to_indices(variants.alt.tolist())

File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/sequence/format.py:251, in strings_to_indices(strings, add_batch_axis)
    247         return arr
    249 # Convert multiple sequences; they must all have equal length
    250 else:
--> 251     assert check_equal_lengths(
    252         strings
    253     ), "All input sequences must have the same length."
    254     return np.stack(
    255         [[BASE_TO_INDEX_HASH[base] for base in string] for string in strings]
    256     ).astype(np.int8)

AssertionError: All input sequences must have the same length.

Thanks a lot.

@avantikalal
Collaborator

Hi @HelloWorldLTY, thanks for raising these points. We do not currently support VCF reading or indels, but we are working on indel support and hope to add it soon.
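
Until VCF reading is supported, one possible interim workaround is to parse the VCF yourself and keep only single-nucleotide variants, so that every `ref` and `alt` string has the same length. The following is a minimal sketch, not part of gReLU: the function name `read_vcf_snvs`, the `min_dp` parameter, and the `chrom`/`pos` column names are assumptions for illustration (only `ref` and `alt` appear in the traceback above). It reads `DP` from the INFO column only; VAF and genotype-level fields usually live in the per-sample columns and are not handled here.

```python
import io

import pandas as pd

# The eight fixed VCF columns, lower-cased (naming is an assumption).
VCF_COLUMNS = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info"]


def read_vcf_snvs(handle, min_dp=10):
    """Read an uncompressed VCF and keep SNVs with INFO/DP >= min_dp."""
    # Skip blank lines, '##' meta lines, and the '#CHROM' header line.
    rows = [
        line.rstrip("\n").split("\t")[:8]
        for line in handle
        if line.strip() and not line.startswith("#")
    ]
    df = pd.DataFrame(rows, columns=VCF_COLUMNS)
    df["pos"] = df["pos"].astype(int)

    # Pull the DP value out of the semicolon-separated INFO field.
    dp = df["info"].str.extract(r"(?:^|;)DP=(\d+)")[0]
    df["dp"] = pd.to_numeric(dp, errors="coerce")

    # Single-base ref and alt only; this also drops multi-allelic
    # records such as "A,T" and all indels.
    is_snv = (df["ref"].str.len() == 1) & (df["alt"].str.len() == 1)

    # Records without a DP value compare as False and are dropped.
    keep = is_snv & (df["dp"] >= min_dp)
    return df.loc[keep, ["chrom", "pos", "ref", "alt"]].reset_index(drop=True)


# Usage with an in-memory example; a real file would be passed via open(path).
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\n"
    "chr1\t200\t.\tA\tAT\t50\tPASS\tDP=40\n"
    "chr1\t300\t.\tC\tT\t50\tPASS\tDP=5\n"
)
snvs = read_vcf_snvs(example, min_dp=10)
```

In this example only the first record survives: the second is an insertion and the third falls below the depth threshold.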

@HelloWorldLTY
Author

Thanks. My current best plan is to iteratively assign a different calling object `vr` to each sequence and map multiple insertions onto it. It would be very helpful to have such functions.
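
One way to read the plan above is to build each alternate sequence as a plain string, applying the variants one at a time, instead of passing unequal-length alleles to `strings_to_indices`. The sketch below shows that idea in pure Python. The helper names `apply_variant` and `apply_variants` are hypothetical, positions are assumed to be 0-based offsets into the sequence string, and overlapping variants are not handled.

```python
def apply_variant(seq, pos, ref, alt):
    """Replace `ref` at 0-based offset `pos` in `seq` with `alt`."""
    if seq[pos:pos + len(ref)] != ref:
        raise ValueError(f"Reference allele {ref!r} not found at offset {pos}")
    return seq[:pos] + alt + seq[pos + len(ref):]


def apply_variants(seq, variants):
    """Apply several non-overlapping (pos, ref, alt) variants to `seq`."""
    # Work from right to left so that a length change from one variant
    # does not shift the coordinates of the variants still to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        seq = apply_variant(seq, pos, ref, alt)
    return seq


# Usage: one insertion (C -> CTT) and one deletion (CG -> C).
reference = "ACGTACGT"
mutated = apply_variants(reference, [(1, "C", "CTT"), (5, "CG", "C")])
```

Because insertions and deletions change the sequence length, the resulting strings would still need to be padded or trimmed to a common length before being stacked into a single array, given the equal-length assertion shown in the traceback.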
