Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing data from BigWig files #13

Open
Carldeboer opened this issue Jan 5, 2017 · 1 comment
Open

Missing data from BigWig files #13

Carldeboer opened this issue Jan 5, 2017 · 1 comment

Comments

@Carldeboer
Copy link

I have a program that uses bx-python to extract data from a bigwig file for specific BED loci. For some reason certain chromosomes are sometimes skipped for having no data (BigWigFile.get_as_array( chrom, startPos, endPos ) returns None). Looking at the contents of the bigwig file manually or on IGV indicates that there is in fact data for the "missing" chromosomes:
This is an example peak that is being skipped:
chr6 157355 157505 chr6:157355-157505
Same region on IGV:
image

Looking at this region on the command line:

$ bigWigToBedGraph BWs/B_30.q30.bw  temp.out -chrom=chr6 -start=157355 -end=157505
$ cat temp.out
chr6	157361	157362	2
chr6	157363	157364	2
chr6	157372	157373	1
chr6	157388	157389	1
chr6	157392	157393	1
chr6	157403	157404	1
chr6	157413	157414	1
chr6	157415	157416	1
chr6	157419	157420	1
chr6	157424	157425	1
chr6	157432	157435	1
chr6	157439	157440	1
chr6	157443	157444	1
chr6	157477	157478	1
chr6	157491	157492	1
chr6	157493	157494	2
chr6	157504	157505	1

The following code is enough to reproduce the issue:

import bx
print(bx.__file__)
from bx.bbi.bigwig_file import BigWigFile
inFile1 = BigWigFile(open("BWs/B_30.q30.bw"))
values = inFile1.get_as_array( "chr6", 157355, 157505 )
print(values)

and produces this output:

$HOME/.local/lib/python2.7/site-packages/bx_python-0.8.0-py2.7-linux-x86_64.egg/bx/__init__.pyc
None

Note that None appears to only be returned when a chromosome is not found; when the chromsome is found and there is no data for the region, it returns a list of nans.

I can provide the BW if that would be helpful. I need somewhere to put it (150MB).

@krejciadam
Copy link

Just encountered the same issue. I think the problem is with chromosome count: I have a hg38 bigwig file from UCSC. This file contains all the alternative chromosomes like chr4_KI270925v1_alt. bxpython's get_as_array() "sees" only chromosomes 1 to 5 and then 10 to 22. Investigating this a little more carefully, it shows up that if chromosomes are alphabetically ordered, bxpython only "sees" exactly the first 256 chromosomes - it finds everything up to
"chr5_GL383532v1_alt" (256th chromosome)
and does not see anything from
"chr5_GL949742v1_alt" (257th chromosome)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants