Htsfree #4
Modified behaviour of discard for paired-end data.
Well, so much for the BamFile idea. To recap, the hope was that we could open the BamFile at the start of the calling function, and pass the resulting object to the BAM file-reading functions. This would avoid the overhead of setting up the BAM file handle at every iteration of a file read. However, this doesn't work when you ask for paired reads by chromosome, because readGAlignmentPairs will also search for mates on other chromosomes, forcing the file pointer forwards and skipping the other chromosomes entirely when the calling function loops to them. We can get around it by opening and closing the BamFile at every per-chromosome iteration, but I wonder if this would defeat the performance benefit of using a BamFile in the first place... @mtmorgan? |
Does setting isProperPair = FALSE in the bamFlags not preclude that
problem? Put a different way, are there aligners out there these days that
would set the 0x2 bit if the two reads align to different chromosomes?
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
|
I had thought about that, but was unwilling to rely on the aligner's definition of what a proper pair was. Notwithstanding inter-chromosomal pairs, different aligners might use different metrics for defining a proper pair - the maximum allowable fragment length is one such parameter that comes to mind. If we had to do some filtering, the INS field would be much more standard for getting rid of inter-chromosomal pairs (as well as large fragments), as mentioned in Bioconductor/GenomicAlignments#4. |
What is the INS field? I don't see that in the SAM spec.
|
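Oops! Well spotted, I got my names mixed up. I was referring to TLEN, which is known to (R)samtools as ISIZE (not entirely sure why those two have different names, but there we go). |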
The GAlignmentPairs object will have NA values for any cross-chromosomal
pairs:
z <- readGAlignmentPairs(bf, param = param)
z
GAlignmentPairs object with 9247 pairs, strandMode=1, and 0 metadata
columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
[1] chr1 - : 3000136-3000181 -- 3000136-3000182
[2] chr1 + : 3000275-3000374 -- 3000460-3000560
[3] chr1 + : 3000387-3000487 -- 3000457-3000556
[4] chr1 + : 3000399-3000499 -- 3000490-3000590
[5] chr1 + : 3000535-3000635 -- 3000784-3000884
... ... ... ... ... ... ...
[9243] <NA> - : 130880221-130880321 -- 3771943-3771982
[9244] <NA> - : 5571716-5571782 -- 3087890-3087990
[9245] <NA> - : 14742478-14742542 -- 3802450-3802550
[9246] <NA> - : 41215388-41215454 -- 3507909-3507944
[9247] <NA> * : 20659260-20659360 -- 3949244-3949310
-------
seqinfo: 66 sequences from an unspecified genome
Those last five are cross-chromosomal pairs
grglist(z[9246:9247,])
GRangesList object of length 2:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr7 41215388-41215454 -
[2] chr1 3507909-3507944 -
[[2]]
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
[1] chr8 20659260-20659360 +
[2] chr1 3949244-3949310 -
So you could hypothetically just read in the whole chromosome, and dump out
the cross-chromosomal reads, then filter by fragment size
z <- z[!is.na(seqnames(z)),]
z <- granges(z)
z <- z[width(z) < 400,]
z
GRanges object with 8922 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 3000136-3000182 -
[2] chr1 3000275-3000560 +
[3] chr1 3000387-3000556 +
[4] chr1 3000399-3000590 +
[5] chr1 3000535-3000884 +
... ... ... ...
[8918] chr1 3999320-3999403 -
[8919] chr1 3999323-3999545 +
[8920] chr1 3999547-3999693 -
[8921] chr1 3999620-3999962 -
[8922] chr1 3999935-4000021 +
-------
And that's all pretty fast, I think.
…On Tue, Apr 2, 2019 at 10:14 AM Aaron Lun ***@***.***> wrote:
Oops! Well spotted, I got my names mixed up. I was referring to TLEN,
which is known to *(R)samtools* as ISIZE (not entirely sure why those two
have different names, but there we go).
|
Or there is the super-secret argument on.discordant.seqnames, known only to
Herve and some Mossad agents:
bf <- BamFile("tmp_sorted.bam", asMates =TRUE)
param <- ScanBamParam(which = GRanges("chr1:3000000-4000000"))
z <- readGAlignmentPairs(bf, param = param)
zz <- granges(z, on.discordant.seqnames = "drop")
zz <- zz[width(zz) < 400,]
zz
GRanges object with 8922 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 3000136-3000182 -
[2] chr1 3000275-3000560 +
[3] chr1 3000387-3000556 +
[4] chr1 3000399-3000590 +
[5] chr1 3000535-3000884 +
... ... ... ...
[8918] chr1 3999320-3999403 -
[8919] chr1 3999323-3999545 +
[8920] chr1 3999547-3999693 -
[8921] chr1 3999620-3999962 -
[8922] chr1 3999935-4000021 +
-------
|
To be clear, my concern isn't about filtering out inter-chromosomal read pairs in memory; it's about avoiding them being read into memory at all. The current state of this PR does memory-level filtering, but it should theoretically be possible to get better performance by skipping them during I/O. Of course, if the current PR status is sufficiently fast for your use cases, I'll merge. I can't really check until I get my new laptop - 2GB of RAM is not enough for genomics these days. |
Well... this is disappointing.
library(Rsamtools)
bf <- system.file("exdata", "rep1.bam", package="csaw")
H <- scanBamHeader(bf)[[1]]$targets
H
## chrA chrB chrC
## 1298 870 1345
handle <- BamFile(bf)
open(handle)
scanBam(handle, param=ScanBamParam(what="pos", which=GRanges("chrA", IRanges(1, H[1]))))
## $`chrA:1-1298`
## $`chrA:1-1298`$pos
## [1] 3 4 6 8 10 12 12 12 13 13 17 20 21 23 26
## ... etc.
scanBam(handle, param=ScanBamParam(what="pos", which=GRanges("chrB", IRanges(1, H[2]))))
## $`chrB:1-870`
## $`chrB:1-870`$pos
## integer(0)
scanBam(handle, param=ScanBamParam(what="pos", which=GRanges("chrC", IRanges(1, H[3]))))
## $`chrC:1-1345`
## $`chrC:1-1345`$pos
## integer(0)
close(handle)
As you can see, trying to retrieve reads on an open BAM file handle that's already been searched by position... doesn't work, even if the ensuing calls refer to reads that should occur later in the file. |
Construct the GRanges up-front? https://support.bioconductor.org/p/119631/#119632 I guess the costs of opening a file are parsing the index and seeking the position; opening the BamFile once and using GRanges saves the cost of index parsing. One could also probably implement a |
Is the goal to be able to convert to using GenomicAlignments with as little
disruption to your existing code base as possible?
Or is the goal to limit ongoing memory usage while reading in data?
If the latter, I think iterating through the bam file using yieldSize, and
at each iteration converting to counts per window would limit the total
memory required, but at the expense of requiring other changes to your code
base.
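A minimal sketch of the yieldSize idea, assuming single-end reads and the small rep1.bam example file that ships with csaw (used elsewhere in this thread). The chunk size of 1000 and the 100 bp tiling are arbitrary illustrative choices, and this is not how windowCounts is implemented; it only shows that each chunk can be reduced to per-window counts before the next chunk is read, so memory use is bounded by the chunk size.

```r
library(Rsamtools)
library(GenomicAlignments)

# Example BAM file from csaw; any BAM file would do.
bam <- system.file("exdata", "rep1.bam", package="csaw")

# Reference sequence lengths from the header, tiled into fixed-width windows.
# The 100 bp tile width is an arbitrary choice for illustration.
H <- scanBamHeader(bam)[[1]]$targets
windows <- tileGenome(H, tilewidth=100, cut.last.tile.in.chrom=TRUE)

# Accumulate in double precision to avoid integer overflow on large files.
counts <- numeric(length(windows))

# yieldSize caps the number of records returned by each read call.
bf <- BamFile(bam, yieldSize=1000L)
open(bf)
repeat {
    aln <- readGAlignments(bf)      # next chunk of at most 1000 alignments
    if (length(aln) == 0L) break    # an empty chunk signals end of file
    counts <- counts + countOverlaps(windows, aln)
}
close(bf)

head(counts)
```

Note that no `which` is supplied here, so the file is streamed in order rather than queried by region.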
|
Never mind, I see that it's the former.
|
Also, @martinmorgan, using a GRanges of different lengths (like
by-chromosome) seems to suffer from the same problem.
|
@jmacdon can you describe what you mean at Bioconductor/Rsamtools#6 ? |
Thanks @mtmorgan, the below seems to work for me:
library(Rsamtools)
bf <- system.file("exdata", "rep1.bam", package="csaw")
H <- scanBamHeader(bf)[[1]]$targets
H
## chrA chrB chrC
## 1298 870 1345
handle <- BamFile(bf, yieldSize=1)
all.ref <- GRanges(names(H), IRanges(1, H))
param <- ScanBamParam(what="pos", which=all.ref)
open(handle)
scanBam(handle, param=param)
scanBam(handle, param=param)
scanBam(handle, param=param)
close(handle)
@jmacdon: yes, the idea would be to just swap out the existing read input functions for GenomicAlignments, without having to rewrite a whole lot of the surrounding context, while still preserving, as much as possible, the current performance characteristics. We're almost there; the performance degradation is acceptable for standard applications, it's just this scaffold case that sucks. |
Yes, that works for simple queries, but not for readGAlignmentPairs, which
I imagine is the most common use case these days:
b <- BamFile("aligned_nodups_20190130_acomys/1-D-1Aligned.sortedByCoord.out.bam",
    yieldSize = 1L, asMates = TRUE)
open(b)
repeat {
    aln <- readGAlignmentPairs(b, param = param)
    if (length(aln) == 0L)
        break
    print(head(table(seqnames(aln)), 2))
}
LAS1 LAS2
64280 0
Warning message:
In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, :
26 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
|
Legit...
pe.param.acomys <- readParam(max.frag = 400, minq = 200, BPPARAM = MulticoreParam(10))
system.time(windowCounts(asamps$files, ext = 250, width = 150, spacing = 75, param = pe.param.acomys))
    user   system  elapsed
11075.76 19712.78 13859.93
I ran that under the assumption that I didn't need to use bpstart on the
BPPARAM object, but maybe I did need to?
…On Fri, Apr 5, 2019 at 11:30 PM Aaron Lun ***@***.***> wrote:
@jmacdon One more roll of the dice. I've just switched windowCounts to use
the scheme suggested by @mtmorgan; can you see how fast it runs on your
scaffolds in single-end mode?
|
Ugh. Never mind. That was the release version. Re-running now.
|
... is it still running? |
Yes. Without using bpstart(), which I assume is superfluous at this point,
as it seems you use that internally.
|
Wow. Still running this morning...
|
Possibly helpful:
system.time(windowCounts(asamps$files, ext = 250, width = 150, spacing = 75, param = pe.param.acomys, BPPARAM = MulticoreParam(10)))
C-c C-c
Warning messages:
1: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
2: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
3: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
4: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
5: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
6: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
7: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
8: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
9: In totals[bf] + bp.out[[bf]]$totals : NAs produced by integer overflow
Timing stopped at: 527.6 64.32 6.355e+04
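For reference, this warning is what R emits when integer addition exceeds .Machine$integer.max (2147483647). A minimal base-R sketch, using made-up totals rather than values from this run, shows the behaviour and one common workaround. Whether coercing to double is the appropriate fix inside windowCounts, or whether the overflow is a symptom of something else, is an assumption not settled here.

```r
# Two hypothetical per-file read totals, each a valid 32-bit integer.
totals <- c(2000000000L, 2000000000L)

# Integer addition overflows: R returns NA (with a warning) rather than wrapping.
int.sum <- suppressWarnings(totals[1] + totals[2])
is.na(int.sum)

# Coercing to double first keeps the sum exact; doubles represent
# whole numbers exactly up to 2^53.
dbl.sum <- as.numeric(totals[1]) + as.numeric(totals[2])
dbl.sum
```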
|
And for completeness
sessionInfo()
R Under development (unstable) (2019-03-19 r76252)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /data/oldR/R-devel/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-devel/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] csaw_1.17.8 SummarizedExperiment_1.13.0
[3] DelayedArray_0.9.9 BiocParallel_1.17.18
[5] matrixStats_0.54.0 Biobase_2.43.1
[7] GenomicRanges_1.35.1 GenomeInfoDb_1.19.2
[9] IRanges_2.17.4 S4Vectors_0.21.21
[11] BiocGenerics_0.29.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 compiler_3.6.0 XVector_0.23.2
[4] prettyunits_1.0.2 GenomicFeatures_1.35.9 bitops_1.0-6
[7] tools_3.6.0 zlibbioc_1.29.0 progress_1.2.0
[10] biomaRt_2.39.2 digest_0.6.18 bit_1.1-14
[13] RSQLite_2.1.1 memoise_1.1.0 lattice_0.20-38
[16] pkgconfig_2.0.2 rlang_0.3.3 Matrix_1.2-17
[19] DBI_1.0.0 GenomeInfoDbData_1.2.0 rtracklayer_1.43.3
[22] httr_1.4.0 stringr_1.4.0 hms_0.4.2
[25] Biostrings_2.51.5 locfit_1.5-9.1 bit64_0.9-7
[28] grid_3.6.0 R6_2.4.0 AnnotationDbi_1.45.1
[31] XML_3.98-1.19 limma_3.39.14 edgeR_3.25.3
[34] magrittr_1.5 blob_1.1.1 Rsamtools_1.99.4
[37] GenomicAlignments_1.19.1 assertthat_0.2.1 stringi_1.4.3
[40] RCurl_1.95-4.12 crayon_1.3.4
|
Oh geez. Let me double-check my code, maybe I did something wrong. |
Well, I don't think I stuffed anything up. My small examples don't show any difference between this branch and |
getPESizes is not functional right now.