csw edited this page Jun 28, 2012 · 4 revisions

Since MAF files can be hundreds of GB in size, performance is an important consideration. So far, the chunked parser compares well against the alternatives tested. User CPU times for parsing a 315 MB file (chrY.maf, 95,437 alignment blocks) and counting MAF blocks:

  • chunked parser: 10.1 s
  • line-based parser: 16.0 s
  • bx-python parser: 22.7 s
  • PHAST: <= 18.3 s (not strictly comparable, since it was also writing MAF output)
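
To make the benchmark task concrete, here is a minimal sketch of what "counting MAF blocks" amounts to. It is not the bio-maf chunked or line-based parser; it relies only on the MAF convention that each alignment block starts with a line beginning with "a". The helper name `count_maf_blocks` and the sample data are made up for illustration.

```ruby
require 'stringio'

# Illustrative only, not bio-maf's implementation: counts MAF alignment
# blocks by counting the "a" lines that open each block.
def count_maf_blocks(io)
  count = 0
  io.each_line do |line|
    count += 1 if line.start_with?('a ') || line.chomp == 'a'
  end
  count
end

# Tiny made-up sample containing two alignment blocks.
sample_maf = <<~MAF
  ##maf version=1
  a score=10.0
  s hg18.chr7    27578828 38 + 158545518 AAAGGGAATGTTAACCAAATGAATTGTCTCTTACGGTG
  s panTro1.chr6 28741140 38 + 161576975 AAAGGGAATGTTAACCAAATGAATTGTCTCTTACGGTG

  a score=5.0
  s hg18.chr7    27699739 6 + 158545518 TAAAGA
  s panTro1.chr6 28862317 6 + 161576975 TAAAGA
MAF

puts count_maf_blocks(StringIO.new(sample_maf))
```

A real parser does considerably more work per block (splitting sequence lines, building objects), which is why the timings above differ between implementations even though the final count is the same.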

Also, after warming up, JRuby 1.7 on Java 7 appears to be about 1.6 times as fast at MAF parsing as CRuby 1.9.3, averaging 16 µs per alignment block compared to 25 µs.
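
A per-block average like this can be obtained with Ruby's standard `Benchmark` module by timing a whole run and dividing by the number of items. The sketch below is an assumption about the general approach, not the actual source of `bin/maf_parse_bench`; the helper `average_time_per_item` and the stand-in workload are hypothetical.

```ruby
require 'benchmark'

# Hypothetical helper (not from bio-maf): times the given block once and
# returns the average Benchmark::Tms per item.
def average_time_per_item(item_count)
  total = Benchmark.measure { yield }
  total / item_count
end

# Stand-in workload: split 50,000 made-up MAF-like "s" lines into fields.
lines = Array.new(50_000) { |i| "s hg18.chr7 #{i} 38 + 158545518 ACGT" }
avg = average_time_per_item(lines.size) { lines.each { |l| l.split } }

# Prints user, system, total and (real) seconds per item.
puts avg.format('%10.6u %10.6y %10.6t %10.6r')
```

On JRuby, the first iterations run before the JIT has compiled the hot paths, so a warm-up pass before the measured run is needed for a fair comparison with CRuby.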

JRuby configuration

Disabling JRuby's ObjectProxyCache with the -Xji.objectProxyCache=false option speeds up multithreaded index scans substantially (about 2.5x in my testing) by eliminating lock contention.

[bx-python]
$ time maf_count.py < ~/maf/chrY.maf
95437

real	0m23.136s
user	0m22.685s
sys	0m0.390s
[bio-maf, chunked parser]
$ time bin/maf_count --parser ChunkParser ~/maf/chrY.maf
Parsed 95437 MAF alignment blocks.

real	0m10.481s
user	0m10.140s
sys	0m0.249s
[bio-maf, original parser]
$ time bin/maf_count ~/maf/chrY.maf 
Parsed 95437 MAF alignment blocks.

real	0m16.445s
user	0m16.003s
sys	0m0.285s
[PHAST]
$ time maf_parse ~/maf/chrY.maf > /dev/null

real	0m18.607s
user	0m18.325s
sys	0m0.255s

[MRI vs. JRuby]
$ ruby -v
jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04) [darwin-x86_64-java]
$ bin/maf_parse_bench -w ~/maf/chrY.maf 
  0.000016   0.000000   0.000016 (  0.000015)
=========================
$ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
$ bin/maf_parse_bench -w ~/maf/chrY.maf
  0.000025   0.000000   0.000025 (  0.000026)