SortSamSpark Required array length is too large #8949
Comments
Do you really need almost 2 terabytes of heap space?
You've misunderstood the issue. My computer has 2 TB of memory, so …
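For reference, heap size for GATK tools (including the Spark tools when run locally) is normally passed through the launcher's --java-options flag. A sketch, with an arbitrary heap value and placeholder file names, not taken from the original report:

    gatk --java-options "-Xmx200g" SortSamSpark -I input.bam -O sorted.bam --sort-order coordinate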
@fo40225 that's interesting. So the original file contains the reads that cause the filtered file to fail? I would have expected it to fail in both cases if it were an issue with the read lengths. @gokalpcelik is correct about the Spark tools - they're …
This is definitely a bug in the way serialization is handled, but it's hard to tell where the issue is exactly. Spark is trying to serialize something into a byte buffer, but it's trying to put more bytes in than fit in a Java array. If you could produce a very small BAM file that reliably reproduces this problem we might be able to investigate it, but I don't have the bandwidth to really look into this right now. Spark tools are a low priority at the moment. I would recommend sorting the file with the non-Spark SortSam for now. I'm sorry I don't have a better answer, but dealing with serialization issues is very often a huge can of worms.
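For context, the hard limit the error points at comes from the JVM rather than from Spark's own logic: Java arrays are indexed by a signed 32-bit int, so a single byte[] can never hold more than Integer.MAX_VALUE elements (just under 2 GiB of bytes), no matter how much heap is available. A minimal sketch of the failure mode, independent of GATK and Spark (the real buffer-growth code in Spark differs; this only illustrates the cap):

    // Illustration only: any serializer that accumulates one partition or
    // record batch into a single byte[] hits a hard wall near 2 GiB,
    // regardless of how much physical memory or heap the machine has.
    public class ArrayLimitDemo {
        public static void main(String[] args) {
            long needed = 3L * 1024 * 1024 * 1024; // suppose a partition serializes to ~3 GiB
            if (needed > Integer.MAX_VALUE) {
                // This mirrors the reported failure: the requested length does
                // not fit in an int, so the allocation is impossible by construction.
                throw new OutOfMemoryError("Required array length " + needed + " is too large");
            }
            byte[] buffer = new byte[(int) needed]; // unreachable for the value above
            System.out.println("allocated " + buffer.length + " bytes");
        }
    }

The suggested non-Spark workaround would look something like this (file names are placeholders):

    gatk SortSam -I input.bam -O sorted.bam -SO coordinate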
@jonn-smith The original BAM (containing short reads) will run normally. The filtered BAM (containing only long reads) will crash. @lbergelson Is there a way to keep the file in …
Bug Report
Affected tool(s) or class(es)
Tool/class name(s), special parameters?
SortSamSpark --sort-order coordinate
Affected version(s)
4.4.0.0
Description
Describe the problem below. Provide screenshots, stack traces, and logs where appropriate.
An error occurs when using SortSamSpark to sort a large BAM file that contains only long reads (90x human WGS, minimum read length > 10 kbp).
However, if the large BAM file contains short reads, it executes normally.
Steps to reproduce
Tell us how to reproduce this issue. If possible, include command lines that reproduce the problem. (The support team may follow up to ask you to upload data to reproduce the issue.)
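The exact command line was not included in the report; an invocation along these lines matches the description (file names are placeholders):

    gatk SortSamSpark -I long_reads.bam -O sorted.bam --sort-order coordinate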
Expected behavior
Tell us what should happen
Output a sorted BAM file.
Actual behavior
Tell us what happens instead
java.lang.OutOfMemoryError: Required array length ? is too large
The last lines of the log file:
The first lines of the log file: