Step 6 tmp/phase_output/phase_bam/.bam no found error #23

aragornwubo · 2021-06-04T19:51:03Z

Hi,

Thank you for developing Clair3.

I hit the same unexpected error whether I ran Clair3 from the Singularity image or from the conda installation.
The command I used:

            run_clair3.sh \
            --bam_fn=${NANO_BAM} \
            --ref_fn=${REF} \
            --threads=32 \
            --platform="ont" \
            --model_path="./models/ont" \
            --output=${BASE}/CLAIR3_CONDA \
            --sample_name='HG002' \
            --chunk_size=10000000 \
            --include_all_ctgs

Errors matching the following pattern occurred multiple times at the end of the run:

            [INFO] 6/7 Calling variants using Full Alignment
            [ERROR] file /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/phase_output/phase_bam/.bam not found
            parallel: This job failed:
            python3 /home/bwu4/bin/Clair3/scripts/../clair3.py CallVarBam --chkpnt_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/./models/ont/full_alignment --bam_fn /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/phase_output/phase_bam/''.bam --call_fn /scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_''.vcf --sampleName HG002 --vcf_fn EMPTY --ref_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/HG002_SHA_RAG.fasta --full_aln_regions '' --ctgName '' --add_indel_length --phasing_info_in_bam --gvcf False --python python3 --pypy pypy3 --samtools samtools --platform ont
            
            real    0m0.464s
            user    0m0.499s
            sys     0m0.252s
            cat: '/scratch1/bwu4/NEW_XIAO/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_*.vcf': No such file or directory
            [ERROR] No vcf file found, please check the setting

The following are the chromosomal names in my reference fasta file:

      >chr1_RagTag
      >chr10_RagTag
      >chr11_RagTag
      >chr12_RagTag
      >chr13_RagTag
      >chr14_RagTag
      >chr15_RagTag
      >chr16_RagTag
      >chr17_RagTag
      >chr18_RagTag
      >chr19_RagTag
      >chr2_RagTag
      >chr20_RagTag
      >chr21_RagTag
      >chr22_RagTag
      >chr3_RagTag
      >chr4_RagTag
      >chr5_RagTag
      >chr6_RagTag
      >chr7_RagTag
      >chr8_RagTag
      >chr9_RagTag
      >chrX_RagTag
      >chrY_RagTag

The tagged bams have been successfully generated for all 24 chromosomes. Could you help me figure out what the problem is? Thank you very much.

Best,
Bo


aragornwubo · 2021-06-05T01:46:59Z

I have checked with Huangneng and this problem seems to be the same as #20.

zhengzhenxian · 2021-06-05T03:34:46Z

We reopened the issue because it might have a different cause from #20. There seems to be an empty contig name, which produced an invalid BAM filename in the command. To help us pinpoint the problem, could you send the FASTA index (.fai) file and the run log ${OUTPUT_DIR}/run_clair3.log to my email address [email protected]? Much appreciated.
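
While waiting on the files, one quick local check for the empty-contig-name hypothesis is to look for index records with a blank name column. This is only a sketch: `count_empty_contigs` is a hypothetical helper, not part of Clair3, and the demo index uses made-up numbers. It relies on the standard tab-separated .fai layout, where the first column is the sequence name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Clair3): count .fai records whose first
# (name) column is empty. Blank lines are counted as well.
# A .fai line is tab-separated: NAME LENGTH OFFSET LINEBASES LINEWIDTH.
count_empty_contigs() {
    awk -F'\t' '$1 == "" { n++ } END { print n + 0 }' "$1"
}

# Demo on a throwaway index with made-up numbers; the second record has no name.
demo_fai=$(mktemp)
printf 'chr1_RagTag\t1000\t13\t60\t61\n\t500\t1043\t60\t61\n' > "${demo_fai}"
count_empty_contigs "${demo_fai}"
rm -f "${demo_fai}"
```

A result of 0 on the real index would suggest the empty name comes from somewhere else in the pipeline rather than from the reference itself.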

aragornwubo · 2021-06-05T03:57:35Z

Thank you for replying. After posting this problem, I noticed that the log file showed an error in Step 1, the same as in #20. I also found that the pileup VCF files were missing for most chromosomes, which is why I closed this issue. I'm now trying a run with 8 threads. I have sent the two files to your email; please check them.
Also worth mentioning: there is a small error in 'run_clair3.sh' at line 215, "if [[ ${THREADS} > ${MAX_THREADS} ]]", which sets THREADS to MAX_THREADS when I use the '--threads=8' option. I think "if [[ ${THREADS} -gt ${MAX_THREADS} ]]" is the correct version.
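
The difference between the two tests can be reproduced in isolation. Inside `[[ ]]`, `>` compares its operands as strings, so "8" sorts after "32", while `-gt` compares them as integers. The values below are illustrative (a MAX_THREADS of 32 is an assumption), and the snippet is a standalone sketch, not an excerpt from run_clair3.sh.

```shell
#!/usr/bin/env bash
# Illustrative values; MAX_THREADS=32 is an assumption, not read from run_clair3.sh.
THREADS=8
MAX_THREADS=32

# Inside [[ ]], '>' is a lexicographic string comparison: "8" sorts after "32".
if [[ ${THREADS} > ${MAX_THREADS} ]]; then
    string_result="capped"
else
    string_result="kept"
fi

# '-gt' is an integer comparison: 8 is not greater than 32.
if [[ ${THREADS} -gt ${MAX_THREADS} ]]; then
    numeric_result="capped"
else
    numeric_result="kept"
fi

echo "string comparison:  ${string_result}"
echo "numeric comparison: ${numeric_result}"
```

With these values the string test reports "capped" and the numeric test reports "kept", which matches the behavior described above for '--threads=8'.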

aragornwubo · 2021-06-06T01:10:55Z

The program seemed to run successfully with 8 threads. However, a new error was detected during the process:

parallel: This job failed:
python3 /home/bwu4/bin/Clair3/scripts/../clair3.py CallVarBam --chkpnt_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/./models/ont/full_alignment --bam_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/phase_output/phase_bam/chr15_RagTag.bam --call_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/full_alignment_output/full_alignment_chr15_RagTag.26_64.vcf --sampleName HG002 --vcf_fn EMPTY --ref_fn /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/HG002_SHA_RAG.fasta --full_aln_regions /scratch1/bwu4/NEW_XIAO/SHASTA_RAGTAG/HG002/CLAIR3_CONDA/tmp/full_alignment_output/candidate_bed/chr15_RagTag.26_64 --ctgName chr15_RagTag --add_indel_length --phasing_info_in_bam --gvcf False --python python3 --pypy pypy3 --samtools samtools --platform ont

Will this error have an effect on the final output? Thank you.

aquaskyline · 2021-06-06T01:40:21Z

Hi, if you rerun this failed command individually, would it run successfully?

aragornwubo · 2021-06-06T03:54:32Z

It runs successfully.

aquaskyline · 2021-06-06T05:49:33Z

Many thanks for the feedback. It looks like we have exceeded a system resource limit. We are looking into the problem. What does ulimit -a print in your running environment?

aragornwubo · 2021-06-07T00:01:23Z

[bwu@node0183 chr2_RagTag_clair3_filt]$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1540728
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 387973120
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1540728
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

aquaskyline · 2021-06-09T02:40:00Z

Some jobs failed because Clair3 was requesting more processes than the user environment allows (ulimit -u). We have added more running-environment checks and automatic retries in v0.1-r3.

Clair3 uses TensorFlow and PyPy. These libraries start quite a few threads in each running instance. The THREADS parameter controls how many Clair3 instances can run concurrently, but each instance, as far as we have observed, consumes up to 40-50 processes at peak. The number of processes a user can create is capped at a value that can be checked with ulimit -a. On Ubuntu, the limit is usually over 10k (unless it has been lowered), so it is not a problem. But on RedHat or CentOS, which are common on grids and in institutions, the limit is usually 1024 or 2048, so setting THREADS above 20 will hit the limit at some point. Raising ulimit -u solves the problem, but that requires root privilege (or a blessing from the system admin team).

In v0.1-r3, we check ulimit -u and lower the THREADS accordingly. We also added automatic retries on failed jobs before handing them to users.
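
For anyone on an older version, the same idea can be approximated by hand. The sketch below is not the check that ships in v0.1-r3; `safe_threads` and `PROCS_PER_INSTANCE` are hypothetical names, and the value 50 is taken from the upper end of the 40-50 processes per instance quoted above. With a limit of 1024 it yields 1024 / 50 = 20 (integer division), in line with the "above 20" figure.

```shell
#!/usr/bin/env bash
# Assumption: about 50 processes per Clair3 instance at peak
# (upper end of the 40-50 range quoted in this thread).
PROCS_PER_INSTANCE=50

# Hypothetical helper: safe_threads REQUESTED LIMIT
# Prints the largest thread count <= REQUESTED that fits under LIMIT processes.
safe_threads() {
    local requested=$1 limit=$2
    # "unlimited" means there is no cap to respect.
    if [[ ${limit} == "unlimited" ]]; then
        echo "${requested}"
        return
    fi
    local cap=$(( limit / PROCS_PER_INSTANCE ))
    # Always allow at least one instance.
    if (( cap < 1 )); then
        cap=1
    fi
    if (( requested > cap )); then
        echo "${cap}"
    else
        echo "${requested}"
    fi
}

# Example: cap a request for 32 threads against the current user process limit.
safe_threads 32 "$(ulimit -u)"
```

The printed value could then be passed as --threads. It ignores any other processes the user already has running, so it is an upper bound rather than a guarantee.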

aragornwubo · 2021-06-09T07:43:52Z

Thank you very much. Since the problem is solved, I'm going to close this issue and try running with the new version.

aragornwubo closed this as completed Jun 5, 2021

aquaskyline reopened this Jun 5, 2021

aragornwubo closed this as completed Jun 9, 2021
