This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Q20+ data model

A model trained on Q20+ data (R10.4.1 flow cells with kit14 chemistry) for Clair3-Trio.


How to use the Q20+ model

INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. /input
REF="${INPUT_DIR}/ref.fa"                  # change your reference file name here
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. /output
THREADS="[MAXIMUM_THREADS]"                # e.g. 8
SAMPLE_C="[CHILD_SAMPLE_NAME]"             # change your child's name here
SAMPLE_P1="[PARENT1_SAMPLE_NAME]"          # change your parent1's name here
SAMPLE_P2="[PARENT2_SAMPLE_NAME]"          # change your parent2's name here
MODEL_C3="r1041_e82_400bps_sup_v400"              # Q20+ Clair3 model
MODEL_C3T="c3t_hg002_dna_r1041_e82_400bps_sup"    # Q20+ Clair3-Trio model

# Change the child, parent1 and parent2 BAM file names below to match your data.
# --threads is the maximum number of threads to use.
# --output is an absolute output path prefix.
# Keep comments off the continued lines: anything after a trailing backslash ends the command early.
docker run -it \
  -v "${INPUT_DIR}:${INPUT_DIR}" \
  -v "${OUTPUT_DIR}:${OUTPUT_DIR}" \
  hkubal/clair3-trio:latest \
  /opt/bin/run_clair3_trio.sh \
  --ref_fn="${REF}" \
  --bam_fn_c="${INPUT_DIR}/child_input.bam" \
  --bam_fn_p1="${INPUT_DIR}/parent1_input.bam" \
  --bam_fn_p2="${INPUT_DIR}/parent2_input.bam" \
  --sample_name_c="${SAMPLE_C}" \
  --sample_name_p1="${SAMPLE_P1}" \
  --sample_name_p2="${SAMPLE_P2}" \
  --threads="${THREADS}" \
  --model_path_clair3="/opt/models/clair3_models/${MODEL_C3}" \
  --model_path_clair3_trio="/opt/models/clair3_trio_models/${MODEL_C3T}" \
  --output="${OUTPUT_DIR}"
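
Before launching the container, it can save time to confirm that the reference and the three BAM files exist and are indexed. The sketch below is a hypothetical helper (`check_trio_inputs` is not part of Clair3-Trio). It assumes each index sits next to its file as `<ref>.fai` and `<bam>.bai`; adjust it if your indexes use a different naming scheme.

```shell
# Hypothetical pre-flight helper; not part of Clair3-Trio.
# Assumption: indexes are stored next to their files as <ref>.fai and <bam>.bai.
# Usage: check_trio_inputs REF BAM [BAM ...]
# Returns 0 when every file and index exists and is non-empty, non-zero otherwise.
check_trio_inputs() (
  ref="$1"; shift
  missing=0
  # Reference FASTA and its faidx index
  for f in "$ref" "$ref.fai"; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; missing=1; }
  done
  # Each BAM and its index
  for bam in "$@"; do
    for f in "$bam" "$bam.bai"; do
      [ -s "$f" ] || { echo "missing or empty: $f" >&2; missing=1; }
    done
  done
  [ "$missing" -eq 0 ]
)
```

With the variables defined above, a call could look like `check_trio_inputs "${INPUT_DIR}/ref.fa" "${INPUT_DIR}/child_input.bam" "${INPUT_DIR}/parent1_input.bam" "${INPUT_DIR}/parent2_input.bam" || exit 1`. The function body runs in a subshell, so its working variables do not leak into the calling script.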

Check Usage for more options.


About Q20+

The Q20+ model is trained on data obtained from ONT: three GIAB samples (HG002, HG003 and HG004) with FAST5 files available, basecalled using Dorado in SUP mode.

We held out chromosome 20 during the training stages and reserved it for testing.
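
Because chromosome 20 is the held-out region, any benchmark of this model is only meaningful on that chromosome. A minimal sketch of restricting a call set to it, assuming an uncompressed VCF on standard input and GRCh38-style contig names (`chr20`); `keep_contig` is a hypothetical helper name, not a Clair3-Trio tool:

```shell
# Hypothetical helper; not part of Clair3-Trio.
# Reads an uncompressed VCF on stdin and prints the header lines plus the
# records whose CHROM column equals the given contig (default: chr20).
keep_contig() {
  awk -F '\t' -v contig="${1:-chr20}" '/^#/ || $1 == contig'
}
```

For example, `keep_contig chr20 < calls.vcf > calls.chr20.vcf` (file names are placeholders). For compressed and indexed VCFs, a region query with a dedicated VCF tool is likely the better choice.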

We compared the new model against Clair3 v1.0 (r1041_e82_400bps_sup_v400) and PEPPER r0.8 (ont_r10_q20).

All testing outputs are available here, and the analysis results are in this table.

Clair3-Trio overall results


Clair3-Trio SNP results


Clair3-Trio INDEL results


Clair3-Trio overall results


Q20+ data used for training and testing

FASTQs and BAMs

Download the data from here.

FAST5s

| Sample | Reference | Aligner | Coverage | Basecaller | Chemistry |
| --- | --- | --- | --- | --- | --- |
| HG002 | GRCh38_no_alt | minimap2 | 70.13 | Dorado v4.0.0 SUP | R10.4.1 E8.2 |
| HG003 | GRCh38_no_alt | minimap2 | 79.66 | Dorado v4.0.0 SUP | R10.4.1 E8.2 |
| HG004 | GRCh38_no_alt | minimap2 | 64.72 | Dorado v4.0.0 SUP | R10.4.1 E8.2 |
