Q20+ data model

A model trained on Q20+ data (R10.4.1 flow cells with kit14 chemistry) for Clair3-Trio.

How to use the Q20+ model

INPUT_DIR="[YOUR_INPUT_FOLDER]"            # e.g. /input
REF=${_INPUT_DIR}/ref.fa                   # change your reference file name here
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"          # e.g. /output
THREADS="[MAXIMUM_THREADS]"                # e.g. 8
MODEL_C3="r1041_e82_400bps_sup_v400"         	# q20 Clair3 model
MODEL_C3T="c3t_hg002_dna_r1041_e82_400bps_sup"  # q20 Clair3-Trio model


docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3-trio:latest \
  /opt/bin/run_clair3_trio.sh \
  --ref_fn=${INPUT_DIR}/ref.fa \                  ## change your reference file name here
  --bam_fn_c=${INPUT_DIR}/child_input.bam \       ## change your child's bam file name here 
  --bam_fn_p1=${INPUT_DIR}/parent1_input.bam \    ## change your parent1's bam file name here     
  --bam_fn_p2=${INPUT_DIR}/parent2_input.bam \   ## change your parent2's bam file name here   
  --sample_name_c=${SAMPLE_C} \                   ## change your child's name here
  --sample_name_p1=${SAMPLE_P1} \                 ## change your parent1's name here
  --sample_name_p2=${SAMPLE_P2} \                 ## change your parent2's name here
  --threads=${THREADS} \                          ## maximum threads to be used
  --model_path_clair3="/opt/models/clair3_models/${MODEL_C3}" \
  --model_path_clair3_trio="/opt/models/clair3_trio_models/${MODEL_C3T}" \
  --output=${OUTPUT_DIR}                          ## absolute output path prefix

Check Usage for more options.

About Q20+

The Q20+ model are trained on data obtained from ONT (basecalled three GIAB samples (HG002, 3, 4) with fast5 available using dorado with the sup mode).

We trained the model while holding out chromosome 20 in training stages and preserving it for testing.

We compared the new model trained gainst Clair3 v1.0, r1041_e82_400bps_sup_v400 and PEPPER r0.8, ont_r10_q20.

All testing output are available at here and analysis results are in this table.

Q20+ data used for training and testing

FASTQs and BAMs

download data from here

FAST5s

Sample	Reference	Aligner	Coverage	Basecaller	Chemistry
HG002	GRCh38_no_alt	minimap2	70.13	Dorado v4.0.0 SUP	R10.4.1 E8.2
HG003	GRCh38_no_alt	minimap2	79.66	Dorado v4.0.0 SUP	R10.4.1 E8.2
HG004	GRCh38_no_alt	minimap2	64.72	Dorado v4.0.0 SUP	R10.4.1 E8.2

We trained the model while holding out chromosome 20 in training stages and preserving it for testing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

q20.md

q20.md

Q20+ data model

Contents

How to use the Q20+ model

About Q20+

Q20+ data used for training and testing

FASTQs and BAMs

FAST5s

Files

q20.md

Latest commit

History

q20.md

File metadata and controls

Q20+ data model

Contents

How to use the Q20+ model

About Q20+

Q20+ data used for training and testing

FASTQs and BAMs

FAST5s