Trouble with using 'circ_quant' function (CLEAR with STAR Alignment) #23

Open
jennynuyirs opened this issue Jun 16, 2024 · 0 comments

Hello! I am having some trouble getting the circ_quant function to work. My code is as follows:

```bash
circ_quant -c "$name/circRNA_out/circularRNA_known.txt" \
    -b "$name/Aligned.sortedByCoord.out.bam" \
    -r "$ref_genome.ref.txt" \
    -o "$name.circRNA_quant.txt"
```

It fails with `AttributeError: 'list' object has no attribute 'split'`, raised at line 83 of circ_quant.py. My guess is that something read from the BAM file input is a list rather than a string, so calling `.split()` on it fails. I'm skeptical that this is the real cause, though, because fixing it would mean changing the source code (probably not a good idea).
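Before digging into circ_quant.py itself, a quick sanity check on the three inputs can rule out the simplest causes: a missing or empty file, a missing BAM index, or a circRNA table whose lines have inconsistent numbers of tab-separated columns. The following is a minimal bash sketch, not a diagnosis of the AttributeError. The function name `check_circ_quant_inputs` is hypothetical and not part of CLEAR, and it assumes the index has the default name that `samtools index` writes (`<bam>.bai`). A header line with a different field count than the data rows would also be flagged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CLEAR): sanity-check the inputs given to circ_quant.
# Assumes `samtools index` wrote the index next to the BAM as <bam>.bai.
check_circ_quant_inputs() {
    local circ="$1" bam="$2" ref="$3"
    local status=0 f counts

    # Every input must exist and be non-empty.
    for f in "$circ" "$bam" "$ref"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done

    # The BAM index should sit next to the BAM under the default name.
    if [ ! -e "$bam.bai" ]; then
        echo "no BAM index found: $bam.bai" >&2
        status=1
    fi

    # List the distinct tab-separated column counts in the circRNA table,
    # in order of first appearance; more than one value suggests a malformed file.
    if [ -s "$circ" ]; then
        counts=$(awk -F'\t' '!seen[NF]++ { printf "%s%d", sep, NF; sep = " " } END { print "" }' "$circ")
        echo "column counts in $circ: $counts"
        case "$counts" in
            *" "*)
                echo "inconsistent column counts in $circ" >&2
                status=1
                ;;
        esac
    fi

    return "$status"
}

# Example call for one sample from the pipeline:
#   check_circ_quant_inputs "$name/circRNA_out/circularRNA_known.txt" \
#       "$name/Aligned.sortedByCoord.out.bam" "$ref_genome.ref.txt"
```

If all three checks pass, the inputs are at least present and structurally regular, which points the investigation back toward circ_quant.py and its Python environment.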

I am fairly new to bioinformatics and only somewhat experienced with coding, so I'm unsure how to proceed from here. Any potential solutions or suggestions for debugging would be immensely helpful.

I've included the full pipeline below, which is a slightly modified version of @bounlu's CLEAR with STAR Alignment pipeline. I've tested each step separately, and all of them work as expected except the final circ_quant step.

```bash
# define parameters
file_extension="_R1_001.fastq.gz"
read_length=100
ref_genome="hg38"

# make output directories
mkdir -p "STAR_$ref_genome/$read_length"

# download reference files
fetch_ucsc.py "$ref_genome" fa "$ref_genome.fa"
fetch_ucsc.py "$ref_genome" ref "$ref_genome.ref.txt"
cut -f2-11 "$ref_genome.ref.txt" | genePredToGtf file stdin "$ref_genome.ref.gtf"

# generate genome index file
STAR --runMode genomeGenerate \
    --genomeDir "STAR_$ref_genome/$read_length" \
    --limitIObufferSize 1000000000 \
    --runThreadN 16 \
    --genomeFastaFiles "$ref_genome.fa" \
    --outFileNamePrefix ./ \
    --sjdbGTFfile "$ref_genome.ref.gtf" \
    --sjdbOverhang "$((read_length - 1))"

# run pipeline (one sample per R1/R2 FASTQ pair)
for read1 in *"$file_extension"
do
        name="${read1%$file_extension}"
        read2="${name}_R2_001.fastq.gz"
        mkdir -p "$name"

        # align reads; --chimSegmentMin enables chimeric output (Chimeric.out.junction)
        STAR --chimSegmentMin 20 --runThreadN 16 \
            --genomeLoad LoadAndRemove --limitBAMsortRAM 50000000000 \
            --limitIObufferSize 1000000000 \
            --outSAMtype BAM SortedByCoordinate \
            --readFilesCommand zcat \
            --outFileNamePrefix "$name/" \
            --genomeDir "STAR_$ref_genome/$read_length" \
            --readFilesIn "$read1" "$read2" \
            > "$name/$name.circRNA_alignment.log" 2>&1
        samtools index "$name/Aligned.sortedByCoord.out.bam"

        # identify circRNAs from the chimeric junctions
        fast_circ.py parse -r "$ref_genome.ref.txt" -g "$ref_genome.fa" -t STAR \
            -o "$name/circRNA_out" "$name/Chimeric.out.junction" \
            > "$name/$name.circRNA_parse.log" 2>&1

        # quantify circRNAs (the step that fails)
        circ_quant -c "$name/circRNA_out/circularRNA_known.txt" \
            -b "$name/Aligned.sortedByCoord.out.bam" \
            -r "$ref_genome.ref.txt" \
            -o "$name.circRNA_quant.txt" \
            > "$name/$name.circRNA_quant.log" 2>&1
done
```
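Because every step in the loop redirects both stdout and stderr to a per-sample log, a failing step (for example, circ_quant raising the AttributeError) is easy to miss, and the loop simply moves on to the next sample. A small wrapper can print the end of the log, including any Python traceback, as soon as a step exits with a non-zero status. This is a minimal bash sketch: `run_logged` is a hypothetical name, not part of CLEAR or STAR, and it assumes each tool signals failure through its exit status (an uncaught Python exception exits with status 1).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CLEAR or STAR): run a command with stdout and
# stderr written to a log file; on failure, print the tail of the log to stderr
# and return the command's exit status so the caller can react.
run_logged() {
    local log="$1" rc
    shift
    "$@" > "$log" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAILED (exit $rc): $*" >&2
        echo "--- last lines of $log ---" >&2
        tail -n 20 "$log" >&2
    fi
    return "$rc"
}

# Example use inside the loop (skips to the next sample if circ_quant fails):
#   run_logged "$name/$name.circRNA_quant.log" \
#       circ_quant -c "$name/circRNA_out/circularRNA_known.txt" \
#       -b "$name/Aligned.sortedByCoord.out.bam" \
#       -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt" || continue
```

Returning the exit status instead of calling `exit` leaves the decision to the loop: `|| continue` skips the failed sample, while `|| break` would stop the whole run.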
