dec.py: "local variable 'prop' referenced before assignment" #53

Open
derrickhlin opened this issue Aug 3, 2018 · 4 comments
@derrickhlin

Hello Dr. Zagordi,

I apologize if this is an incredibly basic question - I just started working with command line programs and NGS data this summer (since June 2018) and was hoping you might be able to help me in my exploration of analysis methods.

I have a data set of 51 bp Illumina reads of the NS5B gene (HCV). My reference is the H77 strain, and the full length of the genome in question is 1776 bp. I am working on Mac OS X. I am currently stuck on this error message:

local variable 'prop' referenced before assignment

As a summary of what I have done so far:

  1. Installed ShoRAH via miniconda3
    conda install -c biopython shorah

  2. Generated input files (fastq > sam > bam > _sorted.bam)
    #local alignment using bowtie2
    bowtie2-build **reference**.fasta h77
    bowtie2 --local -x h77 -U **rawdata**.fastq -S **samfile**.sam
    --
    running these 2 commands yields this output:
    3522968 reads; of these:
    3522968 (100.00%) were unpaired; of these:
    509358 (14.46%) aligned 0 times
    3013537 (85.54%) aligned exactly 1 time
    73 (0.00%) aligned >1 times
    85.54% overall alignment rate
    --
    [I have also tried running an end-to-end alignment]
    --
    #conversion to binary
    samtools view -b -T **reference**.fasta -o **bamfile**.bam **samfile**.sam
    --
    #sorting .bam file
    samtools sort **bamfile**.bam -o **bamfile_sorted**.bam
    --
    #generation of an indexing file
    samtools index **bamfile_sorted**.bam **bamfile_sorted**.bam.bai

  3. Used the reference.fasta file as a genome and loaded bamfile_sorted.bam into IGV
    which yields this mapping of the reads:
    [image: IGV view of the mapped reads]

  4. Checked coverage with Qualimap:
    [3 images: Qualimap coverage output]

  5. Attempted to run dec.py
    dec.py -b **bamfile_sorted**.bam -f **reference**.fasta -w 48 -r AF009606:200-1400
    --
    I selected a window of 48 because I had read in a previous issue post that the window should be slightly smaller than the read length.
    --
    I chose to run from 200-1400 to avoid any issues of coverage with the ends.

All of my attempts to run the local analysis yield something similar to the following in the terminal:

Traceback (most recent call last):
  File "/Users/Derrick/miniconda3/bin/dec.py", line 78, in <module>
    args.region, args.max_coverage, args.alpha, args.keep_files, args.seed)
  File "/Users/Derrick/miniconda3/lib/python3.6/site-packages/shorah_dec.py", line 449, in main
    proposed[beg] = (get_prop(dbg_file), j)
  File "/Users/Derrick/miniconda3/lib/python3.6/site-packages/shorah_dec.py", line 261, in get_prop
    return prop
UnboundLocalError: local variable 'prop' referenced before assignment

The dec.log file ends with the following lines:

INFO 2018-08-03 15:39:41,522 main 442 reading windows for start position 152
WARNING 2018-08-03 15:39:44,129 correct_reads 234 No reads in window 152?
INFO 2018-08-03 15:39:44,129 main 446 this is window w-AF009606-152-199

[I have tried adjusting the window size, the positions, and the alpha value without success.]
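
For readers unfamiliar with this Python error: the traceback and the "No reads in window 152?" warning are consistent with a function that assigns its result only when it finds a matching line in its input. The sketch below is not ShoRAH's actual code. `get_prop_sketch`, `get_prop_safe`, and the `proposed:` line format are hypothetical stand-ins, used only to reproduce the same kind of failure on empty input.

```python
def get_prop_sketch(lines):
    """Return the value from the last 'proposed:' line (hypothetical format)."""
    for line in lines:
        if line.startswith("proposed:"):
            prop = int(line.split(":")[1])
    # If no line matched, 'prop' was never assigned, so this return
    # raises UnboundLocalError.
    return prop


def get_prop_safe(lines, default=None):
    """Same lookup, but with a default so empty input does not crash."""
    prop = default
    for line in lines:
        if line.startswith("proposed:"):
            prop = int(line.split(":")[1])
    return prop


if __name__ == "__main__":
    print(get_prop_safe(["proposed: 12"]))
    print(get_prop_safe([]))
    try:
        get_prop_sketch([])  # empty input, e.g. a window without reads
    except UnboundLocalError as err:
        print("UnboundLocalError:", err)
```

Under this assumption, the crash would be a symptom of an empty window rather than of a wrong command line, which is why the log warning matters more than the traceback itself.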

Ultimately, I suppose my question is whether this is a user error or a systematic error (i.e. the data are not suitable for running this analysis). While I believe that the data have sufficient coverage (based on the Qualimap output), I am a little unclear on how to determine what is sufficient. I am also unsure if the data have sufficient diversity to run through dec.py - and admittedly I am unclear on how to determine this as well.

Again, I apologize if I am missing something basic (and for the formatting of this post) - I am trying my best to learn on my own, but I thought it would be easiest to just ask!

Thanks in advance for your help!

Best,
Derrick

@ozagordi
Collaborator

ozagordi commented Aug 8, 2018

Hi Derrick. Thanks for your very detailed post.

For some reason, shorah doesn't pick up the region you specified. I'm not sure why this happens; the first thing that comes to mind is that the sequence header in the reference fasta file is not correct. It should read:

>AF009606
ACGTTTACAC...
rest of the sequence

Could it be that you have extra characters in the header? Something like AF009606.1? Your data should support discovery of variants. Let us know.
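
A quick way to run this check is to compare the sequence IDs in the reference file with the name used in the `-r` region string. This is a minimal sketch, not part of ShoRAH: the helper names `fasta_ids`, `region_name`, and `header_matches` are made up for illustration, and taking the ID as the header text up to the first whitespace is an assumption (ShoRAH may be stricter).

```python
import io


def fasta_ids(handle):
    """Return the ID of every record: the text after '>' up to the first
    whitespace (assumed convention)."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def region_name(region):
    """Extract the sequence name from a 'name:start-end' region string."""
    return region.rsplit(":", 1)[0]


def header_matches(handle, region):
    """True if some FASTA record ID equals the name used in the region."""
    return region_name(region) in fasta_ids(handle)


if __name__ == "__main__":
    good = io.StringIO(">AF009606\nACGTTTACAC\n")
    versioned = io.StringIO(">AF009606.1\nACGTTTACAC\n")
    print(header_matches(good, "AF009606:200-1400"))       # True
    print(header_matches(versioned, "AF009606:200-1400"))  # False
```

With a real file, the same call would take an open file object, for example `with open("reference.fasta") as fh: header_matches(fh, "AF009606:200-1400")`, where `reference.fasta` stands for your own reference.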

@derrickhlin
Author

Hi Dr. Zagordi,

Thanks for your reply! Based on the conversation in issue #32, I had already changed the fasta header accordingly, so I don't believe this is the issue. Here is a screenshot from my reference file.

[image: screenshot of the reference file]

Additionally, I believe the .dbg and .smp files are named appropriately, as shown below.

[image: screenshot of the file names]

Derrick

@sposadac
Member

sposadac commented Aug 8, 2018

Hi Derrick

By default, overlapping windows are constructed from reads that cover at least 80% of the window. You can look at the coverage.txt file and check how many reads make up each window (last column). I suspect that the window starting at position 152 doesn't contain any reads. If that's the case, you can either make the window length smaller or reduce the target region.
I should add that ShoRAH tries to cover the target region with at least two overlapping windows. That's why it starts before position 200.
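
To make the 80% rule and the coverage.txt check concrete, here is a rough sketch. It is not ShoRAH's implementation: `covers_window` and `empty_windows` are hypothetical helpers, coordinates are assumed to be 1-based and inclusive, and the only thing assumed about coverage.txt is that each line describes one window and ends with its read count.

```python
def covers_window(read_start, read_end, win_start, win_end, min_frac=0.8):
    """True if the read overlaps at least min_frac of the window.

    All coordinates are 1-based and inclusive (assumption).
    """
    overlap = min(read_end, win_end) - max(read_start, win_start) + 1
    win_len = win_end - win_start + 1
    return overlap >= min_frac * win_len


def empty_windows(coverage_lines):
    """Return the lines whose last whitespace-separated column is 0."""
    empty = []
    for line in coverage_lines:
        fields = line.split()
        if fields and fields[-1].isdigit() and int(fields[-1]) == 0:
            empty.append(line.rstrip("\n"))
    return empty


if __name__ == "__main__":
    # Window 152-199 is 48 bases long, so a read must overlap >= 38.4 bases.
    print(covers_window(140, 190, 152, 199))  # 39 bases overlap -> True
    print(covers_window(162, 212, 152, 199))  # 38 bases overlap -> False
    # Hypothetical lines; the real coverage.txt may have more columns.
    sample = ["w-AF009606-152-199 0", "w-AF009606-168-215 1200"]
    print(empty_windows(sample))
```

For a real run, `empty_windows(open("coverage.txt"))` would list the windows to worry about, assuming the file layout matches the description above.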

@sposadac
Member

sposadac commented Nov 20, 2018

This should solve the issue while allowing windows with low coverage to be omitted:
#58

DrYak added a commit that referenced this issue Aug 27, 2019
 - appears when coverage inside a window is too low
 - leading to no output of dpm_sampler
 - Fix: missing null check that crashed dpm_sampler
 - Fix: no samples that crashed shotgun.py
DrYak added a commit that referenced this issue Aug 29, 2019
 - appears when coverage inside a window is too low
 - leading to no output of dpm_sampler
 - Fix: missing null check that crashed dpm_sampler
 - Fix: no samples and 'not found' that crashed shotgun.py
DrYak self-assigned this Sep 4, 2019
DrYak added a commit that referenced this issue Sep 9, 2019
 - appears when coverage inside a window is too low
 - leading to no output of dpm_sampler
 - Fix: missing null check that crashed dpm_sampler
 - Fix: no samples and 'not found' that crashed shotgun.py