makeBaseCalls Ab1 file #50

Open
dridk opened this issue Dec 19, 2017 · 6 comments

@dridk
Member

dridk commented Dec 19, 2017

Some AB1 files don't provide base calls derived from the raw trace.
This means the following fields are empty and I cannot display the trace: PBAS, PLOC, PCON, DATA.9-14.
We need to compute the base calls when they are missing.

@see AB1 specification
http://www6.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf

@see R implementation
https://www.bioconductor.org/packages/devel/bioc/vignettes/sangerseqR/inst/doc/sangerseq_walkthrough.pdf
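Before computing base calls, a reader has to detect that they are missing. Below is a minimal sketch of reading the ABIF directory, assuming the layout from the specification linked above: a 128-byte header whose root `tdir` entry points at a list of 28-byte, big-endian directory entries, with items of 4 bytes or less stored inline in the offset field. The function names (`read_abif_directory`, `read_tag_bytes`, `missing_tags`) are hypothetical and not part of this project or any library.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple

# One ABIF directory entry: 4-char name, int32 number, int16 element type,
# int16 element size, int32 element count, int32 data size, int32 data offset,
# int32 data handle (unused). All fields are big-endian.
ENTRY = struct.Struct(">4sihhiiii")


class DirEntry(NamedTuple):
    name: str
    number: int
    element_type: int
    element_size: int
    num_elements: int
    data_size: int
    data_offset: int


def read_abif_directory(data: bytes) -> Dict[Tuple[str, int], DirEntry]:
    """Map (tag name, tag number) -> directory entry, e.g. ("PBAS", 2)."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root entry follows the 4-byte magic and the 2-byte version;
    # its element count and data offset locate the real directory.
    root = ENTRY.unpack_from(data, 6)
    count, dir_offset = root[4], root[6]
    directory = {}
    for k in range(count):
        name, number, etype, esize, nelem, dsize, doff, _handle = ENTRY.unpack_from(
            data, dir_offset + k * ENTRY.size
        )
        entry = DirEntry(name.decode("latin-1"), number, etype, esize, nelem, dsize, doff)
        directory[(entry.name, entry.number)] = entry
    return directory


def read_tag_bytes(data: bytes, entry: DirEntry) -> bytes:
    """Return the raw bytes of one tag."""
    if entry.data_size <= 4:
        # Items of 4 bytes or less are stored in the offset field itself.
        return struct.pack(">i", entry.data_offset)[: entry.data_size]
    return data[entry.data_offset : entry.data_offset + entry.data_size]


def missing_tags(directory: Dict[Tuple[str, int], DirEntry], required: List[str]) -> List[str]:
    """Names from `required` that appear under no tag number at all."""
    present = {name for name, _number in directory}
    return [tag for tag in required if tag not in present]
```

A non-empty result from `missing_tags(read_abif_directory(raw_bytes), ["PBAS", "PLOC", "PCON"])` would be the signal to fall back to computing base calls from the trace.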

@dridk added this to the 0.3 milestone Dec 19, 2017
@dridk
Member Author

dridk commented Dec 19, 2017

@dridk
Member Author

dridk commented Dec 19, 2017

@dridk
Member Author

dridk commented Dec 19, 2017

@dridk
Member Author

dridk commented Dec 27, 2017

@dridk
Member Author

dridk commented Dec 27, 2017

@dridk
Member Author

dridk commented Dec 27, 2017

  1. Algorithms.

    Phred uses simple Fourier methods to examine the four base traces in
    the region surrounding each point in the data set in order to predict
    a series of evenly spaced predicted locations. That is, it determines
    where the peaks would be centered if there were no compressions,
    dropouts, or other factors shifting the peaks from their "true"
    locations.

    Next phred examines each trace to find the centers of the actual, or
    observed, peaks and the areas of these peaks relative to their neighbors.
    The peaks are detected independently along each of the four traces so
    many peaks overlap. A dynamic programming algorithm is used to match
    the observed peaks detected in the second step with the predicted peak
    locations found in the first step.

    Phred evaluates the trace surrounding each called base using four or
    five quality value parameters to quantify the trace quality. It
    uses a quality value lookup table to assign the corresponding quality
    value. The quality value is related to the base call error probability
    by the formula

    QV = - 10 * log_10( P_e )

    where P_e is the probability that the base call is an error.

    Phred uses data from a chemistry parameter file called 'phredpar.dat'
    in order to identify dye primer data. For dye primer data, phred
    identifies loop/stem sequence motifs that tend to result in
    CC and GG merged peak compressions. It reduces the quality values
    of potential merged peaks and splits those peaks that have certain
    trace characteristics indicative of merged CC and GG peaks. In
    addition, the chemistry and dye information are passed to phrap.
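The first step in the phred description above (predicting evenly spaced peak locations with Fourier methods) can be sketched as follows. This is a simplification, not phred's implementation: it finds a single dominant spacing for the whole trace with a plain DFT, whereas phred examines the region around each point and can therefore follow spacing that drifts along the read. `predicted_peak_locations` and its spacing bounds are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def predicted_peak_locations(
    traces: Sequence[Sequence[float]], min_spacing: float = 4, max_spacing: float = 30
) -> List[float]:
    """Evenly spaced locations where peaks would sit without compressions or dropouts."""
    n = len(traces[0])
    # Combine the four channels: a peak in any channel marks a base position.
    signal = [sum(channel[t] for channel in traces) for t in range(n)]
    mean = sum(signal) / n
    centered = [s - mean for s in signal]

    # DFT bin k corresponds to a period of n / k samples; keep the strongest
    # bin whose period is a plausible base spacing.
    best_k, best_coef = None, 0j
    for k in range(1, n // 2 + 1):
        if not (min_spacing <= n / k <= max_spacing):
            continue
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    if best_k is None:
        return []

    period = n / best_k
    # The dominant component behaves like cos(2*pi*t/period + phase), which
    # peaks where t = (m - phase / (2*pi)) * period for integer m.
    phase = cmath.phase(best_coef)
    first = (-phase / (2 * math.pi)) * period % period
    locations = []
    t = first
    while t < n:
        locations.append(t)
        t += period
    return locations
```

On a synthetic trace with peaks every 10 samples starting at sample 5, this returns locations close to 5, 15, ..., 195.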
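The second step (detecting observed peaks independently in each trace, then matching them to the predicted locations by dynamic programming) can be sketched as an edit-distance style alignment. The scoring here is an assumption, not phred's: peaks are plain local maxima above a hypothetical `min_height`, a match costs the distance between observed and predicted positions, and every unmatched peak or unmatched predicted location costs a fixed, hypothetical `skip_cost`. Predicted locations left without a peak are called `N`.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[int, str, float]  # (sample index, base, height)


def find_peaks(trace: Sequence[float], min_height: float) -> List[int]:
    """Local maxima of one channel that reach min_height (plateaus keep their first sample)."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] >= min_height and trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def observed_peaks(traces, bases: str = "ACGT", min_height: float = 0.1) -> List[Peak]:
    """Detect peaks independently in each channel, then merge them by position."""
    peaks = [
        (i, base, trace[i])
        for base, trace in zip(bases, traces)
        for i in find_peaks(trace, min_height)
    ]
    return sorted(peaks)


def call_bases(predicted: Sequence[float], observed: Sequence[Peak], skip_cost: float = 5.0) -> str:
    """Align observed peaks to predicted locations in order, minimizing total cost."""
    m, n = len(predicted), len(observed)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    move = [[None] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            here = cost[i][j]
            if here == inf:
                continue
            steps = []
            if i < m and j < n:  # predicted location i explained by observed peak j
                steps.append((i + 1, j + 1, here + abs(predicted[i] - observed[j][0]), "match"))
            if i < m:  # predicted location with no peak: an uncalled base
                steps.append((i + 1, j, here + skip_cost, "no_peak"))
            if j < n:  # observed peak matching no location: treated as noise
                steps.append((i, j + 1, here + skip_cost, "noise"))
            for ni, nj, c, label in steps:
                if c < cost[ni][nj]:
                    cost[ni][nj], move[ni][nj] = c, label
    # Walk back from the full alignment to recover the calls in order.
    calls = []
    i, j = m, n
    while i > 0 or j > 0:
        label = move[i][j]
        if label == "match":
            calls.append(observed[j - 1][1])
            i, j = i - 1, j - 1
        elif label == "no_peak":
            calls.append("N")
            i -= 1
        else:
            j -= 1
    return "".join(reversed(calls))
```

With predicted locations 5, 15, 25, 35 and single peaks for A, C, G, T at those positions, `call_bases` returns `"ACGT"`; if the G peak drops out, it returns `"ACNT"` rather than shifting T onto location 25.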
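The quality formula in the phred description can be checked numerically, and the table lookup sketched. The two conversion functions follow directly from QV = -10 * log10(P_e). The table format in `lookup_quality` (rows of per-parameter thresholds plus a QV, first matching row wins, smaller parameter values meaning a cleaner trace) is an assumption for illustration only; phred's real table and parameters are not reproduced here, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def quality_from_error(p_error: float) -> float:
    """QV = -10 * log10(P_e)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("P_e must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_from_quality(qv: float) -> float:
    """Inverse: P_e = 10^(-QV / 10)."""
    return 10.0 ** (-qv / 10.0)


def lookup_quality(params: Sequence[float], table: Sequence[Tuple[Sequence[float], int]]) -> int:
    """Return the QV of the first row whose thresholds all bound the parameters.

    Assumes smaller parameter values mean a cleaner trace and that rows are
    ordered from highest to lowest QV; both are assumptions of this sketch.
    """
    for thresholds, qv in table:
        if all(p <= t for p, t in zip(params, thresholds)):
            return qv
    return 0
```

In these terms, QV 20 corresponds to a 1 in 100 chance that the call is wrong and QV 30 to a 1 in 1000 chance.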
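The dye-primer compression handling is described only loosely, so the following is a hypothetical illustration rather than phred's actual rules: it treats short inverted repeats (a stem, a loop, then the reverse complement of the stem) as candidate loop/stem motifs and lowers the quality of CC and GG dinucleotides inside them. Stem length, loop range and penalty are made-up parameters, and the sketch does not attempt the peak splitting that the text also mentions.

```python
from typing import List, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem: int = 4, min_loop: int = 3, max_loop: int = 8) -> List[Tuple[int, int]]:
    """Half-open spans [start, end) shaped like stem + loop + reverse-complement stem."""
    spans = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start : start + stem]
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem + loop
            if end > len(seq):
                break
            if seq[end - stem : end] == reverse_complement(left):
                spans.append((start, end))
    return spans


def penalize_compressions(
    seq: str, qvs: Sequence[int], spans: Sequence[Tuple[int, int]], penalty: int = 15, floor: int = 0
) -> List[int]:
    """Lower QVs of CC/GG dinucleotides inside candidate motifs (each base penalized once)."""
    adjusted = list(qvs)
    for start, end in spans:
        for i in range(start, end - 1):
            if seq[i : i + 2] in ("CC", "GG"):
                for j in (i, i + 1):
                    # Subtract from the original QV so overlapping runs are not penalized twice.
                    adjusted[j] = max(floor, qvs[j] - penalty)
    return adjusted
```

For example, in `AAGGACTTTGTCCAA` the stem `GGAC` pairs with `GTCC` across a `TTT` loop, so the GG at positions 2-3 and the CC at positions 11-12 (0-based) are the bases whose quality gets reduced.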

@natir self-assigned this Jan 1, 2018