Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Rename threshold_genoprob() as clean_genoprob() + clean columns
- "clean" seems potentially more informative than "threshold".
- In addition to setting small values to 0, we also look at the maximum probability in a genotype column; if that's not large, we set all values in that column to 0.
- Related to Issue rqtl#34.
Showing 12 changed files with 246 additions and 104 deletions.
@@ -0,0 +1,69 @@

```r
#' Clean genotype probabilities
#'
#' Clean up genotype probabilities by setting small values to 0 and,
#' for any genotype column whose maximum value is small, setting
#' all values in that column to 0.
#'
#' @md
#'
#' @param probs Genotype probabilities as calculated by
#' [calc_genoprob()].
#' @param value_threshold Probabilities below this value will be set to 0.
#' @param column_threshold For genotype columns where the maximum
#' value is below this threshold, all values will be set to 0.
#' This must be less than \eqn{1/k} where \eqn{k} is the number of genotypes.
#' @param cores Number of CPU cores to use, for parallel calculations.
#' (If `0`, use [parallel::detectCores()].)
#' Alternatively, this can be links to a set of cluster sockets, as
#' produced by [parallel::makeCluster()].
#'
#' @return A cleaned version of the input genotype probabilities object, `probs`.
#'
#' @details
#' In cases where a particular genotype is largely absent,
#' `scan1coef()` and `fit1()` can give unstable estimates of the
#' genotype effects. Cleaning up the genotype probabilities by setting
#' small values to 0 helps to ensure that such effects get set to
#' `NA`.
#'
#' At each position and for each genotype column, we find the maximum
#' probability across individuals. If that maximum is <
#' `column_threshold`, all values in that genotype column at that
#' position are set to 0.
#'
#' In addition, any genotype probabilities that are < `value_threshold`
#' are set to 0. (`value_threshold` is generally smaller than `column_threshold`.)
#'
#' The probabilities are then re-scaled so that the probabilities for
#' each individual at each position sum to 1.
#'
#' @examples
#' iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2"))
#' \dontshow{iron <- iron[,c("19", "X")] # subset to chr 19 and X}
#'
#' # calculate genotype probabilities
#' probs <- calc_genoprob(iron, iron$gmap, error_prob=0.002)
#'
#' # clean the genotype probabilities and overwrite the original values
#' probs <- clean_genoprob(probs)
#'
#' @export
clean_genoprob <-
    function(probs, value_threshold=1e-6, column_threshold=0.01, cores=1)
{
    attrib <- attributes(probs)

    cores <- setup_cluster(cores)

    result <- cluster_lapply(cores, seq_along(probs),
                             function(i) {
                                 this_result <- .clean_genoprob(probs[[i]], value_threshold, column_threshold)
                                 dimnames(this_result) <- dimnames(probs[[i]])
                                 this_result })

    for(a in names(attrib)) {
        attr(result, a) <- attrib[[a]]
    }

    result
}
```
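The requirement that `column_threshold` be below \eqn{1/k} rests on a short argument: each individual's \eqn{k} probabilities sum to 1, so the largest of them is at least \eqn{1/k}, and a threshold no larger than \eqn{1/k} can never zero it out. The sketch below mirrors the clamp applied in the C++ code (a threshold above \eqn{1/k} is replaced by \eqn{0.5/k}) and checks that property; both function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper mirroring the clamp in the C++ code:
// a threshold above 1/n_gen is replaced by 0.5/n_gen.
double effective_threshold(double threshold, int n_gen)
{
    const double bound = 1.0 / static_cast<double>(n_gen);
    return (threshold > bound) ? 0.5 * bound : threshold;
}

// Hypothetical check: the largest probability in a (non-empty) row that sums to 1
// is >= 1/n_gen, so it is never below a threshold clamped to at most 1/n_gen.
bool largest_survives(const std::vector<double>& row, double threshold)
{
    const double row_max = *std::max_element(row.begin(), row.end());
    return !(row_max < effective_threshold(threshold, static_cast<int>(row.size())));
}
```

A uniform row (every genotype at \eqn{1/k}) is the worst case; it survives even when the threshold equals \eqn{1/k} exactly, because values are zeroed only when strictly below the threshold.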
This file was deleted.
@@ -0,0 +1,74 @@

```cpp
// clean genoprobs, setting small values to 0

#include "clean_genoprob.h"
#include <math.h>
#include <Rcpp.h>

using namespace Rcpp;

// clean genoprobs: zero genotype columns whose maximum is small,
// then set small values to 0 and rescale each individual's probabilities to sum to 1
// [[Rcpp::export(".clean_genoprob")]]
NumericVector clean_genoprob(const NumericVector& prob_array, // array as n_ind x n_gen x n_pos
                             double value_threshold=1e-6,
                             double column_threshold=0.01)
{
    if(Rf_isNull(prob_array.attr("dim")))
        throw std::invalid_argument("prob_array should be a 3d array but has no dimension attribute");
    const IntegerVector& dim = prob_array.attr("dim");
    if(dim.size() != 3)
        throw std::invalid_argument("prob_array should be a 3d array of probabilities");
    const int n_ind = dim[0];
    const int n_gen = dim[1];
    const int n_pos = dim[2];

    NumericVector result = clone(prob_array);

    // ensure that we don't set all values in a row to 0
    // (each row sums to 1, so its largest value is >= 1/n_gen)
    if(column_threshold > 1.0/(double)n_gen)
        column_threshold = 0.5/(double)n_gen;
    if(value_threshold > 1.0/(double)n_gen)
        value_threshold = 0.5/(double)n_gen;

    for(int pos=0; pos<n_pos; pos++) {

        // first look at each genotype column and find max; if < column_threshold, set all values to 0
        for(int gen=0; gen<n_gen; gen++) {
            bool zero_column = true;
            for(int ind=0; ind<n_ind; ind++) {
                if(prob_array[ind + gen*n_ind + pos*n_gen*n_ind] >= column_threshold) {
                    zero_column = false;
                    break;
                }
            }
            if(zero_column) { // biggest value was < column_threshold so zero the column
                for(int ind=0; ind<n_ind; ind++) {
                    result[ind + gen*n_ind + pos*n_gen*n_ind] = 0.0;
                }
            }
        }

        // now look at the individual values
        for(int ind=0; ind<n_ind; ind++) {
            double sum=0.0;
            for(int gen=0; gen<n_gen; gen++) {
                // small values set to 0 (column-major index of [ind, gen, pos])
                int index = ind + gen*n_ind + pos*n_gen*n_ind;
                if(result[index] < value_threshold) result[index] = 0.0;

                // get sum so we can rescale to sum to 1
                sum += result[index];
            }

            for(int gen=0; gen<n_gen; gen++) {
                int index = ind + gen*n_ind + pos*n_gen*n_ind;
                result[index] /= sum;
            }
        }
    }

    result.attr("dim") = Dimension(n_ind, n_gen, n_pos);

    return result;
}
```
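R stores an array with `dim = c(n_ind, n_gen, n_pos)` in column-major order, so the 0-based element `[ind, gen, pos]` sits at offset `ind + gen*n_ind + pos*n_gen*n_ind`, the expression used in the loops of `clean_genoprob()`. The hypothetical helper below makes that layout explicit so it can be checked against hand-computed offsets.

```cpp
#include <cstddef>

// Hypothetical helper: 0-based offset of element [ind, gen, pos] in an R array
// with dim = c(n_ind, n_gen, n_pos). R arrays are column-major, so the first
// index (individual) varies fastest, then genotype, then position.
inline std::size_t array3_index(std::size_t ind, std::size_t gen, std::size_t pos,
                                std::size_t n_ind, std::size_t n_gen)
{
    return ind + gen * n_ind + pos * n_gen * n_ind;
}
```

One consequence is that an individual's genotype probabilities at one position are not contiguous in memory: they sit `n_ind` elements apart, so stepping from one genotype to the next must advance the index by `n_ind`, not by 1.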
@@ -0,0 +1,12 @@

```cpp
// clean genoprobs, setting small values to 0
#ifndef CLEAN_GENOPROB_H
#define CLEAN_GENOPROB_H

#include <Rcpp.h>

// clean genoprobs, setting small values to 0
Rcpp::NumericVector clean_genoprob(const Rcpp::NumericVector& prob_array, // array as n_ind x n_gen x n_pos
                                   double value_threshold,
                                   double column_threshold);

#endif // CLEAN_GENOPROB_H
```