Fixing warning on Debian systems:
Result: WARN
Found the following significant warnings:
RcppExports.cpp:865:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
RcppExports.cpp:899:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
RcppExports.cpp:933:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
RcppExports.cpp:967:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/philentropy.Rcheck/00install.out’ for details.
* used C++ compiler: ‘Debian clang version 17.0.5 (1)’
- The solution was to implement this quick fix by reinstalling Rcpp v1.0.11.6 via `devtools::install_github("https://github.com/RcppCore/Rcpp")` and rerunning `Rcpp::compileAttributes()`.
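For readers unfamiliar with `-Wformat-security`: the warning fires when a non-literal string is passed where a printf-style function expects its format. The sketch below is a generic C++ illustration of that pattern and its usual remedy. It is not the code that Rcpp generates, and `format_message` is a hypothetical helper introduced only for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Rcpp or philentropy): copies a
// runtime message into a std::string through a printf-style call.
std::string format_message(const char* msg) {
    assert(msg != nullptr);
    char buf[256];
    // Pattern that triggers -Wformat-security, because msg is used as
    // the format string and may contain conversion specifiers:
    //   std::snprintf(buf, sizeof(buf), msg);
    // Safe pattern: the format is a string literal and msg is only data.
    std::snprintf(buf, sizeof(buf), "%s", msg);
    return std::string(buf);
}
```

With the literal `"%s"` format, a message such as `"50%d done"` is copied verbatim instead of being interpreted. Messages longer than the buffer are truncated in this sketch.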
- the Distances vignette now has corrected documentation for the benchmarking of low-level distance functions. Many thanks to @Nowosad (#30)
- in `../src/correlation.h`, bitwise operators were replaced by logical operators (`|` -> `or`); the bitwise form otherwise raises Wbitwise warnings in `clang14`
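As a rough illustration of this kind of change, the sketch below contrasts the bitwise and logical forms on boolean operands. The function `any_nan` is hypothetical and is not taken from `correlation.h`; whether a given compiler version warns on the commented-out form depends on its flags.

```cpp
#include <cmath>

// Hypothetical predicate (illustration only): true if either value is NaN.
bool any_nan(double x, double y) {
    // Form that newer clang versions may flag, since both operands are bool:
    //   return std::isnan(x) | std::isnan(y);
    // Logical form: `or` is the standard alternative spelling of `||` and
    // short-circuits, so the second call is skipped when the first is true.
    return std::isnan(x) or std::isnan(y);
}
```

Both forms return the same value here; the logical form states the intent and avoids evaluating the right-hand side unnecessarily.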
- vector element limit is now extended to long vectors for all distance measures by declaring `R_xlen_t` instead of `int` during indexing.
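The effect of the index type can be sketched in plain C++. To keep the example compilable without R headers, `xlen_t` below is a stand-in alias for `R_xlen_t` (assumed here to be a signed, pointer-sized integer), and `euclidean` is an illustrative function rather than the package's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for R's R_xlen_t so the sketch compiles without R headers
// (assumption: a signed, pointer-sized integer type).
using xlen_t = std::ptrdiff_t;

// Euclidean distance with a loop index wide enough for long vectors.
double euclidean(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    const xlen_t n = static_cast<xlen_t>(p.size());
    double sum = 0.0;
    // With `int i`, the index could not address more than 2^31 - 1 elements.
    for (xlen_t i = 0; i < n; ++i) {
        const double diff = p[i] - q[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}
```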
- `distance()` and all other individual information theory functions receive a new argument `epsilon` with default value `epsilon = 0.00001` to treat cases in which individual distance or similarity computations yield `x / 0` or `0 / 0`. Instead of a hard-coded epsilon, users can now set `epsilon` according to their input vectors. (Many thanks to Joshua McNeill #26 for this great question).
- three new functions `dist_one_one()`, `dist_one_many()`, `dist_many_many()` are added. They are fairly flexible intermediaries between `distance()` and single distance functions. `dist_one_one()` expects two vectors (probability density functions) and returns a single value. `dist_one_many()` expects one vector (a probability density function) and one matrix (a set of probability density functions), and returns a vector of values. `dist_many_many()` expects two matrices (two sets of probability density functions), and returns a matrix of values. (Many thanks to Jakub Nowosad, see #27, #28, and New Vignette Many_Distance)
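To make the role of the `epsilon` argument described above more concrete, here is a minimal sketch of one way a division guard could work. The helper `safe_ratio` and its treatment of `0 / 0` as `0` are assumptions for illustration; the rules the package applies to each individual measure may differ.

```cpp
#include <cassert>

// Hypothetical helper (not the package's actual code): divides x by y
// while guarding the two problematic cases named in the changelog.
double safe_ratio(double x, double y, double epsilon = 0.00001) {
    assert(epsilon > 0.0);
    if (x == 0.0 && y == 0.0) {
        return 0.0;            // assumed convention: 0 / 0 is treated as 0
    }
    if (y == 0.0) {
        return x / epsilon;    // x / 0 becomes x / epsilon: large but finite
    }
    return x / y;
}
```

A smaller `epsilon` yields a larger substitute value for `x / 0`, which is why a user-settable value can matter when input vectors are on very different scales.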
- a new Vignette Comparing many probability density functions (Many thanks to Jakub Nowosad)
- the `dplyr` package dependency was removed and replaced by the `poorman` package due to the heavy dependency burden of `dplyr`, since `philentropy` only used `dplyr::between()`, which is now `poorman::between()` (Many thanks to Patrice Kiener for this suggestion)
- `distance(..., as.dist.obj = TRUE)` now returns the same values as `stats::dist()` when working with 2 dimensional input matrices (2 vector inputs) (see #29) (Many thanks to Jakub Nowosad (@Nowosad)) Example:
library(philentropy)
m1 = matrix(c(1, 2), ncol = 1)
dist(m1)
#> 1
#> 2 1
distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
#> 1
#> 2 1
- the `distance()` function receives a new argument `mute.message`, allowing users to mute message printing when running large-scale distance computations. Example:
distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)),
method = "euclidean",
mute.message = TRUE)
- adding `markdown` dependency to `DESCRIPTION` (find details here)
- the `distance()` function receives a new argument `use.row.names` to enable passing the row names from the input probability or count matrix to the output distance matrix
- the `distance()` function can now handle `data.table` and `tibble` input #16
- adding new functionality and arguments `as.dist.obj`, `diag`, and `upper` to `philentropy::distance()` to allow users to retrieve a `stats::dist()` object when working with `philentropy::distance()` (Many thanks to Hugo Tavares #18 - see also #13). When using `philentropy::distance(..., as.dist.obj = TRUE)`, users can now directly pass the `distance()` output into `hclust`:
Before:
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard")
true.dist.mat <- as.dist(dist.mat)
clust.res <- hclust(true.dist.mat, method = "complete")
clust.res
Call:
hclust(d = true.dist.mat, method = "complete")
Cluster method : complete
Number of objects: 3
Now:
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard", as.dist.obj = TRUE)
clust.res <- hclust(dist.mat, method = "complete")
clust.res
Call:
hclust(d = dist.mat, method = "complete")
Cluster method : complete
Number of objects: 3
- fixing a bug in `gJSD()` which tested transposed matrix rows rather than transposed matrix columns for sum > 1 (see issue #17; many thanks to @wkc1986)
- exporting all Rcpp distance measure functions individually (see issue #9), this enables access to much faster computations (see micro benchmarks at https://hajkd.github.io/philentropy/articles/Distances.html)
- fixing a bug that caused the KL distance to return NaN when P == 0 (see issue #10; Many thanks to @KaiserDominici)
- fixing a bug that caused a stack overflow when computing distance matrices with many rows (see issue #7; Many thanks to @wkc1986 and @elbamos)
- fixing a bug in `gJSD()` where an `rbind()` input matrix is not properly transposed (Many thanks to @vrodriguezf; see issue #14)
- `gJSD()` receives a new argument `est.prob` to enable empirical estimation of probability vectors from input count vectors (non-probabilistic vectors)
- Jaccard and Tanimoto similarity measures now return `0` instead of `NAN` when probability vectors contain zeros (Many thanks to @JonasMandel; see issue #15)
- Fixing bug that caused `jensen-shannon` computations to compute wrong values when `0 values` were present in the input vectors (see issue #4; Many thanks to @wkc1986)
- Fixing bug that caused `jensen-difference` computations to compute wrong values when `0 values` were present in the input vectors
- Fixing bugs in all distance metrics when handling 0/0, 0/x or x/0 cases
- new message system
- extending documentation
- Fixing bug that caused `JSD()` to return NaN when any probability is 0 - see #1 (Thanks to William Kurtis Chang)
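The NaN in this situation typically comes from evaluating `0 * log(0)`. A common remedy, sketched below under the standard convention that such a term contributes 0, is to skip zero-probability terms. The functions `kl_term` and `jsd` are illustrative and are not the package's implementation; the base-2 logarithm is chosen only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One term of a Kullback-Leibler sum, using the convention 0 * log(0 / m) = 0.
double kl_term(double p, double m) {
    if (p == 0.0) return 0.0;   // skip the term instead of evaluating log(0)
    return p * std::log2(p / m);
}

// Jensen-Shannon divergence (base-2 logarithm) of two probability vectors.
double jsd(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        // m is positive whenever p[i] or q[i] is positive.
        const double m = 0.5 * (p[i] + q[i]);
        sum += 0.5 * kl_term(p[i], m) + 0.5 * kl_term(q[i], m);
    }
    return sum;
}
```

With this guard, inputs that contain zeros give a finite result instead of NaN.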
- Fixing C++ memory leaks in `dist.diversity()` and `distance()` when the check for `colSums(x) > 1.001` was performed (leak was found with `rhub::check_with_valgrind()`)
Initial submission version.