Skip to content

tdhock/binsegRcpp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

binsegRcpp Efficient implementation of the binary segmentation heuristic algorithm for changepoint detection, using C++ std::multiset. Also contains functions for comparing empirical time complexity to best/worst case.

testshttps://github.com/tdhock/binsegRcpp/workflows/R-CMD-check/badge.svg
coveragehttps://codecov.io/gh/tdhock/binsegRcpp/branch/master/graph/badge.svg

Installation

install.packages("binsegRcpp")
##OR
if(require("remotes"))install.packages("remotes")
remotes::install_github("tdhock/binsegRcpp")

Usage

The main function is binseg for which you must at least specify the first two arguments:

  • distribution.str specifies the loss function to minimize.
  • data.vec is a numeric vector of data to segment.
 > x <- c(0.1, 0, 1, 1.1, 0.1, 0)
 > (models.dt <- binsegRcpp::binseg("mean_norm", x))
 binary segmentation model:
    segments   end          loss validation.loss
	<int> <int>         <num>           <num>
 1:        1     6  1.348333e+00               0
 2:        2     4  1.015000e+00               0
 3:        3     2  1.500000e-02               0
 4:        4     3  1.000000e-02               0
 5:        5     5  5.000000e-03               0
 6:        6     1 -3.339343e-16               0

The result above summarizes the data that are computed during the binary segmentation algorithm. It has a special class with dedicated methods:

> class(models.dt)
[1] "binsegRcpp" "list"      
> methods(class="binsegRcpp")
[1] coef  plot  print
see '?methods' for accessing help and source code

The coef methods returns a data table of segment means:

 > coef(models.dt, segments=2:3)
    segments start   end start.pos end.pos  mean
	<int> <int> <int>     <num>   <num> <num>
 1:        2     1     4       0.5     4.5  0.55
 2:        2     5     6       4.5     6.5  0.05
 3:        3     1     2       0.5     2.5  0.05
 4:        3     3     4       2.5     4.5  1.05
 5:        3     5     6       4.5     6.5  0.05

Demo of poisson loss and non-uniform weights:

> data.vec <- c(3,4,10,20)
> (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10)))
binary segmentation model:
   segments   end      loss validation.loss
      <int> <int>     <num>           <num>
1:        1     4 -393.8437               0
2:        2     3 -411.6347               0
3:        3     2 -413.9416               0
4:        4     1 -414.0133               0

Demo of change in mean and variance for normal distribution:

> sim <- function(mu,sigma)rnorm(10000,mu,sigma)
> set.seed(1)
> data.vec <- c(sim(5,1), sim(0, 5))
> fit <- binsegRcpp::binseg("meanvar_norm", data.vec)
> coef(fit, 2L)
   segments start   end start.pos end.pos        mean       var
      <int> <int> <int>     <num>   <num>       <num>     <num>
1:        2     1 10000       0.5 10000.5  4.99346296  1.024763
2:        2 10001 20000   10000.5 20000.5 -0.02095033 24.538556

Related work

Other implementations of binary segmentation include changepoint::cpt.mean(method=”BinSeg”) (quadratic storage in max number of segments), BinSeg::BinSegModel() (same linear storage as binsegRcpp), and ruptures.Binseg() (unknown storage). Figures comparing the timings.

This version uses the Rcpp/.Call interface whereas the binseg package uses the .C interface.

See branches for variations of the interface to use as test cases in RcppDeepState development.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages