Skip to content

tdhock/binseg-model-selection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Experiments to support upcoming research papers about binary segmentation.

Paper: Finite sample complexity analysis of binary segmentation

To reproduce the analyses and figures in the paper you can use the code below:

TODOs

7 Aug 2022

figure-compare-ruptures.R makes the figure below which plots asymptotic reference lines on top of the empirical median timing,

figure-compare-ruptures.png

The figure above shows that

  • first column: changepoint is clearly cubic,
  • second/third columns: ruptures is clearly quadratic in worst case, and best case seems to be somewhere between quadratic and log-linear.

12 July 2022

figure-timings-versions-data.R makes figure-timings-versions-data.csv

figure-timings-versions.R reads that and makes

figure-timings-versions.png

17 May 2022

figure-synthetic.R makes figure-synthetic.png

4 May 2022

figure-neuroblastoma-iterations-data.R makes figure-neuroblastoma-iterations-data.csv and figure-neuroblastoma-iterations-bounds.csv

figure-neuroblastoma-iterations.R makes

figure-neuroblastoma-iterations.png

21 Apr 2022

figure-mcgill-iterations-data.R makes figure-mcgill-iterations-data.csv

figure-mcgill-iterations.R reads that and makes

figure-mcgill-iterations.png

Figure above shows that in real data achieves asymptotic best case complexity.

13 Apr 2022

figure-best-heuristics.R makes figures below

figure-best-heuristics.png

figure-best-heuristics-segs-constant.png

figure-optimal-trees.R makes small pictures for slides like this

figure-optimal-trees-71.png

and big picture below

figure-optimal-trees.png

11 Apr 2022

figure-compare-distributions.R makes

figure-compare-distributions.png

Figure above shows small asymptotic slowdown for Laplace median distribution.

figure-splits-loss.R makes new figure

figure-splits-loss-cum.png

which shows some empirical cumulative counts smaller than the best, which is possible since the “best” case time complexity bound refers to the total number of splits at the end, not at each iteration.

6 Apr 2022

figure-splits-loss.R makes

figure-splits-loss.png

Figure above shows that 0/1 seq can be used for worst case of l1,mean_norm,poisson loss, but for meanvar_norm we need 0/1/10/11 seq. Also linear data achieves best case for normal losses, and is very close to best case for l1 and poisson. TODO figure out a simple synthetic data sequence which achieves the best case for l1 and poisson.

figure-timings-laplace-data.R makes figure-timings-meanvar_norm-data.csv

figure-timings-laplace.R reads that file and makes

figure-timings-laplace.png

Figure above shows that ruptures looks asymptotically slower in best case.

figure-timings-meanvar_norm-data.R makes figure-timings-meanvar_norm-data.csv

figure-timings-meanvar_norm.R reads that file and makes

figure-timings-meanvar_norm.png

Figure above shows that

  • blockcpd is about the same as binsegRcpp multiset.
  • for worst case changepoint is faster up to very large model sizes, but asymptotically slower.

figure-timings-poisson-data.R makes figure-timings-poisson-data.csv

figure-timings-poisson.R reads that file and makes

figure-timings-poisson.png

Figure above shows that

  • blockcpd about the same as binsegRcpp multiset.
  • others consistent with other losses.

TODO compare both versions of blockcpd. Also compare with max.segs=n.data since that is what blockcpd does?

24 Mar 2022

figure-neuroblastoma.R makes the figure below, which shows a real data set for which there are differences between binsegRcpp and ruptures/changepoint.

figure-neuroblastoma.png

23 Mar 2022

ruptures_bug.py and changepoint.bug.R used to report issues, deepcharles/ruptures#242 and rkillick/changepoint#69

22 Mar 2022

figure-timings-data.R makes figure-timings-data.csv

figure-timings.R reads that and makes

figure-timings.png

Figure above was created using synthetic data which achieve the best/worst case time complexity of the binary segmentation algorithm. For each data set of a given size N in {2^2=4,8,16,32,…,2^20=1,048,576}, we run binary segmentation up to a max of N/2 segments (and not going to a larger N if the algo/case resulted in a time greater than 100 seconds). The timings suggest that changepoint R package uses a cubic algorithm (three nested for loops) whereas binsegRcpp uses an algorithm which is log-linear in the best case, and quadratic in the worst case. The ruptures python module seems to be asymptotically faster than changepoint but slower than binsegRcpp, maybe quadratic?

figure-timings-loss.png

Figure above shows that loss for binsegRcpp is always less than loss for others, suggesting that there are bugs in the other implementations.

20 Jan 2022

figure-select-segments-data.R computes simulations using a variety of model selection criteria, saving results to figure-select-segments-data.csv

figure-select-segments.R reads that result CSV file and makes

figure-select-segments.png

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published