diff --git a/CRAN-RELEASE b/CRAN-RELEASE
index ad3d792..8f7efad 100644
--- a/CRAN-RELEASE
+++ b/CRAN-RELEASE
@@ -1,2 +1,2 @@
-This package was submitted to CRAN on 2021-04-20.
-Once it is accepted, delete this file and tag the release (commit 1937a64).
+This package was submitted to CRAN on 2021-05-03.
+Once it is accepted, delete this file and tag the release (commit 8b698f7).
diff --git a/README.md b/README.md
index b9bb85e..4404540 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@ plotting functionality.
 
 ## Installation
 
-You can install the developpement version of greed from
+You can install the development version of greed from
 [GitHub](https://github.com/) with:
 
 ``` r
@@ -59,7 +59,7 @@ install.packages("devtools")
 devtools::install_github("comeetie/greed")
 ```
 
-Or use the CRAN version
+Or use the CRAN version:
 
 ``` r
 #CRAN
@@ -79,17 +79,13 @@ library(greed)
 data(Jazz)
 sol=greed(Jazz)
 #> ------- undirected DCSBM model fitting ------
-#> ################# Generation 1: best solution with an ICL of -28617 and 15 clusters #################
-#> ################# Generation 2: best solution with an ICL of -28594 and 16 clusters #################
-#> ################# Generation 3: best solution with an ICL of -28581 and 13 clusters #################
-#> ################# Generation 4: best solution with an ICL of -28576 and 15 clusters #################
-#> ################# Generation 5: best solution with an ICL of -28576 and 13 clusters #################
-#> ################# Generation 6: best solution with an ICL of -28568 and 13 clusters #################
-#> ################# Generation 7: best solution with an ICL of -28561 and 13 clusters #################
-#> ################# Generation 8: best solution with an ICL of -28561 and 13 clusters #################
-#> ################# Generation 9: best solution with an ICL of -28561 and 13 clusters #################
+#> ################# Generation 1: best solution with an ICL of -28611 and 16 clusters #################
+#> ################# Generation 2: best solution with an ICL of -28601 and 15 clusters #################
+#> ################# Generation 3: best solution with an ICL of -28580 and 16 clusters #################
+#> ################# Generation 4: best solution with an ICL of -28578 and 15 clusters #################
+#> ################# Generation 5: best solution with an ICL of -28578 and 15 clusters #################
 #> ------- Final clustering -------
-#> ICL clustering with a DCSBM model, 12 clusters and an icl of -28556.
+#> ICL clustering with a DCSBM model, 14 clusters and an icl of -28559.
 ```
 
 Here Jazz is a square sparse matrix and a `` ?`dcsbm-class` `` model
@@ -130,14 +126,16 @@ plan(multisession)
 data("Blogs")
 sol=greed(Blogs$X)
 #> ------- directed DCSBM model fitting ------
-#> ################# Generation 1: best solution with an ICL of -84548 and 16 clusters #################
-#> ################# Generation 2: best solution with an ICL of -84267 and 20 clusters #################
-#> ################# Generation 3: best solution with an ICL of -84260 and 20 clusters #################
-#> ################# Generation 4: best solution with an ICL of -84225 and 18 clusters #################
-#> ################# Generation 5: best solution with an ICL of -84212 and 18 clusters #################
-#> ################# Generation 6: best solution with an ICL of -84212 and 18 clusters #################
+#> ################# Generation 1: best solution with an ICL of -84417 and 16 clusters #################
+#> ################# Generation 2: best solution with an ICL of -84358 and 17 clusters #################
+#> ################# Generation 3: best solution with an ICL of -84199 and 18 clusters #################
+#> ################# Generation 4: best solution with an ICL of -84179 and 19 clusters #################
+#> ################# Generation 5: best solution with an ICL of -84160 and 18 clusters #################
+#> ################# Generation 6: best solution with an ICL of -84150 and 17 clusters #################
+#> ################# Generation 7: best solution with an ICL of -84143 and 17 clusters #################
+#> ################# Generation 8: best solution with an ICL of -84143 and 17 clusters #################
 #> ------- Final clustering -------
-#> ICL clustering with a DCSBM model, 17 clusters and an icl of -84177.
+#> ICL clustering with a DCSBM model, 16 clusters and an icl of -84101.
 plot(sol)
 ```
 
diff --git a/docs/articles/GMM.html b/docs/articles/GMM.html
index 6a7116f..e4f461e 100644
--- a/docs/articles/GMM.html
+++ b/docs/articles/GMM.html
@@ -102,7 +102,7 @@

GMM

-Loads packages and set a future plan for parallel processing if you want.
+Loads packages.

 library(greed)
 library(mclust)
@@ -198,7 +198,7 @@ 

 #> Normal 73 3 0 0
 #> Overt 0 7 13 13

-You may still look at coarser clustering and inspect the clustering dendogram:
+You may still look at coarser clustering and inspect the clustering dendrogram:

 plot(soldiag,type='tree')
 solK3 = cut(soldiag,3)
@@ -218,7 +218,7 @@ 

 #> [1] -2430.727
 soldiag@icl
 #> [1] -2465.102

-The full model seems preferable on this dataset. If you want to look at the mixture component parameters you may acces their Maximum a Posteriori estimate with the generic coef function.
+The full model seems preferable on this dataset. If you want to look at the mixture component parameters you may access their Maximum a Posteriori estimate with the generic coef function.

 params = coef(solK3)
 params$Sigmak[[2]]
@@ -226,7 +226,7 @@ 

 #> [1,] 451.5805 0.00 0
 #> [2,] 0.0000 13879.21 0
 #> [3,] 0.0000 0.00 25363

-Such simpler diagonal model may be of interest in particular for high dimensional settings for two reasons. First, the number of parameters (even if they are integrated out in the clustering phase) is reduced and this can be interesting when \(d\) is important, but also because the prior maybe defined such that it will be less informative. You may try with a subset of the fashion mnist data provided with the package which contains 784 dimensionals vectors (28x28 flattened images). In such settings, you may also want to switch the optimization algorithm to ?`seed-class`, this algorithm is less efficient than the hybrid algorithm used by default by greed. But, since it relied on a seeded initialization it is also a little bit less costly. In this case, you may increase the initial value for \(K\) since, this algorithm is not able to find an clustering with a number of cluster bigger than the value of \(K\) provided by the user. Still, it may simplify the clustering and return an optimal clustering with less clusters.
+Such simpler diagonal model may be of interest in particular for high dimensional settings for two reasons. First, the number of parameters (even if they are integrated out in the clustering phase) is reduced and this can be interesting when \(d\) is important, but also because the prior maybe defined such that it will be less informative. You may try with a subset of the fashion mnist data provided with the package which contains 784 dimensional vectors (28x28 flattened images). In such settings, you may also want to switch the optimization algorithm to ?`seed-class`, this algorithm is less efficient than the hybrid algorithm used by default by greed. But, since it relied on a seeded initialization it is also a little bit less costly. In this case, you may increase the initial value for \(K\) since, this algorithm is not able to find an clustering with a number of cluster bigger than the value of \(K\) provided by the user. Still, it may simplify the clustering and return an optimal clustering with less clusters.

 data("fashion")
 dim(fashion$X)
@@ -235,7 +235,7 @@ 

 #> ------- DIAGGMM model fitting ------
 #> ------- Final clustering -------
 #> ICL clustering with a DIAGGMM model, 22 clusters and an icl of -3587798.

-On this more complex dataset, we may look at the dendogram which is more interesting with the complex structure of these data.
+On this more complex dataset, we may look at the dendrogram which is more interesting with the complex structure of these data.

 plot(sol,type='tree')

diff --git a/docs/articles/graph-clustering-with-sbm.html b/docs/articles/graph-clustering-with-sbm.html
index c78fb30..696988b 100644
--- a/docs/articles/graph-clustering-with-sbm.html
+++ b/docs/articles/graph-clustering-with-sbm.html
@@ -95,7 +95,7 @@

Graph clustering with SBM

Etienne Côme

-2021-05-02
+2021-05-03

 Source: vignettes/graph-clustering-with-sbm.Rmd
@@ -124,11 +124,10 @@

2021-05-02

 sol = greed(sbm$x,model=new("sbm"))
 #> ------- directed SBM model fitting ------
-#> ################# Generation  1: best solution with an ICL of -14015 and 7 clusters #################
-#> ################# Generation  2: best solution with an ICL of -13969 and 6 clusters #################
-#> ################# Generation  3: best solution with an ICL of -13969 and 6 clusters #################
+#> ################# Generation  1: best solution with an ICL of -14312 and 6 clusters #################
+#> ################# Generation  2: best solution with an ICL of -14312 and 6 clusters #################
 #> ------- Final clustering -------
-#> ICL clustering with a SBM model, 6 clusters and an icl of -13969.
+#> ICL clustering with a SBM model, 6 clusters and an icl of -14312.

Plot the results using a block representation.

 plot(sol,type='blocks')
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-4-1.png
index 120164e..1298831 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-4-1.png differ
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-5-1.png
index 1bb6fa8..b348b87 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-5-1.png differ
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-6-1.png
index 8033114..5a343cb 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-6-1.png differ
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-7-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-7-1.png
index 0e1f9d2..5ca05ee 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-7-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-8-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-8-1.png
index 4fba60f..14f017b 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-8-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-8-1.png differ
diff --git a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-9-1.png b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-9-1.png
index dbcc2dc..fc52297 100644
Binary files a/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-9-1.png and b/docs/articles/graph-clustering-with-sbm_files/figure-html/unnamed-chunk-9-1.png differ
diff --git a/docs/index.html b/docs/index.html
index e049139..9b96927 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -112,12 +112,12 @@

Installation

-You can install the developpement version of greed from GitHub with:
+You can install the development version of greed from GitHub with:

 #GitHub
 install.packages("devtools")
 devtools::install_github("comeetie/greed")
-Or use the CRAN version
+Or use the CRAN version:

 #CRAN
 install.packages("greed")
@@ -131,17 +131,13 @@

 data(Jazz)
 sol=greed(Jazz)
 #> ------- undirected DCSBM model fitting ------
-#> ################# Generation 1: best solution with an ICL of -28617 and 15 clusters #################
-#> ################# Generation 2: best solution with an ICL of -28594 and 16 clusters #################
-#> ################# Generation 3: best solution with an ICL of -28581 and 13 clusters #################
-#> ################# Generation 4: best solution with an ICL of -28576 and 15 clusters #################
-#> ################# Generation 5: best solution with an ICL of -28576 and 13 clusters #################
-#> ################# Generation 6: best solution with an ICL of -28568 and 13 clusters #################
-#> ################# Generation 7: best solution with an ICL of -28561 and 13 clusters #################
-#> ################# Generation 8: best solution with an ICL of -28561 and 13 clusters #################
-#> ################# Generation 9: best solution with an ICL of -28561 and 13 clusters #################
+#> ################# Generation 1: best solution with an ICL of -28611 and 16 clusters #################
+#> ################# Generation 2: best solution with an ICL of -28601 and 15 clusters #################
+#> ################# Generation 3: best solution with an ICL of -28580 and 16 clusters #################
+#> ################# Generation 4: best solution with an ICL of -28578 and 15 clusters #################
+#> ################# Generation 5: best solution with an ICL of -28578 and 15 clusters #################
 #> ------- Final clustering -------
-#> ICL clustering with a DCSBM model, 12 clusters and an icl of -28556.
+#> ICL clustering with a DCSBM model, 14 clusters and an icl of -28559.

Here Jazz is a square sparse matrix and a ?`dcsbm-class` model will be used by default. Some plotting function enable the exploration of the clustering results:

 plot(sol)
@@ -161,14 +157,16 @@

 data("Blogs")
 sol=greed(Blogs$X)
 #> ------- directed DCSBM model fitting ------
-#> ################# Generation 1: best solution with an ICL of -84548 and 16 clusters #################
-#> ################# Generation 2: best solution with an ICL of -84267 and 20 clusters #################
-#> ################# Generation 3: best solution with an ICL of -84260 and 20 clusters #################
-#> ################# Generation 4: best solution with an ICL of -84225 and 18 clusters #################
-#> ################# Generation 5: best solution with an ICL of -84212 and 18 clusters #################
-#> ################# Generation 6: best solution with an ICL of -84212 and 18 clusters #################
+#> ################# Generation 1: best solution with an ICL of -84417 and 16 clusters #################
+#> ################# Generation 2: best solution with an ICL of -84358 and 17 clusters #################
+#> ################# Generation 3: best solution with an ICL of -84199 and 18 clusters #################
+#> ################# Generation 4: best solution with an ICL of -84179 and 19 clusters #################
+#> ################# Generation 5: best solution with an ICL of -84160 and 18 clusters #################
+#> ################# Generation 6: best solution with an ICL of -84150 and 17 clusters #################
+#> ################# Generation 7: best solution with an ICL of -84143 and 17 clusters #################
+#> ################# Generation 8: best solution with an ICL of -84143 and 17 clusters #################
 #> ------- Final clustering -------
-#> ICL clustering with a DCSBM model, 17 clusters and an icl of -84177.
+#> ICL clustering with a DCSBM model, 16 clusters and an icl of -84101.
 plot(sol)

diff --git a/docs/news/index.html b/docs/news/index.html
index add0a88..667ec26 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -146,7 +146,7 @@

  • Better input checking for mvmreg and gmm
  • Better input checking for greed_cond
  • Correction of compilation problems on solaris
-  • Correction of pointer problem comming from shed_row/shed_col
+  • Correction of pointer problem coming from shed_row/shed_col
  • Added a NEWS.md file to track changes to the package.
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 530064c..8254635 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -4,5 +4,5 @@ pkgdown_sha: ~
 articles:
   GMM: GMM.html
   graph-clustering-with-sbm: graph-clustering-with-sbm.html
-last_built: 2021-05-02T13:37Z
+last_built: 2021-05-03T12:47Z
 
diff --git a/docs/reference/figures/cut-1.png b/docs/reference/figures/cut-1.png
index 132ec45..862b42e 100644
Binary files a/docs/reference/figures/cut-1.png and b/docs/reference/figures/cut-1.png differ
diff --git a/docs/reference/figures/future-1.png b/docs/reference/figures/future-1.png
index c96e451..6992717 100644
Binary files a/docs/reference/figures/future-1.png and b/docs/reference/figures/future-1.png differ
diff --git a/docs/reference/figures/plot-1.png b/docs/reference/figures/plot-1.png
index eefea35..c5320d5 100644
Binary files a/docs/reference/figures/plot-1.png and b/docs/reference/figures/plot-1.png differ
diff --git a/docs/reference/figures/tree-1.png b/docs/reference/figures/tree-1.png
index d1bef1a..0034c50 100644
Binary files a/docs/reference/figures/tree-1.png and b/docs/reference/figures/tree-1.png differ
diff --git a/docs/reference/gmm-class.html b/docs/reference/gmm-class.html
index 5ebd703..4392539 100644
--- a/docs/reference/gmm-class.html
+++ b/docs/reference/gmm-class.html
@@ -44,10 +44,10 @@
 The model corresponds to the following generative model:
 $$ \pi \sim Dirichlet(\alpha)$$
 $$ Z_i \sim \mathcal{M}(1,\pi)$$
-$$ V_k \sim \mathcal{W}(\epsilon^{-1},n_0)$$
+$$ V_k \sim \mathcal{W}(\varepsilon^{-1},n_0)$$
 $$ \mu_k \sim \mathcal{N}(\mu,(\tau V_k)^{-1})$$
-$$ X_{i.}|Z_{ik}=1 \sim \mathcal{N}(\mu_k,V_{k}^{-1})$$
-with \(\mathcal{W}(\epsilon^{-1},n_0)\) the Whishart distribution." />
+$$ X_{i}|Z_{ik}=1 \sim \mathcal{N}(\mu_k,V_{k}^{-1})$$
+with \(\mathcal{W}(\varepsilon^{-1},n_0)\) the Whishart distribution." />
@@ -149,10 +149,10 @@

    Gaussian mixture model description class

 The model corresponds to the following generative model:
 $$ \pi \sim Dirichlet(\alpha)$$
 $$ Z_i \sim \mathcal{M}(1,\pi)$$
-$$ V_k \sim \mathcal{W}(\epsilon^{-1},n_0)$$
+$$ V_k \sim \mathcal{W}(\varepsilon^{-1},n_0)$$
 $$ \mu_k \sim \mathcal{N}(\mu,(\tau V_k)^{-1})$$
-$$ X_{i.}|Z_{ik}=1 \sim \mathcal{N}(\mu_k,V_{k}^{-1})$$
-with \(\mathcal{W}(\epsilon^{-1},n_0)\) the Whishart distribution.
+$$ X_{i}|Z_{ik}=1 \sim \mathcal{N}(\mu_k,V_{k}^{-1})$$
+with \(\mathcal{W}(\varepsilon^{-1},n_0)\) the Whishart distribution.

diff --git a/docs/reference/mvmreg-class.html b/docs/reference/mvmreg-class.html
index 7e22e6c..7c18bc8 100644
--- a/docs/reference/mvmreg-class.html
+++ b/docs/reference/mvmreg-class.html
@@ -45,7 +45,7 @@
 The model corresponds to the following generative model:
 $$ \pi \sim Dirichlet(\alpha)$$
 $$ Z_i \sim \mathcal{M}(1,\pi)$$
-$$ V_k \sim \mathcal{W}(\epsilon^{-1},n_0)$$
+$$ V_k \sim \mathcal{W}(\varepsilon^{-1},n_0)$$
 $$ A_k \sim \mathcal{MN}(0,(V_k)^{-1},\tau X^{t}X)$$
 $$ Y_{i.}|X_{i.}Z_{ik}=1 \sim \mathcal{N}(A_kx_{i.},V_{k}^{-1})$$
 with \(\mathcal{W}(\epsilon^{-1},n_0)\) the Whishart distribution and \(\mathcal{MN}\) the matrix-normal distribution." />
@@ -151,7 +151,7 @@

    Multivariate mixture of regression model description class

 The model corresponds to the following generative model:
 $$ \pi \sim Dirichlet(\alpha)$$
 $$ Z_i \sim \mathcal{M}(1,\pi)$$
-$$ V_k \sim \mathcal{W}(\epsilon^{-1},n_0)$$
+$$ V_k \sim \mathcal{W}(\varepsilon^{-1},n_0)$$
 $$ A_k \sim \mathcal{MN}(0,(V_k)^{-1},\tau X^{t}X)$$
 $$ Y_{i.}|X_{i.}Z_{ik}=1 \sim \mathcal{N}(A_kx_{i.},V_{k}^{-1})$$
 with \(\mathcal{W}(\epsilon^{-1},n_0)\) the Whishart distribution and \(\mathcal{MN}\) the matrix-normal distribution.

diff --git a/man/figures/cut-1.png b/man/figures/cut-1.png
index 132ec45..862b42e 100644
Binary files a/man/figures/cut-1.png and b/man/figures/cut-1.png differ
diff --git a/man/figures/future-1.png b/man/figures/future-1.png
index c96e451..6992717 100644
Binary files a/man/figures/future-1.png and b/man/figures/future-1.png differ
diff --git a/man/figures/plot-1.png b/man/figures/plot-1.png
index eefea35..c5320d5 100644
Binary files a/man/figures/plot-1.png and b/man/figures/plot-1.png differ
diff --git a/man/figures/tree-1.png b/man/figures/tree-1.png
index d1bef1a..0034c50 100644
Binary files a/man/figures/tree-1.png and b/man/figures/tree-1.png differ