LargeN usage #44

duccioa · 2023-08-22T15:27:06Z

Hello,
I am a bit confused by the usage and documentation of the largeN parameter in classIntervals(), and I would be grateful for some guidance.
The documentation says:

default 3000L, the QGIS sampling threshold; over 3000, the observations presented to "fisher" and "jenks" are either a samp_prop= sample or a sample of 3000, whichever is larger

In classIntervals(), largeN is used as follows:

function (var, n, style = "quantile", rtimes = 3, ..., intervalClosure = c("left", 
    "right"), dataPrecision = NULL, warnSmallN = TRUE, warnLargeN = TRUE, 
    largeN = 3000L, samp_prop = 0.1, gr = c("[", "]")) {
       #.....

        nobs <- length(unique(var))
        
        #.....

        if (warnLargeN && (style %in% c("kmeans", "hclust", "bclust", 
            "fisher", "jenks"))) {
            if (nobs > largeN) {
                warning("N is large, and some styles will run very slowly; sampling imposed")
                sampling <- TRUE
                nsamp <- ifelse(samp_prop * nobs > 3000, as.integer(ceiling(samp_prop * 
                  nobs)), 3000L)
            }
        }
       #....
}

Where nobs <- length(unique(var)).

My understanding is that largeN is the threshold above which we consider var to require sampling.

What I find difficult to understand is that largeN is then not used to compute the sample size; the hard-coded value 3000 is used instead.
3000 is also the default of largeN, but the two values are not used in the same way: one is a threshold, while the other is hard-coded into the sample-size calculation.

This also causes an error when the number of unique values in var exceeds largeN but is below 3000:

library(classInt)

large_n = 1000
x = 1:(large_n + 1)
classInt::classIntervals(x, n = 10, style = "fisher", largeN = large_n, samp_prop = 0.05)
#> Warning in classInt::classIntervals(x, n = 10, style = "fisher", largeN = 1000,
#> : N is large, and some styles will run very slowly; sampling imposed
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'

Created on 2023-08-22 by the reprex package (v2.0.1)
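
To see why the call fails, the nsamp expression quoted above can be evaluated by hand for the values in the reprex. This is only a sketch: the sample() call is an assumption about how the sample is drawn, inferred from the error message, not a copy of the package code.

```r
# Values from the reprex: 1001 unique observations, largeN = 1000
nobs <- 1001L
samp_prop <- 0.05

# The expression quoted from classIntervals(): the floor is the literal 3000,
# not largeN, so nsamp ends up larger than the population
nsamp <- ifelse(samp_prop * nobs > 3000,
                as.integer(ceiling(samp_prop * nobs)), 3000L)
nsamp        # 3000
nsamp > nobs # TRUE

# Assumption: the sample is drawn without replacement, roughly like this;
# the error is captured as a string instead of stopping the script
res <- tryCatch(sample(seq_len(nobs), nsamp),
                error = function(e) conditionMessage(e))
res
```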

Shouldn't it be something like nsamp <- min(largeN, nobs * samp_prop) ?
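
A minimal sketch of one possible rule, for illustration only. compute_nsamp is a hypothetical helper, not part of classInt. It assumes the documented "whichever is larger" behaviour is kept, with largeN replacing the literal 3000, and it caps the result at the population size so that sampling without replacement cannot fail.

```r
# Hypothetical helper (not part of classInt): derive a sample size from
# nobs, largeN and samp_prop
compute_nsamp <- function(nobs, largeN = 3000L, samp_prop = 0.1) {
  nobs <- as.integer(nobs)
  largeN <- as.integer(largeN)
  # At or below the threshold: no sampling, use every observation
  if (nobs <= largeN) {
    return(nobs)
  }
  # Proportional sample size, rounded up
  prop_n <- as.integer(ceiling(samp_prop * nobs))
  # Larger of the two candidates, but never more than the population
  min(nobs, max(prop_n, largeN))
}

compute_nsamp(1001L, largeN = 1000L, samp_prop = 0.05) # 1000
compute_nsamp(500L, largeN = 1000L)                    # 500
```

With the values from the reprex this gives a sample of 1000 out of 1001 observations instead of an error.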

Thank you very much for your time.
Duccio

rsbivand · 2023-08-22T15:33:09Z

@duccioa Yes, I think your analysis is correct. I'll try to prepare a fix in a branch during this week. May I ask you to review the changes when I'm ready?

duccioa · 2023-08-22T15:44:35Z

Roger (if I may call you by your name), I would be honored. I am a big fan of your work in the r-spatial community.

rsbivand · 2023-08-29T09:39:59Z

@duccioa Thanks very much! I've merged into the main branch now.

rsbivand · 2023-09-05T11:34:05Z

Submitted to CRAN.

rsbivand added a commit that referenced this issue Aug 24, 2023

address #44

4bf693f

rsbivand added a commit that referenced this issue Aug 24, 2023

address #44

3652101

rsbivand added a commit that referenced this issue Aug 27, 2023

address #44

f7b2e03

rsbivand mentioned this issue Aug 27, 2023

Can nsamp be used with "kmeans", etc. #46

Closed

rsbivand added a commit that referenced this issue Aug 29, 2023

Merge pull request #45 from r-spatial/LargeN

ed72eef

address #44

rsbivand added a commit that referenced this issue Aug 29, 2023

address #44

d858db8

rsbivand closed this as completed Sep 5, 2023

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Oct 9, 2023

devel/R-cran-classInt: Update to 0.4-10

29471d1

- Take maintainership ChangeLog: Address LargeN usage: r-spatial/classInt#44
