Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revdep vardpoor example failures after melt change #6071

Closed
tdhock opened this issue Apr 11, 2024 · 13 comments · Fixed by #6333
Closed

revdep vardpoor example failures after melt change #6071

tdhock opened this issue Apr 11, 2024 · 13 comments · Fixed by #6333
Assignees
Labels
revdep Reverse dependencies
Milestone

Comments

@tdhock
Copy link
Member

tdhock commented Apr 11, 2024

after this commit 8fa8c8f
we get the following, which I will investigate.

* checking examples ... ERROR
Running examples in 'vardpoor-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: vardannual
> ### Title: Variance estimation for measures of annual net change or annual
> ###   for single and multistage stage cluster sampling designs
> ### Aliases: vardannual
> ### Keywords: vardannual
> 
> ### ** Examples
> 
>  
> ### Example
> library("data.table")
> 
> set.seed(1)
> 
> data("eusilc", package = "laeken")
> eusilc1 <- eusilc[1:20, ]
> rm(eusilc)
> 
> dataset1 <- data.table(rbind(eusilc1, eusilc1),
+                        year = c(rep(2010, nrow(eusilc1)),
+                                 rep(2011, nrow(eusilc1))))
> rm(eusilc1)
> 
> dataset1[, country := "AT"]
> dataset1[, half := .I - 2 * trunc((.I - 1) / 2)]
> dataset1[, quarter := .I - 4 * trunc((.I - 1) / 4)]
> dataset1[age < 0, age := 0]
> 
> PSU <- dataset1[, .N, keyby = "db030"][, N := NULL][]
> PSU[, PSU := trunc(runif(.N, 0, 5))]
> 
> dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
> rm(PSU)
> 
> dataset1[, strata := "XXXX"]
> dataset1[, employed := trunc(runif(.N, 0, 2))]
> dataset1[, unemployed := trunc(runif(.N, 0, 2))]
> dataset1[, labour_force := employed + unemployed]
> dataset1[, id_lv2 := paste0("V", .I)]
> 
> vardannual(Y = "employed", H = "strata",
+            PSU = "PSU", w_final = "rb050",
+            ID_level1 = "db030", ID_level2 = "id_lv2",
+            Dom = NULL, Z = NULL, years = "year",
+            subperiods = "half", dataset = dataset1,
+            percentratio = 100, confidence = 0.95,
+            method = "cros")
Error in t(X) %*% A_matrix : non-conformable arguments
Calls: vardannual -> lapply -> FUN -> data.table
Execution halted
@tdhock tdhock self-assigned this Apr 11, 2024
@tdhock tdhock added the revdep Reverse dependencies label Apr 11, 2024
@tdhock
Copy link
Member Author

tdhock commented Apr 11, 2024

using data.table from CRAN, and adding print statements to vardpoor, I get the output below

vrdnnl> vardannual(Y = "employed", H = "strata",
vrdnnl+            PSU = "PSU", w_final = "rb050",
vrdnnl+            ID_level1 = "db030", ID_level2 = "id_lv2",
vrdnnl+            Dom = NULL, Z = NULL, years = "year",
vrdnnl+            subperiods = "half", dataset = dataset1,
vrdnnl+            percentratio = 100, confidence = 0.95,
vrdnnl+            method = "cros")
[1] "X"
 num [1:2] 242 1263
[1] "A_matrix"
 num [1:2, 1:2] 1 0.375 0.375 1
[1] "X"
 num [1:2] 1272 1542
[1] "A_matrix"
 num [1:2, 1:2] 1 -1 -1 1

Using data.table from github master I get output below

vrdnnl> vardannual(Y = "employed", H = "strata",
vrdnnl+            PSU = "PSU", w_final = "rb050",
vrdnnl+            ID_level1 = "db030", ID_level2 = "id_lv2",
vrdnnl+            Dom = NULL, Z = NULL, years = "year",
vrdnnl+            subperiods = "half", dataset = dataset1,
vrdnnl+            percentratio = 100, confidence = 0.95,
vrdnnl+            method = "cros")
[1] "X"
 num(0) 
[1] "A_matrix"
 num [1:2, 1:2] 1 NA NA 1
Erreur dans t(X) %*% A_matrix : arguments inadéquats
Appels : example ... eval -> eval -> vardannual -> lapply -> FUN -> data.table
Exécution arrêtée

@tdhock
Copy link
Member Author

tdhock commented Apr 11, 2024

vardannual calls vardcros which calls melt with measure via the code below

  if (!is.null(namesZ)) varsYZ <- list(namesY, namesZ)
  str(list(varsYZ=varsYZ, DTagg=DTagg))
  DTagg <- melt(DTagg, id = c(namesperc, gnamesDom),
                measure = varsYZ,
                variable.factor = FALSE)

output below using data.table master:

List of 2
 $ varsYZ:List of 1
  ..$ : chr "employed"
 $ DTagg :Classesdata.tableand 'data.frame':	4 obs. of  3 variables:
  ..$ pers    : chr [1:4] "2010__1" "2010__2" "2011__1" "2011__2"
  ..$ percoun : chr [1:4] "1" "1" "1" "1"
  ..$ employed: num [1:4] 2742 3479 2259 2622
  ..- attr(*, "sorted")= chr [1:2] "pers" "percoun"
  ..- attr(*, ".internal.selfref")=<externalptr> 

and same for data.table release, which suggests varsYZ (list of length 1) should be changed to character vector.

@tdhock
Copy link
Member Author

tdhock commented Apr 11, 2024

the fix is to add the following line in the revdep code, I will file a PR.

  if (length(varsYZ)==1) varsYZ <- unlist(varsYZ)

@jangorecki
Copy link
Member

From report you filled downstream it doesn't sound it was a bug in revdep code. Therefore why not to add, at least, one extra release cycle for transition, not only CRAN packages but all the others? Currently used approach was never used in data.table, Matt was pulling out breaking changes, and sometimes even deciding to not proceed with them at all. Giving extra release cycle of time, and a notice in NEWS file sounds as softer way to approach this change.

@tdhock
Copy link
Member Author

tdhock commented Apr 11, 2024

good idea

@MichaelChirico
Copy link
Member

I think in this case, because downstream is relying on undocumented behavior, we can be a bit more aggressive than usual: I would move to a warning in the next release, then error, then remove code supporting the old way.

@jangorecki
Copy link
Member

warning means that a package check will start to fail already from next release, and that sounds discouraging for new pkgs to import DT

@MichaelChirico
Copy link
Member

warning means that a package check will start to fail already from next release, and that sounds discouraging for new pkgs to import DT

warning in example is not registered by R CMD check (just confirmed)

@tdhock tdhock added this to the 1.16.0 milestone Jun 5, 2024
@MichaelChirico
Copy link
Member

@tdhock I filed a suggested improvement to your downstream PR. However, IMO it makes their code superficially worse by introducing inconsistency across the length==1/length>1 cases -- is there a strong reason not to support that style (e.g. doing so would require some contradictory/collided behavior & we prefer the new approach)?

If we plan to go ahead with breaking downstream, please make sure to add a relevant NEWS item explaining the change of behavior, and inform downstream of the imminent release; otherwise, please help to file a PR for the continued support of measure=list(<single vector>).

@tdhock
Copy link
Member Author

tdhock commented Jul 30, 2024

Right, the breaking change was made for increased consistency in data.table, see #5209
This is another example of bringing the implementation up to date with the docs, so not sure if a deprecation cycle is warranted. It definitely would be easier for us not to do one.
Since this is a weird corner case, and there is only one known revdep, I tend to err on the side of just making & explaining the breaking change. What do you think?
More generally, we should probably write a policy for when breaking changes are warranted (and when deprecation cycles are necessary). I believe the case of bringing the implementation up to date with the docs should be one example when a breaking change is warranted.

@MichaelChirico
Copy link
Member

I do agree it's not ideal to bend over backwards for one revdep, however

  • not all revdeps that are broken will be revealed by our testing -- many packages simply have poor test coverage!
  • user space (scripts, not packages) is even bigger and almost completely unknown/untested

So we tend to err on the side of caution wherever possible -- that's the "strong preference" for back-compatibility in the GOVERNANCE doc.

Adding consistency is good, but unless there's a new code path that prevents us from maintaining more back-compatibility in the short term, we should endeavor not to break downstreams without any deprecation cycle.

In this case, too, I worry from the example we did find that we are reducing consistency -- downstream goes from doing if (is.null(B)) list(A) else list(A, B) (parallel use of list() across branches) to if (is.null(B)) A else list(A, B); the former is more consistent to my reading. Is there an even nicer approach they should be using instead?

More generally, we should probably write a policy for when breaking changes are warranted (and when deprecation cycles are necessary).

I agree it would be useful to have this written out.

@tdhock
Copy link
Member Author

tdhock commented Aug 3, 2024

revdep checks confirm this is no longer an issue for vardpoor

@tdhock
Copy link
Member Author

tdhock commented Aug 3, 2024

I agree it would be useful to have this written out.

Here is a first attempt:
https://github.com/Rdatatable/data.table/wiki/Release-management-and-revdep-checks#when-is-a-breaking-change-warranted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
revdep Reverse dependencies
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants