Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model.frame creates invalid interlaced vectors #13

Closed
khusmann opened this issue Sep 30, 2024 · 3 comments · Fixed by #15
Closed

model.frame creates invalid interlaced vectors #13

khusmann opened this issue Sep 30, 2024 · 3 comments · Fixed by #15

Comments

@khusmann
Copy link
Owner

khusmann commented Sep 30, 2024

library(tidyverse)
library(interlacer)

df <- tibble(
  a = c(1,2,3),
  b = interlaced(c(1,-99,3), na=-99),
)

invalid <- model.frame(a ~ b, df)

value_channel(invalid$b)
#> [1] 1 3

# Oops, the na channel hasn't been updated
na_channel(invalid$b)
#> [1]  NA -99  NA

# This will error:
# print(invalid$b)

Created on 2024-09-30 with reprex v2.0.2

@khusmann
Copy link
Owner Author

This is because stats::model.frame.default() calls a C function to do the na.action: https://github.com/wch/r-source/blob/b8b2662a848a8eb87d67d66d737b00b6b2b127f2/src/library/stats/R/models.R#L565

This C function drops na values from vectors, but then copies their attributes back on: https://github.com/wch/r-source/blob/b8b2662a848a8eb87d67d66d737b00b6b2b127f2/src/library/stats/src/model.c#L228

Here's where the copying happens: https://github.com/wch/r-source/blob/b8b2662a848a8eb87d67d66d737b00b6b2b127f2/src/main/attrib.c#L302

So this bypasses all of the proper indexing... how annoying! :(

@khusmann khusmann changed the title model.frame crashes when drop.unused.levels = TRUE model.frame creates invalid interlaced vectors Sep 30, 2024
@khusmann
Copy link
Owner Author

One way to fix this, could be to detect when the na channel has been corrupted:

na_channel.interlacer_interlaced <- function(x, ...) {
  m <- attr(x, "na_channel_values")
  if (length(m) != length(x)) {
    unspecified(length(x))
  } else {
    m
  }
}

@khusmann
Copy link
Owner Author

khusmann commented Oct 2, 2024

Actually, what I should do is store a reference to the original data in another attribute. That way, I can detect if the current data changes its reference. If the current data changes its references (i.e. the data has been changed without updating its na_channel), then the na_channel should be invalidated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant