Eurostat issue #55

Closed
serkor1 opened this issue Aug 7, 2024 · 8 comments · Fixed by #56

Comments

@serkor1
Contributor

serkor1 commented Aug 7, 2024

Hi,

I have an issue parsing calendar data from Eurostat with ic_dataframe and/or ic_read. It only arises with {calendar}; there is no issue when using {ical}.

With ic_dataframe or ic_read the dates are not parsed correctly (the output below repeats the first date), whereas ical::ical_parse_df does not show this problem. The issue can be worked around by combining ic_list, lapply and do.call. See the MWE below:

rm(list = ls()); gc();
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  629306 33.7    1434458 76.7   727117 38.9
#> Vcells 1108619  8.5    8388608 64.0  1972973 15.1
ical <- readLines(
  "https://ec.europa.eu/eurostat/o/calendars/eventsIcal?theme=2&category=2"
)
head(
  DT <- calendar::ic_dataframe(
    ical
  )$`DTSTART;VALUE=DATE`
)
#> Warning in `[<-.data.frame`(`*tmp*`, date_cols, value = list(structure(19363,
#> class = "Date"), : provided 125 variables to replace 1 variables
#> [1] "2023-01-06" "2023-01-06" "2023-01-06" "2023-01-06" "2023-01-06"
#> [6] "2023-01-06"
ical_list <- calendar::ic_list(
  x = ical
)

head(
  DT <- do.call(
    rbind,
    lapply(
      ical_list,
      function(element){
        # 1) remove the last
        # element as it is a bunch
        # of html codes
        element <- element[-length(element)]

        # 2) split element
        split_element <- strsplit(
          element,
          split = ":"
        )

        do.call(
          cbind,
          lapply(
            split_element,
            function(x){

              DT <- data.frame(
                value = x[2]
              )

              names(DT) <- x[1]

              DT

            }
          )
        )

      }
    )
  )$`DTSTART;VALUE=DATE`
)
#> [1] "20230106" "20230106" "20230110" "20230111" "20230111" "20230118"
# {ical} package, for comparison
head(
  DT <- ical::ical_parse_df(
    text = ical
  )$start
)
#> [1] "1970-01-01 01:00:00 CET" "2023-01-06 01:00:00 CET"
#> [3] "2023-01-06 01:00:00 CET" "2023-01-10 01:00:00 CET"
#> [5] "2023-01-11 01:00:00 CET" "2023-01-11 01:00:00 CET"

Created on 2024-08-07 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Zorin OS 17.1
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US:en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Copenhagen
#>  date     2024-08-07
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  calendar      0.1.0   2024-04-28 [1] CRAN (R 4.4.1)
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  curl          5.2.1   2024-03-01 [1] CRAN (R 4.4.0)
#>  digest        0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
#>  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  ical          0.1.6   2019-01-21 [1] CRAN (R 4.4.1)
#>  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
#>  knitr         1.47    2024-05-29 [1] CRAN (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.4.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.4.0)
#>  R.oo          1.26.0  2024-01-24 [1] CRAN (R 4.4.0)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.4.0)
#>  Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.4.0)
#>  reprex        2.1.0   2024-01-11 [3] CRAN (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [3] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  styler        1.10.3  2024-04-07 [1] CRAN (R 4.4.0)
#>  V8            4.4.2   2024-02-15 [2] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
#>  xfun          0.45    2024-06-16 [1] CRAN (R 4.4.0)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.4.0)
#> 
#>  [1] /home/serkan/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@Robinlovelace
Member

Many thanks for the reproducible example @serkor1. I don't have time to look at this right now. Do you have any ideas for a fix?

@serkor1
Contributor Author

serkor1 commented Aug 7, 2024

Hi @Robinlovelace - I actually don't as I am new to the package.

But I would be happy to browse and explore and possibly post a fix during the weekend to assist the development, if you want!

@Robinlovelace
Member

> But I would be happy to browse and explore and possibly post a fix during the weekend to assist the development, if you want!

That would be amazing, any questions you have just let me know, also cc package co-author @layik 🙏

@serkor1
Contributor Author

serkor1 commented Aug 7, 2024

Actually, it seems to be a trivial fix. See the code below:

https://github.com/ATFutures/calendar/blob/bfd95e0cced2a9977b6c7b9b38502e9ec6557006/R/ic_dataframe.R#L38C3-L44C4

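The current code does x_df[date_cols] <- lapply(x_df[, date_cols], ic_date). When only one column matches, x_df[, date_cols] is dropped to a plain vector, so lapply applies ic_date to each element and returns a list with one entry per row (125 here) instead of one entry per column. That is what triggers the "provided 125 variables to replace 1 variables" warning. The fix is to extract the columns as a data.frame with list-style indexing, using the following code: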

x_df[date_cols] <- lapply(x_df[date_cols], ic_date)
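
To illustrate why the indexing matters, here is a minimal base R sketch. The data frame `df` and the function `parse_date()` are made up for illustration and are not part of {calendar}; `parse_date()` is only a stand-in for ic_date and assumes dates arrive as YYYYMMDD strings, as in the Eurostat feed.

```r
# Toy data: a single date column, as in the Eurostat calendar.
df <- data.frame(
  `DTSTART;VALUE=DATE` = c("20230106", "20230110", "20230111"),
  SUMMARY              = c("a", "b", "c"),
  check.names          = FALSE,
  stringsAsFactors     = FALSE
)

# Hypothetical stand-in for ic_date(): parse YYYYMMDD strings.
parse_date <- function(x) as.Date(x, format = "%Y%m%d")

date_cols <- grepl(pattern = "VALUE=DATE", x = names(df))

# Matrix-style indexing drops a single column to a character vector,
# so lapply() visits every element and returns one entry per row.
dropped <- lapply(df[, date_cols], parse_date)
length(dropped)
#> [1] 3

# List-style indexing keeps a data.frame, so lapply() visits every
# column and returns one entry per column.
kept <- lapply(df[date_cols], parse_date)
length(kept)
#> [1] 1

# Assigning the per-column list back gives the expected dates.
fixed <- df
fixed[date_cols] <- kept
fixed[["DTSTART;VALUE=DATE"]]
```

An alternative with the same effect would be x_df[, date_cols, drop = FALSE], but the list-style form is shorter and cannot drop by accident.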

Full solution:

ic_dataframe <- function(x) {

  if(methods::is(object = x, class2 = "data.frame")) {
    return(x)
  }

  stopifnot(methods::is(object = x, class2 = "character") | methods::is(object = x, class2 = "list"))

  if(methods::is(object = x, class2 = "character")) {
    x_list <- ic_list(x)
  } else if(methods::is(object = x, class2 = "list")) {
    x_list <- x
  }


  x_list_named <- lapply(x_list, function(x) {
    ic_vector(x)
  })


  x_df <- ic_bind_list(x_list_named)


  date_cols <- grepl(pattern = "VALUE=DATE", x = names(x_df))

  if(any(date_cols)) {
    x_df[date_cols] <- lapply(x_df[date_cols], ic_date)
  }
  datetime_cols <- names(x_df) %in% c("DTSTART", "DTEND")
  if(any(datetime_cols)) {
    x_df[datetime_cols] <- lapply(x_df[datetime_cols], ic_datetime)
  }

  # names(x_df) <- gsub(pattern = ".VALUE.DATE", replacement = "", names(x_df))

  x_df
}

Showcase the solution

# library
rm(list = ls()); gc(); devtools::load_all()

# read ical
ical <- readLines(
  "https://ec.europa.eu/eurostat/o/calendars/eventsIcal?theme=2&category=2"
)

# convert to data.frame
DT <- ic_dataframe(
  ical
)

# check dates
head(DT$`DTSTART;VALUE=DATE`)

# > "2023-01-06" "2023-01-06" "2023-01-10" "2023-01-11" "2023-01-11" "2023-01-18"

Which is what we want. I have run devtools::check() which runs without errors! The fix should be safe and trivial to implement!

Edit: Assuming that the helper functions work without any issues, I believe this solution is robust. I have tested this by adding a few more date variables after locating the offending lines of code!
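
A check along those lines could look like the sketch below. It does not call {calendar} itself: `parse_date()` and `convert_date_cols()` are hypothetical stand-ins that mirror only the date-column step of ic_dataframe, and the two toy data frames are invented. With two or more matching columns the old x_df[, date_cols] form stays a data.frame, which is probably why the problem only shows up with a single date column.

```r
# Hypothetical stand-in for ic_date(): parse YYYYMMDD strings.
parse_date <- function(x) as.Date(x, format = "%Y%m%d")

# Convert every column whose name contains "VALUE=DATE", using the
# list-style indexing from the fix.
convert_date_cols <- function(x_df) {
  date_cols <- grepl(pattern = "VALUE=DATE", x = names(x_df))
  if (any(date_cols)) {
    x_df[date_cols] <- lapply(x_df[date_cols], parse_date)
  }
  x_df
}

# One date column (the Eurostat case) and two date columns.
one_col <- data.frame(
  `DTSTART;VALUE=DATE` = c("20230106", "20230110"),
  check.names = FALSE
)
two_cols <- data.frame(
  `DTSTART;VALUE=DATE` = c("20230106", "20230110"),
  `DTEND;VALUE=DATE`   = c("20230107", "20230111"),
  check.names = FALSE
)

res_one <- convert_date_cols(one_col)
res_two <- convert_date_cols(two_cols)

# Both cases should yield Date columns with row-wise distinct values.
stopifnot(
  inherits(res_one[[1]], "Date"),
  all(vapply(res_two, inherits, logical(1), what = "Date")),
  res_one[[1]][2] == as.Date("2023-01-10"),
  res_two[["DTEND;VALUE=DATE"]][1] == as.Date("2023-01-07")
)
```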

@Robinlovelace
Member

Great work, super simple fix, thank you so much! So this line and perhaps one other need to change?

x_df[date_cols] <- lapply(x_df[, date_cols], ic_date)

You should be able to edit the file and put in a PR here: ...

@serkor1
Contributor Author

serkor1 commented Aug 7, 2024

Yes, both need to be changed. I didn't want to do a PR for such a trivial fix!

But I'll do one later this evening, unless you want to fix it right now 😃

Great package by the way, really love it!

@Robinlovelace
Member

Awesome! Yes, I will await your input, as a learning experience. It's your idea, so it will be good to have your name on the fix 👍
