collect throws an error in a hub without any model output files #4

Closed
Tracked by #18
elray1 opened this issue Feb 2, 2024 · 3 comments · Fixed by #18

Comments

@elray1
Contributor

elray1 commented Feb 2, 2024

If I run the following in a hub that contains no model output files:

library(hubUtils)
library(dplyr)

model_outputs <- hubUtils::connect_hub(hub_path = ".") %>%
  dplyr::collect()

I get this error:

Error in `UseMethod()`:
! no applicable method for 'collect' applied to an object of class "c('hub_connection', 'list')"
Backtrace:
 1. hubUtils::connect_hub(hub_path = ".") %>% dplyr::collect()
 2. dplyr::collect(.)
@annakrystalli annakrystalli transferred this issue from hubverse-org/hubUtils Feb 29, 2024
@annakrystalli
Contributor

annakrystalli commented Mar 6, 2024

Thanks for the report, @elray1! This is the result of a deliberate earlier decision not to error when connecting to an empty hub. Calling connect_hub() on an empty hub still loads the config metadata, but the returned hub_connection object wraps an empty list instead of an Arrow dataset. dplyr::collect() has no method for that class, so it fails.

Fixing this from within connect_hub() would likely be awkward, but it reminded me of a related discussion @nickreich raised a while ago (transferred to an issue in this repo: #13). That discussion proposed a collect_hub() function that would run as_model_out_tbl() on the result of collect() by default. Such a function would also be an ideal place to handle the empty-hub case.

Alternatively, rather than adding a separate function (collect_hub()), we could provide our own collect() S3 method that dispatches on hub_connection objects.
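For illustration, here is a minimal sketch of what such an S3 method could look like. It is not hubData's implementation. The `empty_con` object is a stand-in built from the class vector in the error message above, and the check on the `"Dataset"` class assumes that a non-empty connection inherits from arrow's `Dataset`. Returning an empty tibble for an empty hub is also just one possible choice.

```r
library(dplyr)

# Hypothetical method (an assumption, not part of hubData): handle the
# empty-hub case here, otherwise defer to the next method in the class chain.
collect.hub_connection <- function(x, ...) {
  if (!inherits(x, "Dataset")) {
    # Empty hub: the connection is a plain list with no data behind it.
    return(tibble::tibble())
  }
  # Non-empty hub: assumed to fall through to arrow's collect() method.
  NextMethod()
}

# Stand-in for what connect_hub() returns on an empty hub, using the class
# vector reported in the error message.
empty_con <- structure(list(), class = c("hub_connection", "list"))

result <- dplyr::collect(empty_con)
nrow(result)
```

With the method in scope, `dplyr::collect()` dispatches on `hub_connection` first and no longer hits the "no applicable method" error for the stand-in object.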

@elray1
Contributor Author

elray1 commented Mar 7, 2024

I like the idea of our own collect method

@annakrystalli
Contributor

In the end, handling this in a hubverse collect() method sadly won't work, because collect() may well be preceded by other queries, e.g. filter(), which fail first with the same kind of dispatch error:

hub_path <- system.file("testhubs/empty", package = "hubUtils")
hub_con <- hubData::connect_hub(hub_path) |> 
    dplyr::filter(is.na(output_type_id)) |>
    dplyr::collect()
#> Warning in hubData::connect_hub(hub_path): No files of file formats "csv", "parquet", and "arrow" found in model output
#> directory.
#> Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "c('hub_connection', 'list')"

Created on 2024-03-27 with reprex v2.0.2
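Because the error is raised by the first verb applied to the connection, any guard has to run before the pipeline starts. The sketch below shows one way that could look. `collect_if_data()` is a hypothetical helper, not a hubData function, `empty_con` is a stand-in built from the class vector in the error message, and the `"Dataset"` check assumes a non-empty connection inherits from arrow's `Dataset` class.

```r
library(dplyr)

# Stand-in for an empty-hub connection (class vector taken from the error).
empty_con <- structure(list(), class = c("hub_connection", "list"))

# Hypothetical helper: apply `query` and collect only when the connection is
# backed by an Arrow dataset; otherwise warn and return an empty tibble.
collect_if_data <- function(con, query = identity) {
  if (!inherits(con, "Dataset")) {
    warning("Hub contains no model output files; returning an empty tibble.")
    return(tibble::tibble())
  }
  dplyr::collect(query(con))
}

# The filter() step is wrapped in a function, so it is never evaluated
# against the empty connection.
out <- suppressWarnings(
  collect_if_data(
    empty_con,
    function(x) dplyr::filter(x, is.na(output_type_id))
  )
)
nrow(out)
```

Passing the query as a function keeps filter() from being dispatched on the empty list, which is something a collect() method alone cannot do.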

But both when filtering and when just collecting, I get an informative warning from connect_hub() that the hub is effectively empty. Is that not helpful enough? Or are you, for some reason, not getting that warning, @elray1?

hub_path <- system.file("testhubs/empty", package = "hubUtils")
hub_con <- hubData::connect_hub(hub_path) |> dplyr::collect()
#> Warning in hubData::connect_hub(hub_path): No files of file formats "csv", "parquet", and "arrow" found in model output
#> directory.
#> Error in UseMethod("collect"): no applicable method for 'collect' applied to an object of class "c('hub_connection', 'list')"

Created on 2024-03-27 with reprex v2.0.2

@annakrystalli annakrystalli linked a pull request Mar 27, 2024 that will close this issue
4 tasks