Create hub_connection collect method #17

Closed
Tracked by #18
annakrystalli opened this issue Mar 27, 2024 · 2 comments · Fixed by #18

@annakrystalli
Contributor

Thanks for the report @elray1! This is the result of a conscious prior decision not to error when connecting to an empty hub. Calling connect_hub() on an empty hub still loads the config metadata, but the subclass is an empty list instead of an arrow dataset. As a result, dplyr::collect() has no method for it and fails.
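To illustrate the failure described above, here is a minimal sketch that does not need a real hub. The class vector given to empty_hub is an assumption made for illustration; the actual classes set by connect_hub() may differ.

```r
library(dplyr)

# Assumed stand-in for an empty hub connection: an empty list carrying a
# hub_connection class (the real class vector may differ).
empty_hub <- structure(list(), class = c("hub_connection", "list"))

# dplyr::collect() is an S3 generic with no method for these classes,
# so dispatch fails. The error message is captured instead of thrown.
res <- tryCatch(
  collect(empty_hub),
  error = function(e) conditionMessage(e)
)
print(res)
```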

This would likely be awkward to fix from within connect_hub(), but it reminded me of a related discussion @nickreich raised a while ago (transferred to an issue in this repo: #13), which proposed a collect_hub() function that runs as_model_out_tbl() on the results of collect by default. Such a function would be an ideal place to handle the empty hub exception too.

Alternatively, rather than adding a separate function (collect_hub()), we could define our own collect() S3 method that is dispatched on hub_connection class objects.

Originally posted by @annakrystalli in #4

@annakrystalli annakrystalli self-assigned this Mar 27, 2024
@annakrystalli
Contributor Author

The method will:

  • Check if the hub_connection object is an empty list and return NULL with a warning if so.
  • Otherwise, call the next dplyr::collect() method.
  • If dplyr::collect() succeeds, try as_model_out_tbl() on the output tibble. If that succeeds, return the model_out_tbl; otherwise return the original tbl with a silenceable message.

The function will resolve #4 and #13
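The steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the implementation: coerce_model_out() is a hypothetical stand-in for as_model_out_tbl() that only checks for a model_id column, so the example runs without the hubverse packages.

```r
library(dplyr)

# Hypothetical stand-in for as_model_out_tbl(): errors unless a `model_id`
# column is present. The real function validates far more than this.
coerce_model_out <- function(tbl) {
  if (!"model_id" %in% names(tbl)) {
    stop("Cannot coerce to model_out_tbl: no model_id column.")
  }
  structure(tbl, class = c("model_out_tbl", class(tbl)))
}

collect.hub_connection <- function(x, ...) {
  # Step 1: an empty hub is an empty list rather than an arrow dataset
  if (is.list(x) && length(x) == 0L) {
    warning("Hub connection is empty; returning NULL.", call. = FALSE)
    return(NULL)
  }
  # Step 2: defer to the next collect() method for the underlying object
  out <- NextMethod()
  # Step 3: attempt coercion; fall back to the collected tbl with a message
  tryCatch(
    coerce_model_out(out),
    error = function(e) {
      message("Returning uncoerced tbl: ", conditionMessage(e))
      out
    }
  )
}
```

Because the fallback uses message(), callers can silence it with suppressMessages().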

@annakrystalli
Contributor Author

Sadly, it is not possible to create a hub_connection-specific method because query objects passed to collect() do not retain the hub_connection class (they have an arrow_dplyr_query class). So the above approach would only work in the minority of cases where collect() is run directly on a hub_connection object to return all hub data.

As such, I have gone ahead and implemented the initial idea of a collect_hub() wrapper around dplyr::collect() that also coerces to model_out_tbl where possible.
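A minimal sketch of such a wrapper is shown below. It is not the code in the linked pull request: the name collect_hub_sketch() and its coerce and silent arguments are hypothetical, with coerce standing in for as_model_out_tbl(). The empty-hub check is left out for brevity.

```r
library(dplyr)

# Plain function rather than an S3 method, so it does not depend on the
# input still carrying the hub_connection class.
# `coerce` is a hypothetical argument standing in for as_model_out_tbl().
collect_hub_sketch <- function(x, coerce, silent = FALSE, ...) {
  # Works on anything dplyr::collect() accepts, including query objects
  tbl <- dplyr::collect(x, ...)
  tryCatch(
    coerce(tbl),
    error = function(e) {
      if (!silent) {
        message("Could not coerce to model_out_tbl; returning tbl as is.")
      }
      tbl
    }
  )
}
```

Since the wrapper never dispatches on the input's class, it behaves the same whether it is given a hub connection directly or the result of a dplyr pipeline built on one.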

@annakrystalli annakrystalli linked a pull request Mar 27, 2024 that will close this issue