Feature Request - Feather format for cache files #225
Comments
Since the cache files shouldn't be accessed by users on their own, I would rather stick to the convention-over-configuration philosophy and just introduce it as the default if it provides all the functionality we need. It sounds pretty awesome, and like something I could really use in my day-to-day work 👍 |
@Hugovdberg the I don't disagree with the default options of |
Better performance shouldn't be an option, it should just be implemented, right? ;-) Some checks for object types should be built in anyway to make this possible, so I would suggest we add a I can try to implement this sometime soon, probably over the weekend. |
Always want better performance 👍 If Here is my benchmark comparison between
|
|
I was just looking into using feather for the cache, but my proposed I did some benchmarks comparing My suggestion therefore would be to do the following (pseudo coded):
```r
# Write to cache: plain data frames and tibbles go to feather, everything else to RData
if (identical(class(x), 'data.frame') || tibble::is_tibble(x)) {
  feather::write_feather(x, file)
} else {
  save(x, file = file)
}
# Read from cache: choose the reader from the file extension
if (file_extension == '.feather') {
  assign(varname, feather::read_feather(file))
} else {
  load(file)
}
```
The major advantage of this is that there is no backward incompatibility. Files should not be manually saved to or read from the cache directory, so we can accept a hybrid state, even just writing new files to Regarding just implementing or making it optional: I don't feel bad about adding a dependency on |
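For anyone who wants to try the hybrid scheme above end to end, here is a minimal sketch that wraps it in two helpers. The names `use_feather`, `write_cache_sketch` and `read_cache_sketch` are hypothetical and not part of ProjectTemplate; the sketch also assumes it is acceptable to fall back to RData when the feather package is not installed.

```r
# Hypothetical helpers, not ProjectTemplate's actual implementation.
use_feather <- function(x) {
  # Only plain data frames and tibbles qualify, and only if feather is installed
  is_df <- identical(class(x), "data.frame") || inherits(x, "tbl_df")
  is_df && requireNamespace("feather", quietly = TRUE)
}

write_cache_sketch <- function(x, varname, cache_dir) {
  if (use_feather(x)) {
    path <- file.path(cache_dir, paste0(varname, ".feather"))
    feather::write_feather(x, path)
  } else {
    path <- file.path(cache_dir, paste0(varname, ".RData"))
    # Save the object under its own name so load() restores that name
    env <- new.env()
    assign(varname, x, envir = env)
    save(list = varname, file = path, envir = env)
  }
  invisible(path)
}

read_cache_sketch <- function(path, varname, envir = parent.frame()) {
  # Dispatch on the file extension, so old .RData files keep working
  if (tools::file_ext(path) == "feather") {
    assign(varname, feather::read_feather(path), envir = envir)
  } else {
    load(path, envir = envir)
  }
  invisible(varname)
}

# Usage: a list is not a data frame, so it goes to .RData
cache_dir <- tempfile("cache")
dir.create(cache_dir)
path <- write_cache_sketch(list(alpha = 0.05), "settings", cache_dir)
read_cache_sketch(path, "settings")
```

Note that `read_feather()` returns a tibble, so a plain data frame does not come back with exactly the same class it had when it was written.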
This looks good. If |
@Hugovdberg just following up on this. Are you taking care of this feature, or would you like someone else to pick it up? cheers! |
As mentioned in #191 we should investigate performance of |
Agreed, thanks for bringing A quick look at these packages leads me to state...
Why can't we support both as an option to |
We should choose one. |
If we had to pick one, I would also pick |
I agree
This means that every time we don't get exactly the same result back from |
Has this issue been raised with the I like checking that the result is the same. A bit concerned that there will be a mixture of |
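The "check that the result is the same" idea could look roughly like the sketch below. `roundtrip_ok` is a hypothetical helper, and base R's `saveRDS()`/`readRDS()` and `write.csv()`/`read.csv()` stand in for whichever cache format ends up being used; none of this is taken from ProjectTemplate itself.

```r
# Hypothetical helper: `writer` takes (x, path), `reader` takes (path).
roundtrip_ok <- function(x, writer, reader) {
  path <- tempfile()
  on.exit(unlink(path), add = TRUE)
  # Any error while writing or reading counts as a failed round trip
  tryCatch({
    writer(x, path)
    identical(x, reader(path))
  }, error = function(e) FALSE)
}

# A lossless format passes the check
rds_ok <- roundtrip_ok(mtcars, saveRDS, readRDS)

# A lossy format fails it (row names and column types are not preserved),
# which is the signal to fall back to RData for that object
csv_ok <- roundtrip_ok(
  mtcars,
  function(x, path) write.csv(x, path, row.names = FALSE),
  read.csv
)
```

`identical()` is deliberately strict here; a looser comparison such as `all.equal()` would let harmless class differences through, at the cost of no longer guaranteeing an exact copy.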
@Hugovdberg this is actually from a use case I face every day within the same project. When working with large dataframes (10m+, 100m+ rows), R's interactive visualization methods (plotly, ggplotly etc.) are painfully slow. This is an example of a workflow like:
Another colleague does something similar, where Python's dictionaries and its ability to use hashtables result in a combined R+Python approach. Thus, feather+cache has come in very handy. Re: the pseudocode, that seems fine and it'll work well for smaller datasets. The |
Hi all, I would like to implement qs (see my comment above) as a (possible) replacement for the RData format for cache files. My questions now are:
What do you think? |
Hi, This sounds like a really great idea! What I would like to do for compatibility is a staged approach:
I like these staged rollouts because they make it easier to find and fix errors. In Stage 1 we are getting bugs from people who know what they are doing. This helps us more easily debug problems with the qs format. In Stage 2 we start to get the newbie problems, since someone downloads ProjectTemplate, sets up a new project and then runs into issues with qs. Stage 3 lets us discover migration issues before rolling into Stage 4. Does this plan work for you? |
Sure, this plan works very well for me. And I'm really glad you like the idea. I'll add Once I'm done, I'll open a pull request. |
Report an Issue / Request a Feature
I'm submitting a (Check one with "x") :
Issue Severity Classification -
(Check one with "x") :
Expected Behavior
I recommend adding the option of using the feather file format when we cache dataframe objects. Read about feather here and here.
Advantages of feather for dataframe objects:
feather at work, and it's been really really fast.
feather was developed by Wes McKinney & Hadley Wickham together. This helps when collaborating on projects with Python aspects.
Possible Solution
Potential call: cache('dataframe_object', feather = TRUE) should save an object called dataframe_object.feather.
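As a rough illustration of how such a call might behave, the sketch below takes the variable name as a string and derives the file name from it. `cache_sketch` and its `cache_dir` and `envir` parameters are hypothetical and do not reflect the actual signature of ProjectTemplate's `cache()`; the `feather = TRUE` branch assumes the feather package is installed.

```r
# Hypothetical sketch of the requested behaviour, not the real cache().
cache_sketch <- function(variable, feather = FALSE, cache_dir = "cache",
                         envir = parent.frame()) {
  stopifnot(is.character(variable), length(variable) == 1)
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  ext <- if (feather) ".feather" else ".RData"
  path <- file.path(cache_dir, paste0(variable, ext))
  if (feather) {
    # feather only stores data frames, so fail early for anything else
    value <- get(variable, envir = envir)
    stopifnot(is.data.frame(value))
    feather::write_feather(value, path)
  } else {
    save(list = variable, file = path, envir = envir)
  }
  invisible(path)
}

# Usage: the default writes <cache_dir>/dataframe_object.RData;
# feather = TRUE would write <cache_dir>/dataframe_object.feather instead
dataframe_object <- data.frame(id = 1:3, value = c(0.1, 0.2, 0.3))
saved_path <- cache_sketch("dataframe_object", cache_dir = tempfile("cache"))
```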