[r] Add iterator classes #1274
Merged
21 commits
7d4d103  add: iterator classes experiment  (pablo-gar)
bb82a0b  Add iter classes  (pablo-gar)
6e68ad2  Merge branch 'main' into pablo-gar/r_read_iterators  (pablo-gar)
46c4b1e  Improve classes  (pablo-gar)
ff25eee  Add iters  (pablo-gar)
34c8d41  Improve iterators  (pablo-gar)
b210ce2  First iterators MVP  (pablo-gar)
155585c  Update doc strings  (pablo-gar)
a415191  update docs  (pablo-gar)
4534226  remove dup function  (pablo-gar)
2d22be8  update iterators  (pablo-gar)
7906eec  Update tests  (pablo-gar)
9f6fe58  Merge branch 'main' into pablo-gar/r_read_iterators  (pablo-gar)
5d1ca4d  Fix bugs  (pablo-gar)
7017d27  Update docs  (pablo-gar)
bc5218f  Refractor to follow spec; update tests  (pablo-gar)
093226b  Merge branch 'main' into pablo-gar/r_read_iterators  (pablo-gar)
85cca42  Fix bugs after main merge  (pablo-gar)
1917755  Update docs  (pablo-gar)
b4aecf3  remove comment  (pablo-gar)
07c2e94  update vignettes  (pablo-gar)
@@ -0,0 +1,62 @@
#' SOMA Read Iterator Base class
#'
#' Class that allows for read iteration of SOMA reads.

ReadIter <- R6::R6Class(
  classname = "ReadIter",

  public = list(

    #' @description Create (lifecycle: experimental)
    #' @param sr soma read pointer
    initialize = function(sr) {
      private$soma_reader_pointer <- sr
    },

    #' @description Check if iterated read is complete or not. (lifecycle: experimental)
    #' @return logical
    read_complete = function() {
      if (is.null(private$soma_reader_pointer)) {
        TRUE
      } else {
        sr_complete(private$soma_reader_pointer)
      }
    },

    #' @description Read the next chunk of an iterated read. (lifecycle: experimental).
    #' If read is complete, returns `NULL` and raises a warning.
    #' @return \code{NULL} or one of arrow::\link[arrow]{Table}, \link{matrixZeroBasedView}
    read_next = function() {
      if (is.null(private$soma_reader_pointer)) {
        NULL
      } else {
        if (self$read_complete()) {
          warning("Iteration complete, returning NULL")
          NULL
        } else {
          rl <- sr_next(private$soma_reader_pointer)
          return(private$soma_reader_transform(rl))
        }
      }
    },

    #' @description Concatenate remainder of iterator
    # to be refined in derived classes
    concat = function() {
      .NotYetImplemented()
    }

  ),

  private = list(

    # Internal 'external pointer' object used for iterated reads
    soma_reader_pointer = NULL,

    # to be refined in derived classes
    soma_reader_transform = function(x) {
      .NotYetImplemented()
    }

  )
)
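For readers following the thread, here is a minimal sketch of how a caller might drive the `read_complete()` / `read_next()` protocol defined above. It does not use the PR's `ReadIter` class or a real SOMA reader pointer: `make_mock_iter` is a hypothetical stand-in built from base R closures that serves pre-built chunks, and it only mirrors the documented behavior (a warning plus `NULL` once the read is exhausted).

```r
# Hypothetical stand-in for a ReadIter-like object (assumption: not the PR's
# class). It serves pre-built chunks instead of reading via a SOMA pointer.
make_mock_iter <- function(chunks) {
  pos <- 0L
  list(
    read_complete = function() pos >= length(chunks),
    read_next = function() {
      if (pos >= length(chunks)) {
        warning("Iteration complete, returning NULL")
        return(NULL)
      }
      pos <<- pos + 1L
      chunks[[pos]]
    }
  )
}

iter <- make_mock_iter(list(1:3, 4:6, 7:8))

# Drain the iterator chunk by chunk, checking completion before each read
total <- 0L
n_chunks <- 0L
while (!iter$read_complete()) {
  chunk <- iter$read_next()
  total <- total + sum(chunk)
  n_chunks <- n_chunks + 1L
}
```

Checking `read_complete()` before each call keeps the loop from ever hitting the warning path; calling `read_next()` once more after the loop would return `NULL` with a warning.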
Sorry for the very belated review -- @aaronwolen asked me to help review, and I'm happy to -- I believe my review is speaking for the both of us.

There are definitely multiple ways to go here. However, given that on the Python side we have a single `read`, options of `.tables`, etc., and `concat` -- e.g. `sdf.read().tables().concat()` -- we should do the same here. It won't be ideal for everyone but it will work, it will parallel the Python implementation, and we can add some keystroke-saving syntactic-sugar functions on top of these later if we want.

Let's get this `concat` going, and get rid of the `iterated` argument to `read`. Then `read` will always return an iterator (low complexity), and `tables()` et al. will transform an iterator of one type to an iterator of another, and `concat` will have the single job of doing concatenation.

Note that `apis/r/R/utils-readerTransformers.R` on this PR already has reader-transformers, so (I think and hope) it should be easy to connect them as methods on the iterator objects.

Likewise, `concat()` exists in non-stub form elsewhere on this PR and should be able to be connected as a method on the iterator objects.

If `read_next` remains, it needs to remain as a helper method, not as public API.
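To make the proposed division of labor concrete, here is a rough sketch of the `read` / `tables()` / `concat()` chain. Everything in it is an assumption for illustration: `new_iter` and `mock_read` are hypothetical names, base R closures stand in for the R6 classes, and data frames stand in for Arrow tables. It is not the PR's implementation.

```r
# Hypothetical iterator constructor (assumption: illustrative only).
# `transform` is applied to each chunk as it is read.
new_iter <- function(chunks, transform = identity) {
  pos <- 0L
  self <- list(
    read_complete = function() pos >= length(chunks),
    read_next = function() {
      if (pos >= length(chunks)) return(NULL)
      pos <<- pos + 1L
      transform(chunks[[pos]])
    }
  )
  # tables(): turn an iterator of raw chunks into an iterator of data frames,
  # covering only the chunks that have not been read yet
  self$tables <- function() {
    new_iter(tail(chunks, length(chunks) - pos), transform = as.data.frame)
  }
  # concat(): single job, drain what remains and bind it into one object
  self$concat <- function() {
    out <- list()
    while (!self$read_complete()) {
      out[[length(out) + 1L]] <- self$read_next()
    }
    do.call(rbind, out)
  }
  self
}

# read() always returns an iterator; here it serves two in-memory chunks
mock_read <- function() {
  new_iter(list(
    list(soma_joinid = 0:1, n_genes = c(10L, 20L)),
    list(soma_joinid = 2:3, n_genes = c(30L, 40L))
  ))
}

df <- mock_read()$tables()$concat()
```

In this sketch each step does one thing: reading never concatenates, `tables()` only changes the chunk type, and `concat()` is the only place where chunks are combined.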
Agreed with everything and I will try to get this done tonight.
@johnkerl I'm not sure what the suggestion here is? Unfortunately in R there's no generic `next()` function, so I'm not sure how we could give users the ability to iterate themselves unless we:
1. keep `read_next` a public method
2. add a method `next()` that internally calls `read_next`
The second one is a little less intuitive imo.
I'm not sure I understand the benefit to this. While R does have a package that implements iterators, it isn't widely used (or used at all in the single-cell world). Because iterators don't exist in R, all this does is add complexity for R users. While I understand the desire to make the R API act like the Python API, we do need to keep in mind that R is not Python and there are things that Python handles natively that just don't work as well in R.
I think the key is here:
and imo it's not an "if we want" but more a "we need to". I think we can keep the low-level functionality at the R6 level as proposed and then provide wrapper functions like `as_arrow_table(soma_obj)` (see here), `as(soma_obj, "dgCMatrix")`, `as.data.frame(soma_obj)`, etc.
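As a rough illustration of the wrapper idea, the sketch below puts an `as.data.frame()` S3 method on top of a chunked low-level object. The `mock_soma` class and its `chunks` field are hypothetical stand-ins for a SOMA object and its iterated read. Only `as.data.frame()` is covered, since it is a base R generic; `as_arrow_table()` and `as(., "dgCMatrix")` would need the arrow and Matrix packages.

```r
# Hypothetical low-level object (assumption: not a real SOMA class).
# `chunks` stands in for the pieces an iterated read would yield.
mock_soma <- function(chunks) {
  structure(list(chunks = chunks), class = "mock_soma")
}

# S3 method: base R's as.data.frame() generic dispatches on the class,
# hiding the chunked read and the concatenation from the user
as.data.frame.mock_soma <- function(x, ...) {
  do.call(rbind, lapply(x$chunks, as.data.frame))
}

obj <- mock_soma(list(
  list(id = 1:2, value = c(0.5, 1.5)),
  list(id = 3L, value = 2.5)
))

df <- as.data.frame(obj)
```

With a wrapper like this, users who never want to see an iterator can stay with familiar coercion functions, while the chunked interface remains available underneath.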
@johnkerl I have addressed your comments with the exception of:
I also updated the usage section of top-level comment. The branch still needs to merge main but wanted to get your early thoughts on this.
@pablo-gar looking this morning -- thank you! :)
@pablo-gar sorry to confuse.
This looks good to me -- thank you!! :)
Awesome! I will update the docs and merge main into the PR branch