-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Owning reference to a string with O(1) access to lines #3
base: master
Are you sure you want to change the base?
Conversation
Any thoughts? |
Hey. Sorry. I completely overlooked this PR. I just randomly did some changes today, and well, your new comment pinged me. I'll just take a look at this PR :) |
I have a few questions and thoughts. So feel free to respond to those, before you spend time reworking the PR. (I'll review the PR now, and add the questions and thoughts as comments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again. Sorry for the lack or response until now. I somehow overlooked the original notification for this PR
/// - [`with_ending`](CachedLines::with_ending) | ||
/// | ||
/// | ||
#[cfg(feature = "alloc")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be easier to avoid needing all the #[cfg(feature = "alloc")]
, by having a:
#[cfg(feature = "alloc")]
mod cached;
and then adding this implementation to a cached.rs
module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay either way, was mostly trying to keep things in one file, second alternative is
mod cached {
...
}
Will do.
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { | ||
content: alloc::string::String, | ||
splits: alloc::vec::Vec<Range<usize>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead we could have a:
pub struct LineSpanWithoutText {
start: usize,
end: usize,
ending: usize,
}
and then:
splits: alloc::vec::Vec<Range<usize>>, | |
splits: alloc::vec::Vec<LineSpanWithoutText>, |
That way we could replace with_ending()
and without_ending()
with new()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index
and iterator goodies.
pub fn get(&self, index: usize) -> Option<&str> { | ||
self.content.get(self.splits.get(index)?.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in extension of the previous comment)
Then have this be something like:
pub fn get(&self, index: usize) -> Option<&str> { | |
self.content.get(self.splits.get(index)?.clone()) | |
pub fn get(&self, index: usize) -> Option<LineSpan<'_>> { | |
// i.e. from splits get LineSpanWithoutText + self.contents => LineSpan | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, for get
this works.
type Output = str; | ||
|
||
fn index(&self, index: usize) -> &Self::Output { | ||
&self.content[self.splits[index].clone()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I was about to suggest calling .get()
and returning the LineSpan
, but this needs to return a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index
needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.
#[cfg(test)] | ||
mod tests { | ||
use super::*; | ||
|
||
#[test] | ||
fn test_cached_lines_no_endings() { | ||
let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that Rust 1.70 changed the behavior, so trailing \r
will appear in the final line. (Just in case this test suddenly)
pub struct CachedLines { | ||
content: alloc::string::String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this can use alloc::borrow::Cow
instead to allow for borrowing a &str
instead of needing to turn it into an owned String
(Not sure if it creates other issues)
pub struct CachedLines { | |
content: alloc::string::String, | |
pub struct CachedLines<'a> { | |
content: alloc::borrow::Cow<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay either way, but Cow
adds complexity without adding significant benefits - CachedLines
will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.
/// | ||
/// | ||
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, maybe it should be called LineSpanCache
/CachedLineSpans
. If it's changed to collect LineSpan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Names are hard, sure.
#[derive(Debug, Clone)] | ||
/// Owning pointer to a string that provides O(1) access to line slices | ||
/// | ||
/// ``` | ||
/// use line_span::CachedLines; | ||
/// let cache = CachedLines::without_ending(String::from("hello\nworld")); | ||
/// assert_eq!(&cache[0], "hello"); | ||
/// assert_eq!(&cache[1], "world"); | ||
/// ``` | ||
/// | ||
/// Slices might or might not include line breaks depending on a function used to construct it | ||
/// | ||
/// - [`without_ending`](CachedLines::without_ending) | ||
/// - [`with_ending`](CachedLines::with_ending) | ||
/// | ||
/// | ||
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the attribute, to after the comment
#[derive(Clone, Debug)]
pub struct CachedLines {
#[derive(Debug, Clone)] | ||
/// Owning pointer to a string that provides O(1) access to line slices | ||
/// | ||
/// ``` | ||
/// use line_span::CachedLines; | ||
/// let cache = CachedLines::without_ending(String::from("hello\nworld")); | ||
/// assert_eq!(&cache[0], "hello"); | ||
/// assert_eq!(&cache[1], "world"); | ||
/// ``` | ||
/// | ||
/// Slices might or might not include line breaks depending on a function used to construct it | ||
/// | ||
/// - [`without_ending`](CachedLines::without_ending) | ||
/// - [`with_ending`](CachedLines::with_ending) | ||
/// | ||
/// | ||
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also feel like this should probably be feature gated, with a cache = []
feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I inlined proposed changes into my crate and using Index
to access strings, code is here: https://github.com/pacak/cargo-show-asm/blob/master/src/cached_lines.rs
There's no rush to get this included :)
Open questions are:
- Cow - yes/now. I decided against it since it's complexity without significant gains
- stored values -
Range<usize>
orLinespan
- former offers uniform interface forget
andIndex
, later seems to be a bit less useful - put under separate feature or not. I decided against it, "alloc" limits what platforms you can run on, having "cached" as a separate feature saves a tiny bit of compilation time at expense of user confusion :)
What are your preferences herE?
/// - [`with_ending`](CachedLines::with_ending) | ||
/// | ||
/// | ||
#[cfg(feature = "alloc")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay either way, was mostly trying to keep things in one file, second alternative is
mod cached {
...
}
Will do.
/// | ||
/// | ||
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Names are hard, sure.
pub struct CachedLines { | ||
content: alloc::string::String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay either way, but Cow
adds complexity without adding significant benefits - CachedLines
will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.
#[cfg(feature = "alloc")] | ||
pub struct CachedLines { | ||
content: alloc::string::String, | ||
splits: alloc::vec::Vec<Range<usize>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index
and iterator goodies.
type Output = str; | ||
|
||
fn index(&self, index: usize) -> &Self::Output { | ||
&self.content[self.splits[index].clone()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index
needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.
pub fn get(&self, index: usize) -> Option<&str> { | ||
self.content.get(self.splits.get(index)?.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, for get
this works.
This can be useful if you have a string that you don't want to modify but do want to access line by line in a random order.