Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: strings: add (*Reader).String() #36290

Closed
pierrec opened this issue Dec 27, 2019 · 16 comments
Closed

proposal: strings: add (*Reader).String() #36290

pierrec opened this issue Dec 27, 2019 · 16 comments

Comments

@pierrec
Copy link

pierrec commented Dec 27, 2019

Although it is possible to retrieve the string that a Reader contains, it would be nice to avoid this and have direct access to the string that a Reader contains.

Therefore I propose the following:

// String always returns the string the Reader was initialized or reset with,
// irrelevant of any other method calls.
func (r *Reader) String() string {
  return r.s
}

So the code required to get access to a Reader's content becomes:

Before:

  r := strings.NewReader("hello")
  var buf bytes.Buffer
  r.Seek(0, io.SeekStart) // make sure we get the whole string)
  io.Copy(&buf, r)
  s := buf.String() // hello

After:

  r := strings.NewReader("hello")
  s := r.String() // hello
@gopherbot gopherbot added this to the Proposal milestone Dec 27, 2019
@smasher164
Copy link
Member

I find it counterintuitive that calling String returns the underlying string instead of the string starting at index i, based on how many bytes one has read from the Reader.

On the other hand, the string can be invalid if it begins at any arbitrary index, so if returning a valid string must be guaranteed, the closest behavior I can think of is

func (r *Reader) String() string {
    return ToValidUTF8(r.s[r.i:], "")
}

@pierrec
Copy link
Author

pierrec commented Dec 28, 2019

@smasher164 I don't think there is the need to worry about returning an invalid string, as there is precedence in (*bytes.Buffer).String().

To be consistent with (*bytes.Buffer).String() however, (*strings.Reader).String() could return the unread portion of the string. To get the whole string, use Seek().

@RodionGork
Copy link

RodionGork commented Dec 28, 2019

@pierrec Sorry, which Reader is it about? (from io?) should it always have an underlying string? This doesn't look obvious (and even correct) or I deeply misunderstood the suggestion.

String() could return the unread portion of the string

Should it advance pointer then? So that second call to String() won't return anything (as everything is read)?

@deanveloper
Copy link

deanveloper commented Dec 28, 2019

@RodionGork Well it can't be io.Reader since that is an interface, and methods cannot be defined on interfaces. I believe that this proposal is to add it for *strings.Reader since the examples given use strings.NewReader.

I agree with @smasher164 in that it should return the underlying string, however not all strings are utf8 strings (blog), so I disagree that it should be wrapped in ToValidUTF8. It should be the caller's responsibility to call strings.ToValidUTF8 if they want a utf8 string, not the callee's.

@RodionGork
Copy link

@deanveloper thanks, I believe you are right (just had a momentary suspicion it is about extending interface and implementing method everywhere).

But then, if the Reader should resemble bytes.Buffer more, then besides String it makes sense to copy few more methods, e.g. ReadString and Next - isn't it so?

so I disagree that it should be wrapped in ToValidUTF8

yep, it would be confusing to receive empty string when reader is not at end (or error which is not part of behavior in bytes.Buffer)

@pierrec
Copy link
Author

pierrec commented Dec 28, 2019

@RodionGork this proposal is not about making it resemble bytes.Buffer. It is about being able to easily recover the string that the reader is based on. I have been running into this a few times and think it would be a straight forward addition.

@smasher164
Copy link
Member

I agree with @smasher164 in that it should return the underlying string, however not all strings are utf8 strings

I see your point, but the package header specifically states, "package strings implements simple functions to manipulate UTF-8 encoded strings."

That being said, since strings.Builder allows one to construct a malformed string using the package, I don't see why strings.Reader shouldn't be allowed to, so I'm open to simply returning r.s[r.i:].

@pierrec I am still trying to understand the use case of this proposal. Whenever I use strings.Reader, I am trying to pass some input into a function that expects an io.Reader. I tend to always have access to the original string. Is the intended use case where one has to type-assert on an io.Reader to get a strings.Reader and get access to its underlying string?

@deanveloper
Copy link

deanveloper commented Dec 28, 2019

I see your point, but the package header [snip]

That's true, I can also see it going either way. Just seems weird to me that strings.NewReader(str).String() wouldn't necessarily return str in the case where it uses ToValidUTF8

@rsc
Copy link
Contributor

rsc commented Jan 8, 2020

strings.Reader and bytes.Reader were meant to be the same API, just with different input types in the constructor. They're all about the streaming calls, not the bulk data of, say, bytes.Buffer. There's also ambiguity about which string should be returned - the remaining input or the original. So adding the API might confuse half the users anyway. And for bytes.Reader it would have to make a copy.

As @smasher164 says, usually you use strings.Reader because you already have the string and want to pass it to a function as an io.Reader (or one of the other interfaces).

What is the use case where you are passed a *strings.Reader and want to get the string back? How does that come up?

@pierrec
Copy link
Author

pierrec commented Jan 8, 2020

I had to convert a type which was initially an io.ReadCloser to an io.ReadWriteCloser. That type was implemented using, you guessed it, a strings.Reader, and the hack I mention in the proposal, which seemed bad.
Coming back to it, I have cleaned up the type and split it into 2 types so that the need for this proposal becomes obsolete.
So... as and @rsc mentions, as it stands, the introduction of a String method would be confusing.
Thanks for your attention. Closing.

@ianlancetaylor
Copy link
Contributor

For the record, you can retrieve the remainder of the string in a strings.Reader like this. This uses panic so that the reader is left unchanged.

type returnedString string

type fetchString struct{}

func (fetchString) WriteString(s string) (int, error) {
	panic(returnedString(s))
}

func (fetchString) Write([]byte) (int, error) {
	panic("can't happen")
}

func FetchString(sr *strings.Reader) (ret string) {
	defer func() {
		if s, ok := recover().(returnedString); ok {
			ret = string(s)
		} else {
			panic(recover())
		}
	}()
	sr.WriteTo(fetchString{})
	panic("can't happen")
}

@pierrec
Copy link
Author

pierrec commented Jan 9, 2020

@ianlancetaylor yikes!
Shouldn't it just return at the end of FetchString instead of panicing if there is nothing left to read from the strings.Reader?

@deanveloper
Copy link

deanveloper commented Jan 9, 2020

The , ok represents if the recovered panic was of type returnedString. If the recovered panic isn't a returnedString, then the panic wasnt from the WriteString function, meaning that it should continue unwrapping the stack.

@pierrec
Copy link
Author

pierrec commented Jan 9, 2020

Yep. I meant this:

type returnedString string

type fetchString struct{}

func (fetchString) WriteString(s string) (int, error) {
	panic(returnedString(s))
}

func (fetchString) Write([]byte) (int, error) {
	panic("can't happen")
}

func FetchString(sr *strings.Reader) (ret string) {
	defer func() {
		if s, ok := recover().(returnedString); ok {
			ret = string(s)
		} else {
			panic(recover())
		}
	}()
	sr.WriteTo(fetchString{})
	return
}

as looking at the strings.Reader.WriteTo method code, it returns before calling WriteString if there is nothing to read.

@deanveloper
Copy link

Ahh I see that now - my bad. Misinterpreted your initial comment and I thought you meant inside the defer for some reason.

@ianlancetaylor
Copy link
Contributor

@pierrec Yes, I think you're right. Oh well.

@golang golang locked and limited conversation to collaborators Jan 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants