str::from_bytes #2268

Closed
jesse99 opened this issue Apr 22, 2012 · 11 comments
Labels
A-unicode Area: Unicode C-cleanup Category: PRs that clean code up or issues documenting cleanup.

Comments

@jesse99
Contributor

jesse99 commented Apr 22, 2012

The semantics of this should be tightened up (or, probably even better, new function(s) with better names should be added). Is the buffer supposed to be UTF-8? 7-bit ASCII?

Also, it should probably stop when it hits a NUL character. One place where this is annoying is inet_ntop, which writes an ASCII representation of an address into a user-provided buffer. With the way from_bytes works now, you have to write silly code like:

    alt vec::position(buffer, {|c| c == 0u8})
    {
        option::some(i)
        {
            str::from_bytes(vec::slice(buffer, 0u, i))
        }
        option::none
        {
            str::from_bytes(buffer)
        }
    }
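For comparison, the same "truncate at the first NUL, otherwise take the whole buffer" logic in present-day Rust can be sketched as follows. This is an illustration using the modern standard library (`slice::iter().position`, `std::str::from_utf8`), not the 2012 `str` API under discussion, and the helper name `str_from_nul_padded` is hypothetical.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: decode everything before the first NUL byte as
/// UTF-8, or the whole buffer when it contains no NUL. Mirrors the two
/// arms of the `alt` above.
fn str_from_nul_padded(buffer: &[u8]) -> Result<&str, Utf8Error> {
    match buffer.iter().position(|&c| c == 0) {
        // NUL found at index i: keep only the bytes before it.
        Some(i) => str::from_utf8(&buffer[..i]),
        // No NUL: the whole buffer is the string.
        None => str::from_utf8(buffer),
    }
}
```

Modern `from_utf8` returns a `Result` rather than failing, so invalid bytes before the NUL surface as an `Err` the caller can handle.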
@brson
Contributor

brson commented Apr 23, 2012

The vector argument to from_bytes must be valid UTF-8. Needs better docs.

For converting from C strings usually str::unsafe::from_c_str is more appropriate. It takes an unsafe pointer and stops at nulls.
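A rough modern analogue of a pointer-based conversion that stops at the first NUL is `std::ffi::CStr::from_ptr`. The sketch below assumes the input is a byte buffer, and the helper `from_c_buffer` is hypothetical. It checks for a NUL first so the unsafe pointer scan cannot run past the end of the slice.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: read a NUL-terminated string out of a byte buffer
/// through a raw pointer. Returns None if the buffer has no NUL or the
/// bytes before the NUL are not valid UTF-8.
fn from_c_buffer(buf: &[u8]) -> Option<String> {
    // Guard: without a NUL inside `buf`, CStr::from_ptr would read past the end.
    if !buf.contains(&0) {
        return None;
    }
    // SAFETY: `buf` is valid for reads and contains a NUL, so the scan
    // performed by CStr::from_ptr stops inside the buffer.
    let c = unsafe { CStr::from_ptr(buf.as_ptr() as *const c_char) };
    // to_str validates UTF-8 on the bytes before the NUL.
    c.to_str().ok().map(|s| s.to_owned())
}
```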

@jesse99
Contributor Author

jesse99 commented Apr 24, 2012

I actually have a [u8] buffer, so from_c_str won't work without a cast. unsafe::from_bytes did stop at null characters, though the docs don't specify how it is supposed to behave.
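In present-day Rust the cast can be avoided entirely: `CStr::from_bytes_until_nul` (stable since Rust 1.69) takes a byte slice, stops at the first NUL, and returns an error if there is none. A minimal sketch, with `from_u8_buffer` as a hypothetical helper name:

```rust
use std::ffi::CStr;

/// Hypothetical helper: decode a `[u8]` buffer as a C string without any
/// pointer cast. Returns None if there is no NUL terminator or the bytes
/// before it are not valid UTF-8.
fn from_u8_buffer(buf: &[u8]) -> Option<&str> {
    // Stops at the first NUL; errors if the slice contains none.
    let c = CStr::from_bytes_until_nul(buf).ok()?;
    // Validate the bytes before the NUL as UTF-8.
    c.to_str().ok()
}
```

Unlike the raw-pointer route, this stays entirely in safe code because the slice length bounds the search.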

@graydon
Contributor

graydon commented May 2, 2012

The docs for from_bytes say that it fails when provided with invalid UTF-8.
The docs for from_c_str say that it consumes a null-terminated C string, and only takes a pointer.
unsafe::from_bytes does not stop at a null byte. Can you provide a testcase showing that occurring? It should not do so.
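The checked/unchecked split described here survives in today's standard library as `std::str::from_utf8` and `std::str::from_utf8_unchecked`. The sketch below only illustrates that split; the wrapper names are hypothetical, and "roughly the role of" is an analogy, not a claim about how the 2012 functions were implemented.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical wrapper, roughly the role of the old `str::from_bytes`:
/// invalid UTF-8 is rejected, though today as an Err instead of a failure.
fn decode_checked(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

/// Hypothetical wrapper, roughly the role of `str::unsafe::from_bytes`:
/// nothing is validated, and a NUL byte is kept like any other byte.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8.
unsafe fn decode_unchecked(bytes: &[u8]) -> &str {
    unsafe { str::from_utf8_unchecked(bytes) }
}
```

`Utf8Error::valid_up_to` reports how many leading bytes were valid, which is useful when deciding how to handle a malformed buffer.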

@jesse99
Contributor Author

jesse99 commented May 3, 2012

Not sure why I thought str::unsafe::from_bytes stopped at nulls; it's working as you described. I don't know why str::from_bytes adds null bytes to the string, though. It says that it converts "bytes to a UTF-8 string", and nulls are not valid UTF-8.

Which gets back to my original point: why is the safe interface even providing methods for operating on byte arrays? It's not like you can safely do anything with an arbitrary byte sequence, so why not be more explicit (and more useful) and have from_utf8 instead?

@brson
Contributor

brson commented May 7, 2012

from_bytes could be renamed to from_utf8 to be consistent with from_utf16.
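Present-day Rust has exactly this symmetry: `String::from_utf8` and `String::from_utf16` are both validating constructors that return a `Result`. The sketch below, with the hypothetical helper `decodes_equal`, decodes the same text from both encodings and compares the results.

```rust
/// Hypothetical helper: decode text from its UTF-8 and UTF-16 encodings
/// and report whether both constructors agree. Malformed input in either
/// encoding yields None rather than a panic, since both return a Result.
fn decodes_equal(utf8: &[u8], utf16: &[u16]) -> Option<bool> {
    let from8 = String::from_utf8(utf8.to_vec()).ok()?;
    let from16 = String::from_utf16(utf16).ok()?;
    Some(from8 == from16)
}
```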

@bblum
Contributor

bblum commented Jun 10, 2013

There is also the confusing from_bytes_with_null, which also does not stop at nulls, but requires that the last character be null. Nobody outside of std::str appears to use it.
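The closest present-day counterpart to a "must end in NUL" conversion is `CStr::from_bytes_with_nul`. Note one difference from the function described here: the modern API also rejects interior NULs, so it requires exactly one NUL, at the end. The helper name below is hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical helper: accept only buffers whose sole NUL is the last
/// byte, then validate the preceding bytes as UTF-8.
fn from_bytes_with_trailing_nul(bytes: &[u8]) -> Option<&str> {
    // Errors on a missing terminator or on any interior NUL.
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}
```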

@graydon, I'm not sure which behavior you meant it should or shouldn't have, but here is a test case:

    use std::str;

    fn main() {
        let a = ~[65, 65, 65, 0, 65, 65];
        println(str::from_bytes(a));
    }

Prints all five "A"s; the conversion does not stop at the embedded NUL.
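The same test case written against today's standard library behaves the same way: a UTF-8 decode keeps the embedded NUL as an ordinary character. A minimal sketch (the function name is hypothetical):

```rust
/// Hypothetical function: modern equivalent of the test case above.
/// String::from_utf8 validates the bytes and keeps the NUL in place.
fn decode_test_case() -> String {
    let a: Vec<u8> = vec![65, 65, 65, 0, 65, 65];
    String::from_utf8(a).expect("bytes are valid UTF-8")
}
```

The resulting string is six bytes long: five `'A'` characters with a `'\0'` between the third and fourth.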

I propose we (a) remove from_bytes_with_null, (b) rename from_bytes in its current incarnation to from_utf8_ignore_null, and (c) add a stops-at-null version called from_utf8.
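The two behaviors in this proposal can be sketched side by side with the modern standard library. Both function names are hypothetical: `from_utf8_ignore_null` follows the proposal, while `from_utf8_until_null` stands in for the proposed stop-at-null `from_utf8` (renamed here to avoid confusion with the real `std::str::from_utf8`).

```rust
use std::str::{self, Utf8Error};

/// Hypothetical, per the proposal: decode all bytes, NUL included.
fn from_utf8_ignore_null(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

/// Hypothetical stand-in for the proposed stop-at-null variant.
/// `split` always yields at least one sub-slice (possibly empty), so the
/// first one is the text before any NUL; `unwrap_or` is only a fallback.
fn from_utf8_until_null(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let head = bytes.split(|&b| b == 0).next().unwrap_or(bytes);
    str::from_utf8(head)
}
```

On the bytes from the test case, the first returns `"AAA\0AA"` and the second returns `"AAA"`.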

@erickt
Contributor

erickt commented Jun 11, 2013

cc'ing #7039, where I'm cleaning up the bytes-to-str conversions. @brson: I'll rename from_bytes to from_utf8.

@emberian
Member

emberian commented Aug 5, 2013

Still relevant. @erickt still going to do the rename?

@erickt
Contributor

erickt commented Aug 5, 2013

@cmr: yep, it's next up on my plate once #8296 lands.

@thestinger
Contributor

@jesse99: \0 is most definitely valid UTF-8, so a C string can only represent a subset of UTF-8 strings.

The string module doesn't need any special handling of \0 beyond conversion to and from C strings.
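This split is visible in today's standard library: a NUL byte decodes as valid UTF-8, and the special handling lives only at the C-string boundary, where `CString::new` rejects interior NULs. A minimal sketch; the helper `to_c_string` is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: convert a Rust string to a C string. Returns None
/// when the string contains an interior NUL, which a C string cannot hold.
fn to_c_string(s: &str) -> Option<CString> {
    // CString::new appends the terminator and errors on any interior NUL.
    CString::new(s).ok()
}
```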

@thestinger
Contributor

Replaced by #8985; there's nothing else to do beyond renaming.

flip1995 pushed a commit to flip1995/rust that referenced this issue Aug 28, 2020
Add the other overloadable operations to suspicious_arithmetic_impl

In rust-lang#2268 I idly mused that the other user-overloadable operations could be added to this lint. Knowing that the lint was arguably incomplete was gnawing at the back of my mind, so I figured that I might as well make this PR, particularly given the change needed was so small.

changelog: Start warning on suspicious implementations of the `BitAnd`, `BitOr`, `BitXor`, `Rem`, `Shl`, and `Shr` traits.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2022
test that &mut !Unpin references are protected
7 participants