splitting Vec<u8> of non unicode characters #928
-
Hello, I have a Vec<u8> that may contain non-Unicode bytes. I need to split it wherever there is a \0, but a \n or \r\n should not end a piece; the split should continue past them. Could you please show me an example?
Replies: 9 comments 7 replies
-
I think you probably want to use [`bytes::Regex`](https://docs.rs/regex/latest/regex/bytes/struct.Regex.html) for this. As for a specific example, I don't really understand what you're saying. Could you please provide some input and the desired output?
-
I would just like an example of how to build a regex with Regex::new from
the regex::bytes module that covers my question.
2022-11-17 13:32 GMT+01:00, Andrew Gallant ***@***.***>:
… I think you probably want to use
[`bytes::Regex`](https://docs.rs/regex/latest/regex/bytes/struct.Regex.html)
for this.
As for a specific example, I don't really understand what you're saying.
Could you please provide some input and the desired output?
-
It's right there in the link I gave you. :-) For example: https://docs.rs/regex/latest/regex/bytes/struct.Regex.html#method.find
-
Yesterday I tried to build with Unicode disabled, and when I wrote
let re = regex::bytes::Regex::new(r"0x0").unwrap();
it didn't work.
2022-11-17 14:18 GMT+01:00, Andrew Gallant ***@***.***>:
… It's right there in the link I gave you. :-) For example:
https://docs.rs/regex/latest/regex/bytes/struct.Regex.html#method.find
-
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=65826762b40d386ae495a1781305930f
2022-11-17 14:45 GMT+01:00, Andrew Gallant ***@***.***>:
… This might help: https://jvns.ca/blog/good-questions/
(I linked to [ESR's version of the same
thing](http://www.catb.org/%7Eesr/faqs/smart-questions.html) previously, but
I forgot just how patronizing it was.)
-
The output will later be a Vec<&[u8]>, but I am fine with that part.
If your regex approach does not stop at non-UTF-16 characters, then it
does what I wanted.
2022-11-17 15:10 GMT+01:00, Andrew Gallant ***@***.***>:
… Notice that you don't even need a `bytes::Regex` because a NUL byte is valid
UTF-8.
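
The point about NUL being valid UTF-8 can be checked with the standard library alone. The following is a minimal sketch; the helper name `is_valid_utf8` is hypothetical and exists only for illustration. It shows that a lone NUL byte passes UTF-8 validation, while a byte such as 0xFF does not.

```rust
use std::str;

/// Hypothetical helper: reports whether `bytes` is valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // NUL is the code point U+0000, encoded in UTF-8 as the single byte 0x00.
    assert!(is_valid_utf8(b"\0"));
    assert!(is_valid_utf8(b"abc\0def\r\n"));

    // 0xFF never occurs in valid UTF-8, so input containing it is rejected.
    assert!(!is_valid_utf8(&[0xFF]));
    assert!(!is_valid_utf8(&[b'a', 0x00, 0xFF]));
}
```

So a pattern that matches only the NUL byte does not by itself require byte-oriented matching; it is the rest of the input (possibly invalid UTF-8) that decides whether a `&str` or a `&[u8]` haystack is appropriate.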
-
I'll try to rephrase this so you can better understand what I need:
in the example I sent you, there may be bytes that are not UTF-8 but
may be UTF-16. I need to split them in such a way that the pieces can be
collected at the end into a Vec<&[u8]> or Vec<Vec<u8>>. I believe
that if they are split this way, the bytes will be preserved exactly as
they are, and nothing will complain if they are not valid UTF-8.
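
One way to get this result without a regex is sketched below, as an alternative to the `bytes::Regex` suggestion rather than the approach proposed in this thread. It uses only the standard library's `slice::split`, and the function name `split_on_nul` is hypothetical. The bytes are never interpreted as text, so \n, \r\n, and non-UTF-8 bytes pass through unchanged.

```rust
/// Hypothetical helper: split `data` at every NUL byte.
/// Each piece borrows from `data`; no bytes are copied or validated.
fn split_on_nul(data: &[u8]) -> Vec<&[u8]> {
    data.split(|&b| b == 0).collect()
}

fn main() {
    // Sample input (an assumption): three NUL-separated pieces containing
    // newlines and the non-UTF-8 bytes 0xFF 0xFE.
    let data: Vec<u8> = b"ab\ncd\0\xff\xfe\r\nef\0gh".to_vec();

    let pieces: Vec<&[u8]> = split_on_nul(&data);
    assert_eq!(pieces.len(), 3);
    assert_eq!(pieces[0], &b"ab\ncd"[..]);
    assert_eq!(pieces[1], &b"\xff\xfe\r\nef"[..]);
    assert_eq!(pieces[2], &b"gh"[..]);

    // If owned pieces are needed, copy each slice into its own Vec<u8>.
    let owned: Vec<Vec<u8>> = pieces.iter().map(|p| p.to_vec()).collect();
    assert_eq!(owned[1], vec![0xFF, 0xFE, b'\r', b'\n', b'e', b'f']);
}
```

Note that `slice::split` yields an empty slice for a leading or trailing NUL and between two adjacent NULs, so callers that do not want empty pieces would need to filter them out.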
-
Do you think this split will be faster than the normal split from std if
the file I am working with is big?
2022-11-17 15:39 GMT+01:00, Andrew Gallant ***@***.***>:
… A `&str` can be converted to `&[u8]` via
[`str::as_bytes`](https://doc.rust-lang.org/std/primitive.str.html#method.as_bytes).
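
As a rough illustration of the `str::as_bytes` remark, the sketch below splits a `&str` on NUL and then views each piece as `&[u8]`. It assumes the input is already valid UTF-8 (otherwise no `&str` could exist in the first place), and the function name `split_str_on_nul` is hypothetical.

```rust
/// Hypothetical helper: split a string on NUL characters and expose each
/// piece as a byte slice borrowed from the original string.
fn split_str_on_nul(text: &str) -> Vec<&[u8]> {
    text.split('\0').map(|piece| piece.as_bytes()).collect()
}

fn main() {
    // Sample input (an assumption): two NUL-separated pieces with newlines.
    let text = "first\nline\0second\r\nline";

    let pieces = split_str_on_nul(text);
    assert_eq!(pieces.len(), 2);
    assert_eq!(pieces[0], &b"first\nline"[..]);
    assert_eq!(pieces[1], &b"second\r\nline"[..]);
}
```

`as_bytes` is a borrow, not a copy, so the conversion itself adds no allocation beyond the `Vec` that holds the slices.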
-
Hello, I want to ask whether aho_corasick supports processing non-UTF-8
bytes when I use the replace_all_with method?
2022-11-17 16:08 GMT+01:00, Andrew Gallant ***@***.***>:
… Maybe. Benchmark it.