File: Remove `\r` in `get_as_text()` to keep standardized Unix format #63717
This fixes a compatibility breakage from godotengine#63481. See discussion in godotengine#63434.
Can we instead augment `String::parse_utf8(...)` with an additional (optional) parameter that allows us to remove that character from the generated string?
This will avoid performing another traversal and copy of the same string, which may not be an issue for smaller strings, but may affect performance for larger strings.
Also, as discussed in the conversation, we want this change to apply only to `3.x` in order to avoid breaking compatibility.
As I mentioned in the conversation, for `master` and onward, we should return the raw file as is.
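To make the single-pass idea concrete, here is a minimal sketch. It is not Godot's actual `String::parse_utf8`: the helper name `copy_bytes` and the `skip_cr` parameter are hypothetical, and it copies raw bytes instead of decoding UTF-8. It only shows how an optional flag, defaulting to the current behavior, could drop `\r` during the same traversal that builds the output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Godot API): copies p_len bytes from p_src,
// optionally dropping every '\r' in the same pass that builds the result.
// skip_cr defaults to false so existing callers keep the raw content.
std::string copy_bytes(const char *p_src, std::size_t p_len, bool skip_cr = false) {
	std::string out;
	out.reserve(p_len); // Upper bound; the result can only be shorter.
	for (std::size_t i = 0; i < p_len; i++) {
		if (skip_cr && p_src[i] == '\r') {
			continue; // Filtered while copying, so no second traversal is needed.
		}
		out.push_back(p_src[i]);
	}
	return out;
}
```

Compared with decoding first and calling a replace afterwards, this avoids a second scan and a second allocation, which is where the cost for large files would come from.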
Well I prefer fixing the compatibility breakage in The For the same reason, I wouldn't change an API as core as I don't think the cost of
I did a test on a very heavy file with a
Measuring conversion time with this script:

```gdscript
var f1 = File.new()
f1.open("editor_translations.gen.h", File.READ)
var start1 = Time.get_ticks_usec()
f1.get_as_text()
var end1 = Time.get_ticks_usec()
print("Unix file: %f s" % ((end1 - start1) / 1e6))

var f2 = File.new()
f2.open("editor_translations.gen.h_dos", File.READ)
var start2 = Time.get_ticks_usec()
f2.get_as_text()
var end2 = Time.get_ticks_usec()
print("DOS file: %f s" % ((end2 - start2) / 1e6))
```
So it does make a significant difference in this extreme case. It's still much better than before #63481, especially when no replace is actually needed. I'll give a try to implementing this in

Edit: Gave it a (hacky) try and that's indeed way faster:

```diff
diff --git a/core/string/ustring.cpp b/core/string/ustring.cpp
index beefe54faf..252460af14 100644
--- a/core/string/ustring.cpp
+++ b/core/string/ustring.cpp
@@ -1753,6 +1753,11 @@ Error String::parse_utf8(const char *p_utf8, int p_len) {
 		uint8_t c = *p_utf8 >= 0 ? *p_utf8 : uint8_t(256 + *p_utf8);
 		if (skip == 0) {
+			if (c == '\r') {
+				cstr_size--;
+				p_utf8++;
+				continue;
+			}
 			/* Determine the number of characters in sequence */
 			if ((c & 0x80) == 0) {
 				*(dst++) = c;
```
But I don't know if we want to do this in something as low level as
This should be fine if we put the logic behind an optional parameter which defaults to the current behavior.
If it's an optional parameter, why not, it should be fine. But it probably should be done in two places in
Superseded by #63733.
This was removed in godotengine#63481, and we confirmed that it's better like this, but we add back the possibility to strip CR as an option, to optionally restore the previous behavior. For performance this is done directly in `String::parse_utf8`. Also fixes Android `FileAccess::get_line()` as this one _should_ strip CR. Supersedes godotengine#63717.
This was removed in godotengine#63481, and we confirmed that it's better like this, but we add back the possibility to strip CR as an option, to optionally restore the previous behavior. For performance this is done directly in `String::parse_utf8`. Also fixes Android `FileAccess::get_line()` as this one _should_ strip CR. Supersedes godotengine#63717. (cherry picked from commit 1418f97)