Optimize Node::add_child validation #75760
uint32_t cn_length = cn.length();
uint32_t c_chars = String::num_characters(c);
uint32_t len = 2 + cn_length + c_chars;
char32_t *str = (char32_t *)alloca(sizeof(char32_t) * (len + 1));
This buffer isn't deleted (the String constructor doesn't take ownership of it).
Also, would it make sense to use a thread-static shared buffer of size 64 or so, and only allocate this temporary buffer if that is too small?
Alternatively, make this buffer a CowData<char32_t> and add a String constructor that takes ownership of it.
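The shared-buffer idea suggested above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `build_with_scratch`, `SMALL_BUFFER_CAPACITY`, and the `fill` callback are hypothetical names, and `std::u32string` stands in for the engine's String type; none of this is code from the PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical capacity, taken from the "size 64 or so" suggestion.
constexpr std::size_t SMALL_BUFFER_CAPACITY = 64;

// Builds a string of `len` characters. `fill(dst, len)` must write exactly
// `len` characters to `dst` and must not call build_with_scratch itself,
// because the thread-local buffer is shared within a thread.
template <typename Fill>
std::u32string build_with_scratch(std::size_t len, Fill fill) {
    // One scratch buffer per thread, reused across calls.
    thread_local char32_t small_buffer[SMALL_BUFFER_CAPACITY];

    if (len <= SMALL_BUFFER_CAPACITY) {
        fill(small_buffer, len);
        // The string copies the characters; the scratch buffer stays reusable.
        return std::u32string(small_buffer, len);
    }

    // Fallback for long names: a temporary heap buffer, freed automatically.
    std::vector<char32_t> heap_buffer(len);
    fill(heap_buffer.data(), len);
    return std::u32string(heap_buffer.data(), len);
}
```

The trade-off against a stack allocation is that the thread-local buffer costs a fixed amount of memory per thread and is not safe to reuse re-entrantly.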
alloca() allocates on the stack. It's the most efficient way of allocation for this kind of usage and it does not need to be freed.
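To illustrate the point about stack allocation, here is a minimal sketch of building a short name in an `alloca()` buffer. Several things are assumptions rather than the PR's code: the helper name `make_unique_name`, the `@<class>@<number>` layout (guessed only from the `2 + cn_length + c_chars` length computation), and `std::u32string` as a stand-in for the engine's String. `alloca()` is non-standard; `<alloca.h>` is where glibc and macOS declare it.

```cpp
#include <alloca.h> // non-standard; MSVC offers _alloca in <malloc.h> instead
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: builds "@<class_name>@<counter>" in a stack buffer.
std::u32string make_unique_name(const std::u32string &class_name, uint32_t counter) {
    // Number of decimal digits needed for the counter.
    std::size_t digits = 1;
    for (uint32_t v = counter; v >= 10; v /= 10) {
        digits++;
    }

    const std::size_t len = 2 + class_name.size() + digits;
    // Stack memory: released automatically when this function returns,
    // so there is nothing to free. Only suitable for small sizes.
    char32_t *buf = static_cast<char32_t *>(alloca(sizeof(char32_t) * (len + 1)));

    std::size_t idx = 0;
    buf[idx++] = U'@';
    for (char32_t ch : class_name) {
        buf[idx++] = ch;
    }
    buf[idx++] = U'@';
    // Write the digits from the last position backwards.
    for (std::size_t i = 0; i < digits; i++) {
        buf[len - 1 - i] = static_cast<char32_t>(U'0' + (counter % 10));
        counter /= 10;
    }
    buf[len] = 0;

    // The string copies the characters out before the stack frame is gone.
    return std::u32string(buf, len);
}
```

Because the result is copied out of the buffer, nothing refers to the stack memory after the function returns.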
Or you can just use the simplest solution, which is already implemented; never mind then.
p_int = -p_int;
}
}
while (p_int >= 10) {
if (p_int < 10) return r+0;
if (p_int < 100) return r+1;
// ...
may be faster because it doesn't do divisions, but I didn't test.
It is probably faster, but the code is a mess, especially because this is int64 :P Still, in this context the code should be OK as-is; you won't get a really meaningful performance gain.
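For comparison, the two approaches discussed in this thread might look roughly like the sketch below. The function names are hypothetical and this is not the PR's code. The comparison variant is written as a loop over growing thresholds instead of a hand-unrolled chain of `if` statements, and no claim is made here about which one is faster.

```cpp
#include <cstdint>

// Division-based digit count, in the style of the loop under review.
int count_digits_div(uint64_t value) {
    int digits = 1;
    while (value >= 10) {
        value /= 10;
        digits++;
    }
    return digits;
}

// Comparison-based digit count: no divisions, only compares and multiplies.
// A uint64_t has at most 20 decimal digits, and 10^19 still fits in it.
int count_digits_cmp(uint64_t value) {
    uint64_t threshold = 10;
    for (int digits = 1; digits < 20; digits++) {
        if (value < threshold) {
            return digits;
        }
        threshold *= 10; // may wrap after the last check, which is harmless
    }
    return 20;
}

// Hypothetical signed wrapper: counts one extra character for the minus sign.
int num_characters_sketch(int64_t value) {
    const bool negative = value < 0;
    // Negate in unsigned arithmetic so INT64_MIN does not overflow.
    const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(value)
                                        : static_cast<uint64_t>(value);
    return count_digits_cmp(magnitude) + (negative ? 1 : 0);
}
```

The unsigned negation in the wrapper is one way to handle the 64-bit range; a fully unrolled chain would need twenty thresholds to cover it.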
Adding 10k nodes is almost twice as fast.
Thanks!
I know this was already merged. I want to put a note for the future that this optimization also changed scene import behavior between Godot 4.0 and Godot 4.1: as discovered in issue #78881, this changed the rules so that invalid characters are replaced with an underscore. Neither this PR nor the commit made a note of this change in behavior, so that's why I'm writing this comment, for folks like you coming from a Google search. Prior to this commit, an object named
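As a rough illustration of the replacement rule described in this comment, the sketch below swaps every character from a given set for an underscore. The function name is hypothetical, `std::u32string` stands in for the engine's String, and the set of invalid characters is passed in by the caller because the exact set the engine uses is not stated here; check the engine source for the real list.

```cpp
#include <string>

// Hypothetical helper: replaces each character found in `invalid_chars`
// with an underscore and leaves all other characters untouched.
std::u32string replace_invalid_chars(const std::u32string &name,
                                     const std::u32string &invalid_chars) {
    std::u32string result = name;
    for (char32_t &ch : result) {
        if (invalid_chars.find(ch) != std::u32string::npos) {
            ch = U'_';
        }
    }
    return result;
}
```

With an assumed set that contains `.`, a name such as `Cube.001` would come out as `Cube_001`, which is the kind of rename an importer user would notice.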
Node::add_child
validation
See #75760 (comment) and godotengine/godot-proposals#6225 (linked).
Adding 10k nodes is almost twice as fast.
Code I used to test:
Closes godotengine/godot-proposals#6225