Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: move getValidMode to c++ for better performance #49970

Closed
wants to merge 2 commits into from

Conversation

andremralves
Copy link
Contributor

                                                     confidence improvement accuracy (*)    (**)   (***)
fs/bench-copyFileSync.js n=10000 type='invalid mode'        ***     69.61 %      ±10.49% ±14.01% ±18.33%
fs/bench-copyFileSync.js n=10000 type='invalid path'                 1.58 %       ±6.12%  ±8.14% ±10.59%
fs/bench-copyFileSync.js n=10000 type='valid'                        1.90 %      ±10.90% ±14.50% ±18.88%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 3 comparisons, you can thus expect the following amount of false-positive results:
  0.15 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.03 false positives, when considering a   1% risk acceptance (**, ***),
  0.00 false positives, when considering a 0.1% risk acceptance (***)
                                                         confidence improvement accuracy (*)    (**)   (***)
fs/bench-accessSync.js n=100000 type='existing'                         -2.88 %       ±6.17%  ±8.22% ±10.72%
fs/bench-accessSync.js n=100000 type='invalid-mode'             ***     52.14 %      ±10.53% ±14.03% ±18.30%
fs/bench-accessSync.js n=100000 type='non-existing'                     -0.87 %       ±5.86%  ±7.80% ±10.17%
fs/bench-accessSync.js n=100000 type='non-flat-existing'                 0.83 %       ±6.60%  ±8.78% ±11.43%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 4 comparisons, you can thus expect the following amount of false-positive results:
  0.20 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.04 false positives, when considering a   1% risk acceptance (**, ***),
  0.00 false positives, when considering a 0.1% risk acceptance (***)

Ref: nodejs/performance#111

The improvements are noticeable for cases where an error related to mode is thrown.

Thanks to @anonrig for the idea.

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Sep 30, 2023
Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Amazing work!

src/node_file.cc Outdated Show resolved Hide resolved
src/node_file.cc Outdated Show resolved Hide resolved
@anonrig anonrig added the performance Issues and PRs related to the performance of Node.js. label Sep 30, 2023
src/node_file.cc Outdated Show resolved Hide resolved
lib/internal/fs/sync.js Outdated Show resolved Hide resolved
@anonrig
Copy link
Member

anonrig commented Sep 30, 2023

fs/sync.js is merged into fs.js file. Additionally, commit queue doesn't work with github merge commits. Can you manually squash your commits into 1?

@andremralves
Copy link
Contributor Author

Rebased and squashed 😄

@andremralves andremralves force-pushed the getValidMode-to-cpp branch 2 times, most recently from c3db742 to 5a5e3f6 Compare September 30, 2023 03:43
src/node_file.cc Outdated Show resolved Hide resolved
src/node_file.cc Outdated Show resolved Hide resolved
src/node_file.cc Show resolved Hide resolved
@anonrig
Copy link
Member

anonrig commented Sep 30, 2023

Build fails for macOS

@andremralves
Copy link
Contributor Author

andremralves commented Sep 30, 2023

It seems that the macros F_Ok, W_OK, R_OK and X_OK aren't defined in macOS.

Maybe I can add an #ifndef and define those macros with 0?

@bnoordhuis
Copy link
Member

bnoordhuis commented Sep 30, 2023

Try this:

// F_OK etc. constants
#ifdef _WIN32
#include "uv.h"
#else
#include <unistd.h>
#endif

edit: <fcntl.h> probably already pulls them in on Windows

@bnoordhuis
Copy link
Member

Still needs unistd.h on macos.

@andremralves andremralves force-pushed the getValidMode-to-cpp branch 2 times, most recently from 1b6b918 to 7074cd9 Compare October 1, 2023 16:29
src/node_file.cc Outdated Show resolved Hide resolved
src/node_file.h Outdated Show resolved Hide resolved
@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 2, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 2, 2023
@nodejs-github-bot
Copy link
Collaborator

@anonrig
Copy link
Member

anonrig commented Oct 6, 2023

@tniessen The goal is to remove JS implementation of fs operations to open up room for V8 fast api (by removing functions that eventually unflattens the string). I personally think this is the correct path since error paths on filesystem is pretty common (although I can't back this claim with a scientific proof at this time)

@tniessen
Copy link
Member

tniessen commented Oct 6, 2023

error paths on filesystem is pretty common

While that might be true for some cases, passing an invalid mode argument to functions such as access() seems like an unlikely edge cases whose performance is irrelevant.

The goal is to remove JS implementation of fs operations to open up room for V8 fast api

That's a reasonable goal, I suppose, but the commit message sounds as if this change itself is motivated by an immediate performance improvement. That's all I was pointing out :)

More importantly, this is a breaking change.

@anonrig
Copy link
Member

anonrig commented Oct 6, 2023

Yeah, the commit message can be improved. I missed the part where why this is a breaking change. @tniessen

@tniessen
Copy link
Member

tniessen commented Oct 6, 2023

Unless I'm missing something, some functions now don't throw synchronously anymore but instead report the validation error to the callback, which seems unusual for argument validation. I am not sure if that's desirable (or if we treat these things as semver-major).

);
const src = './test/fixtures/copy/kitchen-sink';
const dest = nextdir();
cp(src, dest, mustNotMutateObjectDeep({ recursive: true, mode: -1 }), (err) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback must be wrapped in mustCall(). But I am really not sure if argument validation errors should be async, it seems very unusual.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...the THROW_ macros being run before the AsyncDestCall means this should actually be thrown synchronously, so it wouldn't get to the callback at all. Seems to me like it should be throwing from the cp(...) call itself though, so I'm not sure how it could be getting to the rest of the test if that's the case. 🤔

@andremralves
Copy link
Contributor Author

Unless I'm missing something, some functions now don't throw synchronously anymore

Yes, is there any way to keep it synchronous while calling the validation function on the c++ land?

Qard
Qard previously approved these changes Oct 6, 2023
@Qard
Copy link
Member

Qard commented Oct 6, 2023

Wouldn't the use of the THROW_ macros mean it will raise the error immediately? (Or at least as "immediately" as re-entrancy after the early return...)

I'm not seeing how any of these errors could become async from this. 🤔

@Qard Qard dismissed their stale review October 6, 2023 23:56

Premature approval. Looks like the tests need some work. 🤔

@andremralves
Copy link
Contributor Author

andremralves commented Oct 7, 2023

I made some investigation and it turns out that fs.access and fs.copyFile throws synchronously as expected. The problem is with fs.cp that internally calls copyFile inside an async function here:

async function _copyFile(srcStat, src, dest, opts) {
await copyFile(src, dest, opts.mode);
if (opts.preserveTimestamps) {
return handleTimestampsAndMode(srcStat.mode, src, dest);
}
return setDestMode(dest, srcStat.mode);
}

So when the copyFile binding is called, it's already running asynchronously or am I missing something?

I don't know how could we handle this case. One option is to keep the validation of fs.cp in the js land, but we would have some code duplication. Does anyone have a better idea?

await assert.rejects(
fs.promises.cp('a', 'b', {
fs.promises.cp(src, dest, mustNotMutateObjectDeep({
recursive: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why recursive is needed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, the test throws ERR_FS_EISDIR instead of ERR_OUT_OF_RANGE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a breaking change, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it now, I think so because if someone was checking for an mode error in fs.cp synchronously, this behavior would change to async. We could keep using getValidMode on the JS side just for fs.cp which is the only problematic function or don't make these changes at all. Or maybe there is another way, idk.

@RafaelGSS
Copy link
Member

@nodejs-github-bot
Copy link
Collaborator

@RafaelGSS RafaelGSS added the semver-major PRs that contain breaking changes and should be released in the next major version. label Oct 9, 2023
@RafaelGSS
Copy link
Member

I flagged it as semver-major since it's changing the API. Tbh, I'm not sure if this is the desirable approach.

@anonrig
Copy link
Member

anonrig commented Oct 14, 2023

I flagged it as semver-major since it's changing the API. Tbh, I'm not sure if this is the desirable approach.

Eventually we'll move everything to C++ side to enable using V8 APIs. IMHO, This is a step in the correct direction.

@nodejs-github-bot
Copy link
Collaborator

@tniessen
Copy link
Member

Eventually we'll move everything to C++ side to enable using V8 APIs. IMHO, This is a step in the correct direction.

IMHO making parameter validation async is not. Performance improvements generally should not result in breaking changes, unless the design (and not just the implementation) changes.

@Qard
Copy link
Member

Qard commented Oct 16, 2023

Yeah, input validation should always remain sync. I think situations like this where one function delegates to others internally we should have an internal implementation without the validation that can be shared so we can raise all validation to before it runs any of the async code. 🤔

@anonrig
Copy link
Member

anonrig commented Dec 6, 2023

Hey @andremralves, are you working on this? If not, I'd like to complete and land this with a little bit different approach.

@andremralves
Copy link
Contributor Author

Hey @andremralves, are you working on this? If not, I'd like to complete and land this with a little bit different approach.

I'm not. I got a little bit stuck on this issue and was waiting for new ideas. Feel free to continue this work. 😁

@joyeecheung
Copy link
Member

Regarding the enabling fast API - one thing that I've noticed before in another PR is that once the fast API fallback to an exception-throwing slow path it could degrade the performance seriously so that it could be slower than not using fast API at all. I think some verification needs to be done to disprove this before we use fast API usage as an argument to throw errors in C++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants