net: rework autoSelectFamily implementation #46587

Merged

ShogunPanda
Contributor

@ShogunPanda ShogunPanda commented Feb 9, 2023

This PR reworks the implementation of autoSelectFamily in net.connect to better swap internal TCPWrap when the connection is successful. It also fixes several issues with TLS.

It also adds a global setter and getter, setDefaultAutoSelectFamilyAttemptTimeout and getDefaultAutoSelectFamilyAttemptTimeout, to allow global customization of the attempt timeout used by the family autoselection algorithm.

Fixes #46669.
Fixes #46679.
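
For readers who want to try the options described above, here is a minimal sketch. It assumes a Node.js release that ships net.setDefaultAutoSelectFamilyAttemptTimeout and net.getDefaultAutoSelectFamilyAttemptTimeout; the helper names withAttemptTimeout and autoSelectOptions are hypothetical and are not part of this PR.

```javascript
const net = require('node:net');

// Hypothetical helper (not part of this PR): run `fn` with a temporary global
// per-attempt timeout, then restore whatever value was set before.
function withAttemptTimeout(ms, fn) {
  const previous = net.getDefaultAutoSelectFamilyAttemptTimeout();
  net.setDefaultAutoSelectFamilyAttemptTimeout(ms);
  try {
    return fn();
  } finally {
    net.setDefaultAutoSelectFamilyAttemptTimeout(previous);
  }
}

// Hypothetical helper: build net.connect options that opt in to family
// autoselection for a single connection.
function autoSelectOptions(host, port, attemptTimeout) {
  const options = { host, port, autoSelectFamily: true };
  if (attemptTimeout !== undefined) {
    // Per-connection override of the global per-attempt timeout.
    options.autoSelectFamilyAttemptTimeout = attemptTimeout;
  }
  return options;
}

// Read the global default while it is temporarily overridden.
const seen = withAttemptTimeout(500, () =>
  net.getDefaultAutoSelectFamilyAttemptTimeout());
console.log(seen);
```

A connection would then be opened with something like net.connect(autoSelectOptions('example.org', 443)), where the host and port are placeholders. Restoring the previous value in a finally block keeps the global override from leaking into unrelated code.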

@ShogunPanda ShogunPanda added the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 9, 2023
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. net Issues and PRs related to the net subsystem. labels Feb 9, 2023
@mcollina mcollina self-requested a review February 9, 2023 15:30
src/node_options.cc (outdated review thread, resolved)
Member

@anonrig anonrig left a comment

What is the performance impact of this pull request?

doc/api/net.md (two outdated review threads, resolved)
@mcollina mcollina added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Feb 11, 2023
@mcollina
Member

Putting on the tsc-agenda for visibility.

@ShogunPanda
Contributor Author

I'll review the comments on Monday.
For now, just a heads-up: I needed to make a substantial change in the implementation to support this switch.

This involves a C++ change that I have tried, but I'm not sure whether it has side effects. Whom can I ping for help with that?

@anonrig
Member

anonrig commented Feb 11, 2023

cc @nodejs/cpp-reviewers

@ShogunPanda
Contributor Author

ShogunPanda commented Feb 11, 2023

Thanks @anonrig!

I'll push my changes later tonight so they can start taking a look while I fix the last failing test.

Changes pushed.

@nodejs/cpp-reviewers @nodejs/net To simplify and stabilize the implementation, I introduced a new "Reinitialize" method on the TCP wrap that creates a new uv_tcp_t. This is better than my previous implementation, which swapped the TCPWrap after connecting, since it has fewer side effects. Do you think the C++ is valid? Could it have any side effects?

@mcollina
Member

@ShogunPanda I would split the C++ change and the flipping of the default into two different PRs; I think the C++ changes do not need to be semver-major.

@ShogunPanda
Contributor Author

You mean C++ changes first and then I flip the switch?
Keep in mind that I discovered the flaw in the current implementation (which broke a lot of tests) when enabling the feature.

@mcollina
Member

I think we should backport the fix, but not the switch of the default. Right now they would both be set as semver-major.

@ShogunPanda
Contributor Author

ShogunPanda commented Feb 13, 2023

As @mcollina suggested, I've reworked this PR to only fix the autoSelectFamily implementation.
I guess this is no longer semver-major and can be backported.

@nodejs/cpp-reviewers @addaleax @jasnell Would you mind reviewing the C++ changes and telling me whether anything is wrong?

@ShogunPanda ShogunPanda removed the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 13, 2023
@ShogunPanda ShogunPanda changed the title net: enable autoSelectFamily by default net: rework autoSelectFamily implementation Feb 13, 2023
src/tcp_wrap.cc (outdated review thread, resolved)
src/tcp_wrap.cc Outdated
&wrap, args.Holder(), args.GetReturnValue().Set(UV_EBADF));

Environment* env = wrap->env();
int r = uv_tcp_init(env->event_loop(), &wrap->handle_);
Member

Should this have some sort of check like this?

Suggested change
- int r = uv_tcp_init(env->event_loop(), &wrap->handle_);
+ CHECK(!IsAlive(this));
+ int r = uv_tcp_init(env->event_loop(), &wrap->handle_);

Contributor Author

Yes, probably. I'll add this.

Contributor Author

@addaleax I tried, and the check always fails, both in TCPWrap::TCPWrap and in TCPWrap::Reinitialize. So I guess it is not needed. Or am I doing the wrong check?

Member

Does that mean that the handle is not closed at this point? I don’t think it’s safe to call uv_..._init() functions on open handles? @bnoordhuis

Contributor Author

The handle has only gone through uv_tcp_init; no open/connect method was ever called on it. Is it still considered open?

Member

The pointer to the uv_tcp_t object needs to stay alive until uv_close() is done with it. At the very least you'd need to pass a callback that frees the memory afterwards.

Contributor Author

That is fine. I can definitely do that.

Memory problems aside, do you foresee any problems in swapping the handle?

Member

I think in theory this works, but it’s going to be a bit more complex than the code above.

For example, you’ll need to update all ConnectionWrap subclasses to have a pointer handle member rather than a plain member, and you’re going to have to use Environment::CloseHandle() rather than uv_close() so that the close callback gets tracked properly.

My gut feeling would be that at that point, it probably makes more sense to look into how to handle this more properly on the JS layer rather than trying to re-use TCPWrap instances.
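
To illustrate the general idea of staying in the JS layer and using a fresh socket for every attempt instead of re-initializing one wrap, here is a rough userland sketch. It is not the implementation that landed in this PR and not Node's internal algorithm; connectSequentially, its parameters, and the 250 ms default are all assumptions made for the example.

```javascript
const net = require('node:net');

// Hypothetical userland helper, not Node's internal implementation:
// try each address in order, creating a brand new socket per attempt.
function connectSequentially(addresses, port, attemptTimeout = 250) {
  return new Promise((resolve, reject) => {
    const errors = [];

    function attempt(index) {
      if (index >= addresses.length) {
        reject(new AggregateError(errors, 'All connection attempts failed'));
        return;
      }

      const socket = net.connect({ host: addresses[index], port });
      let settled = false;

      // Abandon this attempt if it does not connect in time.
      const timer = setTimeout(() => {
        fail(new Error(`Attempt to ${addresses[index]} timed out`));
      }, attemptTimeout);

      // Discard the socket of a failed attempt and move to the next address.
      function fail(err) {
        if (settled) return;
        settled = true;
        clearTimeout(timer);
        socket.destroy();
        errors.push(err);
        attempt(index + 1);
      }

      socket.once('error', fail);
      socket.once('connect', () => {
        if (settled) return;
        settled = true;
        clearTimeout(timer);
        // From here on, error handling is the caller's responsibility.
        socket.removeListener('error', fail);
        resolve(socket);
      });
    }

    attempt(0);
  });
}
```

Because each attempt owns its socket, a failed attempt is simply destroyed and nothing has to be reset before the next one starts. The caller receives the first socket that connects, or an AggregateError that collects the per-address failures.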

Contributor Author

OK, you are probably right, I'm asking for too much trouble. I'll see how to fix it on the JS side.

Contributor Author

@addaleax I finally solved (hopefully all) the issues without touching the C++ or swapping handles.
Thanks for keeping me from killing myself with my own hands :)

@tniessen
Member

The ASan failures are relevant, in particular the heap-use-after-free.

targos pushed a commit that referenced this pull request Mar 13, 2023
PR-URL: #46587
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@ShogunPanda ShogunPanda removed the dont-land-on-v18.x PRs that should not land on the v18.x-staging branch and should not be released in v18.x. label Jul 27, 2023
@ShogunPanda
Contributor Author

@juanarbol After double-checking with @mhdawson, we agreed that this PR should indeed land on 18, as it is the foundation for fixing the initially faulty network family selection feature.

@mhdawson
Member

I have a backported version of this. @juanarbol should I create a separate backport PR or would it make sense to be part of #48275? The backport I have depends on 48275 landing first.

I also want to backport #47029 and #48189, as landing 48275, 46587, 47029 and then 49189 allows 48796 to be landed on 18.x. 48796 has not landed yet, but it fixes an issue I've been looking at.

@mhdawson
Member

mhdawson commented Jul 27, 2023

@ShogunPanda mentioned that we might also want to include 48464 among the ones backported. I'll add that to my list above.

@juanarbol
Member

> I have a backported version of this. @juanarbol should I create a separate backport PR or would it make sense to be part of #48275? The backport I have depends on 48275 landing first.

You can push to my PR if you need to; I'm OK with whatever you prefer to do.

@juanarbol
Member

@juanarbol That's wrong, they are unrelated.

When I tried to land this, I found that it was using at least one change made by #46790.

@ShogunPanda
Contributor Author

Yeah, but I think @mhdawson resolved those conflicts, didn't he?

@mhdawson
Member

mhdawson commented Aug 2, 2023

@juanarbol, @ShogunPanda correct, I had a set of merged/tweaked PRs including this one. Thinking about it more, I assume pushing to the existing PR does not make sense; I'll need additional backport PRs. I will start on that.

mhdawson pushed a commit to mhdawson/io.js that referenced this pull request Aug 4, 2023
PR-URL: nodejs#46587
Backport-PR-URL:
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@mhdawson
Member

mhdawson commented Aug 4, 2023

@juanarbol, @ShogunPanda, @ruyadorno I submitted #49016 to backport this PR as part of the backports needed to land #48969. It includes the commits from #48275, so it can either be landed after that PR lands, or the commits could be merged into that PR for landing. @ruyadorno, I'm hoping that, as the releaser for the next 18.x release, you can decide what is best.

ruyadorno pushed a commit that referenced this pull request Aug 14, 2023
PR-URL: #46587
Backport-PR-URL: #49016
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@ruyadorno ruyadorno added backported-to-v18.x PRs backported to the v18.x-staging branch. backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. and removed needs-ci PRs that need a full CI run. backported-to-v18.x PRs backported to the v18.x-staging branch. labels Aug 14, 2023
mhdawson pushed a commit to mhdawson/io.js that referenced this pull request Aug 15, 2023
PR-URL: nodejs#46587
Backport-PR-URL:
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@mhdawson
Member

Backport to 18.x PR (second try) - #49183

@mhdawson mhdawson added the backport-open-v18.x Indicate that the PR has an open backport. label Aug 15, 2023
mhdawson pushed a commit to mhdawson/io.js that referenced this pull request Aug 15, 2023
PR-URL: nodejs#46587
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
ruyadorno pushed a commit that referenced this pull request Aug 17, 2023
PR-URL: #46587
Backport-PR-URL: #49183
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@ruyadorno ruyadorno added backported-to-v18.x PRs backported to the v18.x-staging branch. and removed backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. backport-open-v18.x Indicate that the PR has an open backport. labels Aug 17, 2023
@ruyadorno ruyadorno mentioned this pull request Aug 17, 2023
Labels
backported-to-v18.x PRs backported to the v18.x-staging branch. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. net Issues and PRs related to the net subsystem.
Development

Successfully merging this pull request may close these issues.

autoSelectFamily: true breaks TLS
Error [ERR_INTERNAL_ASSERTION] with autoSelectFamily enabled