patch/optimize(bpf): improve lan hijack datapath performance #466
Conversation
Tests passed.
🧪 Since the PR has been fully tested, please consider merging it.
❌ Your branch is currently out-of-sync to main. No worry, I will fix it for you.
Force-pushed from a29b998 to d12323b
Previously we parsed skb->data twice: once in wan_egress/lan_ingress and again in dae0peer_ingress. This was due to a limitation of bpf_sk_assign: it must be called within the netns where the socket lives. This patch manages to parse skb->data only once, at wan_egress/lan_ingress, where we leave a value in skb->cb[1] to tell dae0peer_ingress what to do: 1. if skb->cb[1] == TCP, it's a new TCP connection: assign the skb to the TCP listener; 2. if skb->cb[1] == UDP, it's UDP: assign the skb to the UDP listener; 3. otherwise it's an established TCP connection, and the stack can take care of the socket lookup itself.
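A minimal plain-C sketch of this handoff (not dae's actual eBPF source; the hint values and names here are hypothetical, chosen to mirror IPPROTO_TCP/IPPROTO_UDP):

```c
#include <stdint.h>

/* Hypothetical hint values written into skb->cb[1] by wan_egress/
 * lan_ingress; 0 means "no hint", i.e. an established TCP connection. */
enum cb_hint {
    HINT_NONE    = 0,   /* established TCP conn */
    HINT_TCP_NEW = 6,   /* mirrors IPPROTO_TCP  */
    HINT_UDP     = 17,  /* mirrors IPPROTO_UDP  */
};

/* What dae0peer_ingress decides to do with the skb. */
enum verdict {
    ASSIGN_TCP_LISTENER,  /* bpf_sk_assign() to the TCP listener */
    ASSIGN_UDP_LISTENER,  /* bpf_sk_assign() to the UDP listener */
    PASS_TO_STACK,        /* let the kernel demux the socket     */
};

/* The core of the patch: dae0peer_ingress no longer re-parses
 * skb->data; it only branches on the hint left in skb->cb[1]. */
static enum verdict dae0peer_dispatch(uint32_t cb1)
{
    switch (cb1) {
    case HINT_TCP_NEW: return ASSIGN_TCP_LISTENER;
    case HINT_UDP:     return ASSIGN_UDP_LISTENER;
    default:           return PASS_TO_STACK;
    }
}
```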
Force-pushed from 8c54f88 to 769d2a4
Tested in the following environment, works very well.
Force-pushed from 9db0e34 to 026f23b
Because apt.k8s.io no longer exists: https://kubernetes.io/blog/2023/08/31/legacy-package-repository-deprecation/
Force-pushed from 026f23b to 8cc3e8a
Thanks to all the folks who keep testing this PR. 5badabf is the last low-hanging fruit whose temptation I can't resist; hope this small patch doesn't break anything 🤞 LPC 2020 had a talk introducing this technique.
After binding docker0 to the LAN and testing 5badabf, everything works perfectly. There are no issues with direct connection diversion. Well done.
Tested successfully with the latest CI build in the following environment:
Benchmark (lan only)
1. Env: Linux 6.6.17 KVM, 4 cores, 12G memory.
2. Setup: Run two docker containers, one with dae inside, the other with v2ray. It's almost the same as dae's GitHub Action test: just treat the two containers as two nodes. I am using sockperf:
3. TCP: dae-0.4.0: avg-latency=37.310 (std-dev=7.352). avg-latency improves by 1.3%. This may not seem like much, because the testing environment is clean and free from netfilter. After adding a simple iptables rule on the dae node, dae-0.4.0 performs worse; sometimes its avg-latency goes as high as 38+, while dae-next (this PR) is not affected at all thanks to the stack-bypass implementation. In that case, it's a 3.1% improvement.
4. UDP: The normal UDP test result is: dae-0.4.0: avg-latency=58.275 (std-dev=50.721), a 4% boost. However, dae-0.4.0 is also known to use encapsulation to avoid port conflicts when a process is already listening on 53, which badly damages performance. When that fallback takes place, dae-0.4.0's avg-latency rises to 60.412 (std-dev=47.764), and dae-next gets a 7%+ better result.
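For quick cross-checking, the dae-next latencies implied by the quoted baselines and percentages can be derived with back-of-envelope arithmetic (derived values only, not measurements; `implied_latency` is a helper name invented here):

```c
/* Back-of-envelope helper: given a dae-0.4.0 baseline avg-latency (usec)
 * and the quoted improvement percentage, return the implied dae-next
 * avg-latency. Derived arithmetic, not a measurement. */
static double implied_latency(double baseline_usec, double improvement_pct)
{
    return baseline_usec * (1.0 - improvement_pct / 100.0);
}

/* TCP, clean env:       implied_latency(37.310, 1.3) ~= 36.8 usec
 * UDP, normal path:     implied_latency(58.275, 4.0) ~= 55.9 usec
 * UDP, encap fallback:  implied_latency(60.412, 7.0) ~= 56.2 usec */
```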
Thanks for your groundbreaking work!
🧪 Since the PR has been fully tested, please consider merging it.
Brilliant code!
Background
This PR introduces three performance optimizations for the LAN datapath. First, let's review the datapath:
Optimization 1: The BPF programs at points a and b both parse the layer-2/3/4 packet headers. There is no need to parse them twice: after parsing at point a, the information point b needs can be passed along via skb->cb.
Optimization 2: The peer_ingress BPF program at point b doesn't need to look up the socket for established TCP connections with bpf_skc_lookup, because the kernel itself can handle the socket lookup. With tcp_early_demux enabled, it can even skip the routing decision and perform local delivery directly.
Optimization 3: The lan_ingress program at point a redirects the skb from lan0 to dae0, which then traverses the netns boundary to reach the peer. This step can be simplified using bpf_redirect_peer: redirect the skb directly from lan0 to the peer inside the netns, avoiding the performance impact of enqueue_to_backlog.
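A plain-C model of optimization 3 (again, not dae's real eBPF source: bpf_redirect_peer is stubbed out here, and DAE0_IFINDEX is a made-up ifindex) showing the control flow lan_ingress takes:

```c
#include <stdint.h>

#define TC_ACT_OK       0
#define TC_ACT_REDIRECT 7

/* Hypothetical ifindex of dae0 on the host side; its veth peer lives in
 * dae's netns. */
#define DAE0_IFINDEX 4

static uint32_t redirected_to; /* records the last redirect target */

/* Stub of the bpf_redirect_peer() helper (available since kernel 5.10).
 * The real helper hands the skb straight to the veth peer's ingress in
 * the other netns, skipping enqueue_to_backlog entirely. */
static long bpf_redirect_peer(uint32_t ifindex, uint64_t flags)
{
    (void)flags;
    redirected_to = ifindex;
    return TC_ACT_REDIRECT;
}

struct skb_model { uint32_t cb[5]; };

/* Sketch of the lan_ingress decision: direct traffic goes back to the
 * stack; hijacked traffic gets its cb[1] hint (optimization 1) and jumps
 * straight to the peer device inside the netns (optimization 3). */
static int lan_ingress(struct skb_model *skb, int hijack, uint32_t hint)
{
    if (!hijack)
        return TC_ACT_OK;
    skb->cb[1] = hint;
    return bpf_redirect_peer(DAE0_IFINDEX, 0);
}
```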
Recommendation: Review by commit.
Checklist
Full Changelogs
Issue Reference
Closes #[issue number]
Test Result