Here is a fun idea to play with in the background: How about doubling the performance of Snabb NFV?
Why
Our original performance target was to handle 10 Gbps per core for realistic ISP workloads based on a reference processor that Intel use in their NFV case studies. Such performance was unheard of with Virtio-net at that time and we had to write the QEMU code to make it possible. There were few if any virtual machines available that were optimized to keep up with millions of packets per second on their Virtio-net interfaces.
Time is moving forwards though. High-speed Virtio-net is an established idea now. Optimized applications like Igalia's snabb-lwaftr can use the available capacity and more. The clock speeds on Intel's latest high-end CPUs are lower than those of the previous generation.
Doubling the performance of Snabb NFV would be valuable. On the one hand it would enable efficient deployments with only one reserved core per 20G of traffic. On the other hand it would provide a performance buffer for maintaining 10G per core performance in the presence of complications like slow processors, NUMA mismatches, and performance-affecting bugs.
How
Snabb NFV processing cost is split fairly evenly between:
Virtio-net vring processing.
Data copies to/from guest memory (Virtio-net "DMA").
and also, for client/server applications such as iperf, Apache, and PostgreSQL:
Checksum computations to offload work from the guest.
So one optimization strategy would be to double the performance of each item on that list.
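To see why every item needs to improve, here is a rough Amdahl-style estimate. It assumes the cost really is split into three equal shares, which is a simplification of "fairly evenly", and the function name is only illustrative.

```python
def overall_speedup(cost_shares, speedups):
    """Amdahl-style estimate: each share's cost is divided by its speedup."""
    old_total = sum(cost_shares)
    new_total = sum(share / s for share, s in zip(cost_shares, speedups))
    return old_total / new_total

# Assumed cost model: three equal shares
# (vring processing, guest memory copies, checksum computation).
shares = [1.0, 1.0, 1.0]

# Doubling only vring processing leaves two thirds of the cost untouched.
only_vring = overall_speedup(shares, [2.0, 1.0, 1.0])

# Doubling all three items doubles overall performance.
all_three = overall_speedup(shares, [2.0, 2.0, 2.0])

print(f"vring only: {only_vring:.2f}x, all three: {all_three:.2f}x")
```

Under this assumption, doubling a single item gives only about a 1.2x gain overall, so the target is only reached when all three improve.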
Here is an initial sketch of how that might play out:
Virtio-net processing could be optimized by reviewing and improving the JIT'ed inner loop code.
Checksum computations are bound by the SIMD performance of the CPU and that is scheduled to double in the near future with CPUs supporting AVX512 instructions.
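For reference, here is a scalar sketch of the 16-bit ones' complement Internet checksum (RFC 1071) used by IP, TCP, and UDP. It assumes this is the checksum being offloaded, and it is not Snabb's SIMD implementation. It only shows the arithmetic that a vectorized version has to reproduce.

```python
def internet_checksum(data: bytes) -> int:
    """Scalar RFC 1071 checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add each big-endian 16-bit word to a wide accumulator.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Example bytes taken from the worked example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))
```

Because the sum is associative and carries can be folded at the end, the words can be accumulated in wide parallel lanes, which is why the work scales with SIMD width.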
Progress
Here is a placeholder list to see how we are doing:
Double Virtio-net vring processing performance.
Double Virtio-net copy performance.
Double SIMD checksum performance.
And interesting performance milestones:
Achieve 10 Gbps full-duplex packet forwarding with 128-byte packets.
Achieve 20 Gbps full-duplex packet forwarding with 256-byte packets.
Achieve 20 Gbps full-duplex TCP performance with iperf.
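To put the packet forwarding milestones in terms of packet rate, here is a back-of-the-envelope calculation. It assumes the quoted rates are Ethernet line rates, that the packet sizes include the frame check sequence, and that each frame carries 20 bytes of on-wire overhead (preamble plus inter-frame gap). The issue does not say how the rates are measured, so treat the numbers as approximate.

```python
# Assumed per-frame overhead on the wire: 8 bytes of preamble/SFD
# plus a 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20

def packets_per_second(gbps, packet_bytes, overhead_bytes=ETHERNET_OVERHEAD_BYTES):
    """Packets per second in one direction at the given line rate."""
    bits_per_packet = (packet_bytes + overhead_bytes) * 8
    return gbps * 1e9 / bits_per_packet

def ns_per_packet(pps):
    """Time budget per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps

for gbps, size in [(10, 128), (20, 256)]:
    pps = packets_per_second(gbps, size)
    print(f"{gbps} Gbps, {size}-byte packets: "
          f"{pps / 1e6:.2f} Mpps per direction, "
          f"{ns_per_packet(pps):.1f} ns per packet")
```

Under these assumptions both milestones work out to roughly 8 to 9 million packets per second in each direction, and full-duplex operation doubles the number of packets to be handled.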
I'm currently using VMDq to create two logical interfaces connecting to either side of Igalia's snabb-lwaftr. Initially this was just for testing over a single loopback, but it's actually very useful to place the IPv4 and/or IPv6 side into a VLAN or run the app "on a stick" using virtual MAC addresses.
Taking this idea a step further, why not launch multiple lwaftr apps via VMDq over the same interface and optionally in the same VLAN (or untagged)? If you have read this far, you are probably shouting that this is exactly what snabbnfv does. And yes, it does, but wouldn't it be great to have separate snabb processes sharing a physical port via VMDq? That would give us/me an immediate boost by running multiple instances of lwaftr and spreading flows across them. @wingo already solved the issue of sharing binding tables between many processes via a shared memory-mapped file. Suddenly just doubling the performance of Snabb NFV feels so yesterday ;-).