Idea: Double the performance of Snabb NFV #710

Open
6 tasks
lukego opened this issue Jan 17, 2016 · 2 comments

lukego commented Jan 17, 2016

Here is a fun idea to play with in the background: How about doubling the performance of Snabb NFV?

Why

Our original performance target was to handle 10 Gbps per core for realistic ISP workloads based on a reference processor that Intel use in their NFV case studies. Such performance was unheard of with Virtio-net at that time and we had to write the QEMU code to make it possible. There were few if any virtual machines available that were optimized to keep up with millions of packets per second on their Virtio-net interfaces.

Time is moving forwards though. High-speed Virtio-net is an established idea now. Optimized applications like Igalia's snabb-lwaftr can use the available capacity and more. The clock speeds on Intel's latest high-end CPUs are lower than the previous generation.

Doubling the performance of Snabb NFV would be valuable. On the one hand it would enable efficient deployments with only one reserved core per 20G of traffic. On the other hand it would provide a performance buffer for maintaining 10G per core performance in the presence of complications like slow processors, NUMA mismatches, and performance-affecting bugs.

How

Snabb NFV processing cost is split fairly evenly between:

  • Virtio-net vring processing.
  • Data copies to/from guest memory (Virtio-net "DMA").

and also for client/server applications like iperf/apache/postgresql/etc:

  • Checksum computations to offload work from the guest.

So one optimization strategy would be to double the performance of each item on that list.
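To make the reasoning concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the per-packet cost is split into three equal shares (vring processing, copies, checksums); the real split is only described above as "fairly even". The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(cost_shares, speedups):
    """Overall speedup when each cost component is sped up by its own factor.

    cost_shares: relative per-packet cost of each component (any units).
    speedups:    factor by which each component gets faster (1 = unchanged).
    """
    if len(cost_shares) != len(speedups):
        raise ValueError("need one speedup factor per cost share")
    old_total = sum(cost_shares)
    new_total = sum(share / factor for share, factor in zip(cost_shares, speedups))
    return old_total / new_total


# Assumption: three equal shares (vring processing, copies, checksums).
shares = [1, 1, 1]

all_doubled = overall_speedup(shares, [2, 2, 2])   # every item doubled
only_copies = overall_speedup(shares, [1, 2, 1])   # only the copies doubled

print(f"all items doubled: {all_doubled:.2f}x")
print(f"only copies doubled: {only_copies:.2f}x")
```

Under that assumed split, doubling a single item yields only a 1.2x gain overall, which is why the strategy targets every item on the list.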

Here is an initial sketch of how that might play out:

  • Virtio-net processing could be optimized by reviewing and improving the JIT'ed inner loop code.
  • Data copies could be optimized with a special purpose assembler routine (see Packet copies: Expensive or cheap? #648).
  • Checksum computations are bound by the SIMD performance of the CPU and that is scheduled to double in the near future with CPUs supporting AVX512 instructions.
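For readers unfamiliar with the checksum item, the following is a scalar reference sketch of the standard Internet checksum (RFC 1071), on the assumption that this is the checksum being offloaded for TCP/UDP/IPv4 traffic. It is not Snabb's implementation, which the text above describes as SIMD-bound; it only shows the computation that a SIMD routine would have to reproduce.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Sum the data as big-endian 16-bit words.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Worked example bytes from RFC 1071.
sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(sample)))
```

The per-word additions are independent apart from the final carry fold, which is what makes the computation a natural fit for wider SIMD registers.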

Progress

Here is a placeholder list to see how we are doing:

  • Double virtio-net vring processing performance.
  • Double virtio-net copy performance.
  • Double SIMD checksum performance.

And interesting performance milestones:

  • Achieve 10 Gbps full-duplex packet forwarding with 128-byte packets.
  • Achieve 20 Gbps full-duplex packet forwarding with 256-byte packets.
  • Achieve 20 Gbps full-duplex TCP performance with iperf.
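As a rough guide to what the packet-forwarding milestones imply, here is a sketch of the packet-rate arithmetic. It rests on several assumptions not stated above: the quoted packet sizes are Ethernet frame sizes including the FCS, each frame carries 20 bytes of wire overhead (preamble, start-of-frame delimiter, inter-frame gap), the rate is per direction, and the 2.4 GHz clock is a hypothetical figure chosen only for illustration.

```python
# Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to fill one direction of a link."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def cycles_per_packet(clock_hz: float, pps: float) -> float:
    """CPU cycles available per packet on one core at the given packet rate."""
    return clock_hz / pps


HYPOTHETICAL_CLOCK_HZ = 2.4e9  # assumption, not a measured or quoted figure

for rate_bps, size in [(10e9, 128), (20e9, 256)]:
    pps = packets_per_second(rate_bps, size)
    budget = cycles_per_packet(HYPOTHETICAL_CLOCK_HZ, pps)
    print(f"{rate_bps / 1e9:.0f} Gbps, {size}-byte packets: "
          f"{pps / 1e6:.2f} Mpps, {budget:.0f} cycles/packet per direction")
```

Under these assumptions the first milestone works out to about 8.4 million packets per second per direction and the second to about 9.1 million, so a single core handling both directions of a full-duplex link would have half the per-direction cycle budget.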

kbara commented Jan 18, 2016

That would be amazing.


mwiget commented Jan 25, 2016

I'm currently using VMDq to create two logical interfaces connecting to either side of Igalia's snabb-lwaftr. Initially this was just for testing over a single loopback, but it's actually very useful to place the IPv4 and/or IPv6 side into a VLAN or run the app "on a stick" using virtual MAC addresses.
Taking this idea a step further, why not launch multiple lwaftr apps via VMDq over the same interface, optionally in the same VLAN (or untagged)? If you have read this far, you are probably shouting that this is exactly what snabbnfv does. And yes, it does, but wouldn't it be great to have separate snabb processes sharing a physical port via VMDq? That would give us/me an immediate boost by running multiple instances of lwaftr and hitting them with flows spread across them. @wingo already solved the issue of sharing binding tables among many processes via a shared memory-mapped file. Suddenly just doubling the performance of Snabb NFV feels so yesterday ;-).

takikawa pushed a commit to takikawa/snabb that referenced this issue Jan 16, 2017