Idea: Double the performance of Snabb NFV #710

lukego · 2016-01-17T05:47:29Z

Here is a fun idea to play with in the background: How about doubling the performance of Snabb NFV?

Why

Our original performance target was to handle 10 Gbps per core for realistic ISP workloads, based on a reference processor that Intel uses in their NFV case studies. Such performance was unheard of with Virtio-net at the time, and we had to write the QEMU code to make it possible. There were few, if any, virtual machines available that were optimized to keep up with millions of packets per second on their Virtio-net interfaces.

Time has moved on, though. High-speed Virtio-net is now an established idea. Optimized applications like Igalia's snabb-lwaftr can use all of the available capacity and more. Meanwhile, the clock speeds of Intel's latest high-end CPUs are lower than those of the previous generation.

Doubling the performance of Snabb NFV would be valuable. On the one hand, it would enable efficient deployments with only one reserved core per 20 Gbps of traffic. On the other hand, it would provide a performance buffer for maintaining 10 Gbps per core in the presence of complications like slow processors, NUMA mismatches, and performance-affecting bugs.

How

Snabb NFV processing cost is split fairly evenly between:

Virtio-net vring processing.
Data copies to/from guest memory (Virtio-net "DMA").

and also for client/server applications like iperf/apache/postgresql/etc:

Checksum computations to offload work from the guest.

So one optimization strategy would be to double the performance of each item on that list.
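For reference, the checksum work in the list above is the 16-bit ones'-complement Internet checksum used by IPv4, TCP and UDP (RFC 1071). The C sketch below is a minimal scalar version of that computation, not Snabb's SIMD implementation, and the function name `inet_checksum` is illustrative. A SIMD version computes the same ones'-complement sum using wider additions and folds the carries at the end, which is why its throughput tracks the CPU's SIMD width.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the 16-bit ones'-complement Internet checksum
   (RFC 1071). Bytes are summed as big-endian 16-bit words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    /* Add each big-endian 16-bit word. A 64-bit accumulator cannot
       overflow for any realistic packet length, so carries are
       folded only once, after the loop. */
    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low-order byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes `00 01 f2 03 f4 f5 f6 f7`, this returns `0x220d`.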

Here is an initial sketch of how that might play out:

Virtio-net processing could be optimized by reviewing and improving the JIT'ed inner loop code.
Data copies could be optimized with a special-purpose assembler routine (see Packet copies: Expensive or cheap? #648).
Checksum computations are bound by the SIMD performance of the CPU and that is scheduled to double in the near future with CPUs supporting AVX512 instructions.
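One way to illustrate the "blitter" idea (#711, #719) is a deferred-copy interface: copies are queued while the vring is walked and completed together at a barrier, so that an optimized routine gets a whole batch to work on. The C sketch below is hypothetical. The names `blit_copy`, `blit_barrier` and `BLIT_MAX` are made up for illustration, the API proposed in #711 may differ, and the batch is executed with plain `memcpy` rather than a special-purpose assembler routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical deferred-copy ("blitter") queue. Callers must ensure that
   source and destination do not overlap (memcpy semantics) and must not
   read dst or modify src until blit_barrier() has returned. */
#define BLIT_MAX 256

struct blit_op {
    void *dst;
    const void *src;
    size_t len;
};

static struct blit_op blit_queue[BLIT_MAX];
static size_t blit_count = 0;

/* Complete every queued copy, in order, and empty the queue. An optimized
   version could process this batch with vectorized or assembler code. */
void blit_barrier(void)
{
    for (size_t i = 0; i < blit_count; i++)
        memcpy(blit_queue[i].dst, blit_queue[i].src, blit_queue[i].len);
    blit_count = 0;
}

/* Queue a copy; it is only guaranteed to have happened after the next
   blit_barrier(). If the queue is full, flush it first. */
void blit_copy(void *dst, const void *src, size_t len)
{
    if (blit_count == BLIT_MAX)
        blit_barrier();
    blit_queue[blit_count++] = (struct blit_op){ dst, src, len };
}
```

In a Virtio-net loop, each packet's copy to or from guest memory would be queued with `blit_copy`, and `blit_barrier` would be called once before the used ring is updated, so the guest never sees a descriptor whose data has not been copied yet.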

Progress

Here is a placeholder list to see how we are doing:

Double Virtio-net vring processing performance.
Double Virtio-net copy performance.
Double SIMD checksum performance.

And interesting performance milestones:

Achieve 10 Gbps full-duplex packet forwarding with 128-byte packets.
Achieve 20 Gbps full-duplex packet forwarding with 256-byte packets.
Achieve 20 Gbps full-duplex TCP performance with iperf.
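To put these milestones in packets per second, the sketch below converts a line rate and frame size into a per-direction packet rate and a per-packet cycle budget. It assumes the stated packet size is the Ethernet frame including FCS, plus 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire. The helper names and the 2.0 GHz clock used in the note afterwards are illustrative assumptions, not figures from this issue.

```c
/* Assumed per-frame wire overhead: 7-byte preamble + 1-byte start-of-frame
   delimiter + 12-byte inter-frame gap. Frame size includes the FCS. */
#define ETH_WIRE_OVERHEAD 20

/* Packets per second needed to saturate ONE direction of a link. */
double line_rate_pps(double gbps, unsigned frame_bytes)
{
    return gbps * 1e9 / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}

/* CPU cycles available per packet for one core at the given clock. */
double cycles_per_packet(double clock_ghz, double pps)
{
    return clock_ghz * 1e9 / pps;
}
```

Under those assumptions, 10 Gbps with 128-byte packets is about 8.45 Mpps per direction, or about 16.9 Mpps in total if "full-duplex" means 10 Gbps each way; a single hypothetical 2.0 GHz core handling both directions would then have roughly 118 cycles per packet. 20 Gbps with 256-byte packets works out to about 9.06 Mpps per direction.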


kbara · 2016-01-18T12:04:56Z

That would be amazing.

mwiget · 2016-01-25T14:55:15Z

I'm currently using VMDq to create two logical interfaces connecting to either side of Igalia's snabb-lwaftr. Initially this was just for testing over a single loopback, but it's actually very useful to place the IPv4 and/or IPv6 side into a VLAN or to run the app "on a stick" using virtual MAC addresses.
Taking this idea a step further, why not launch multiple lwaftr apps via VMDq over the same interface, optionally in the same VLAN (or untagged)? If you have read this far, you are probably shouting that this is exactly what snabbnfv does. And yes, it does, but wouldn't it be great to have separate snabb processes sharing a physical port via VMDq? That would give us/me an immediate boost by running multiple instances of lwaftr and spreading flows across them. @wingo already solved the issue of sharing binding tables between many processes via a shared memory-mapped file. Suddenly, just doubling the performance of Snabb NFV feels so yesterday ;-).


lukego added the idea label Jan 17, 2016

lukego mentioned this issue Jan 17, 2016

[draft] lib.blit: Introduce API for optimized "blitter" for Virtio-net DMA #711 (Closed)

lukego mentioned this issue Jan 19, 2016

Optimized "blitter" routine written in assembler [wip] #719 (Open)

lukego mentioned this issue Feb 2, 2016

Cache coherence and Virtio-net performance #735 (Open)

takikawa pushed a commit to takikawa/snabb that referenced this issue Jan 16, 2017

Merge pull request snabbco#710 from Igalia/new_fixquery (13cfede): Fixing the query code, redux
