Fix build with Musl libc #682

Merged
merged 1 commit into from
Jan 5, 2016


justincormack
Contributor

Minor portability fixes that let Snabb build under Musl libc, tested on Alpine Linux.

@lukego
Member

lukego commented Dec 28, 2015

Thinking about the different ways we could approach musl support in Snabb Switch:

  1. Import musl as a subtree and make the master branch always build a statically-linked executable.
  2. Support musl as our second first-class platform with full test coverage (including performance regression tests compared with glibc).
  3. Support musl as a second-class platform with only basic test coverage e.g. CI uses docker to check that Snabb Switch compiles on Alpine Linux.
  4. Just merge patches that look like they would help musl but without doing any upstream testing.

How about we celebrate the new year by being wild and crazy with option 1?

That would be to add musl as our fourth dependency (alongside luajit, ljsyscall, and pflua) and truly ship a statically linked executable that will run on any Linux/x86-64 machine.

The first step would be to send a PR that adds musl as a subtree and uses it for the build. This will probably trigger test failures e.g. I'm expecting the NFV application to have a performance regression because the inner loop depends on the libc memcpy to be optimized for packet copy workload (glibc memcpy is, musl memcpy is not). This probably should be resolved in our own code i.e. by having a suitable packet-copy routine of our own instead of depending on non-standard properties of the libc one.

I have had a quick look at musl. I appreciate the coding style i.e. really focusing on short and simple code. Compilation time is about 1 minute serially but only 5 seconds in parallel (measured on chur) so we would still hit our compile time targets if we assume make -j.

I believe there will be more issues, e.g. that we can't use dlsym with a static link, and that this will break LuaJIT's FFI? This is something we would need to resolve before merge too.

Thoughts for/against? Is anybody willing to take the lead on such an integration, e.g. to create a branch that compiles a static executable with musl imported as a git subtree?

@justincormack
Contributor Author

A few thoughts

a. A standalone statically linked executable would be really nice. I really appreciate how easy it is to install projects that already do this, Go projects for example.
b. I would still recommend doing 4 first, i.e. merging this patch, as it is generally helpful and makes it easier to experiment with how to get to 1.
c. What size and alignment for memcpy is most important? My guess is that around 1k, maximally aligned, would be typical. This is really a separate issue. Maybe we should have our own memcpy code to be in control anyway. I will look at performance.
d. I did some work ages ago on dealing with the dlsym issue, by generating a dlsym function which looks like the normal one but just returns pointers to the static functions. See https://github.com/justincormack/ljsyscall/blob/master/examples/dl.c but it was only semi-scripted as a proof of concept. I think this needs more thought in terms of a generic simple solution. Also, in terms of generated code size, you lose some of the advantage of static linking if you simply link all symbols in rather than just the ones you use, so it needs a bit of fine tuning. I think I might try a generic script to work with ljsyscall statically linked as a first pass on this.
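To make the idea in (d) concrete, here is a minimal sketch of a table-driven symbol lookup for a static build. It is not the contents of the linked dl.c; the names `static_sym`, `static_syms`, `generic_fn` and `static_sym_lookup` are hypothetical, and the three listed symbols are only placeholders. A real drop-in would keep dlsym's `void *` signature, which relies on the POSIX guarantee that function pointers convert to `void *`; this sketch returns a generic function pointer instead so that it stays within ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Generic function pointer type; callers cast back to the real type. */
typedef void (*generic_fn)(void);

/* One entry per symbol that the FFI side is expected to look up. */
struct static_sym {
    const char *name;
    generic_fn  addr;
};

/* Hypothetical table: list only the functions actually used, so the
   linker is not forced to pull in every libc symbol. */
static const struct static_sym static_syms[] = {
    { "getpid",  (generic_fn)getpid  },
    { "getppid", (generic_fn)getppid },
    { "strlen",  (generic_fn)strlen  },
};

/* Hypothetical stand-in for dlsym(): resolve a name against the table
   above. Returns NULL when the symbol was not linked in. */
generic_fn static_sym_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof static_syms / sizeof static_syms[0]; i++) {
        if (strcmp(static_syms[i].name, name) == 0)
            return static_syms[i].addr;
    }
    return NULL;
}
```

Because each table entry takes the address of its function, the linker keeps exactly those functions in the static executable. That is the code-size trade-off mentioned above: a generated table should list only the symbols that are really needed.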

So I think it might still be best to split this up into a few different activities even if the aim is 1.

lukego added a commit to lukego/snabb that referenced this pull request Dec 30, 2015
@lukego
Member

lukego commented Dec 30, 2015

Sounds like a plan! Merged this PR onto next.

Here is some data about the packet copy routine for us:

  1. Primarily copying roughly 32 to 1500 bytes of data that is already in cache.
  2. Safe to round up copies to whole cache lines (either guarded with a predictable branch or actually guaranteed).
  3. May have nice alignment. (Can potentially ensure this for struct packet but less certain for virtio-net buffers allocated by VMs.)
  4. Caller will know whether src/dst overlap in memory and whether forwards or backwards copy is needed.

Question then is whether we should write a memcpy() or write one or more special purpose packetcopy() routines.
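As a rough illustration of what a special-purpose routine could look like under properties 2 and 4 above, here is a minimal sketch. It is not Agner Fog's design and has not been benchmarked. The names `packetcopy` and `round_up_cache_line` and the 64-byte `CACHE_LINE` are assumptions, and the caller is assumed to guarantee that both buffers are padded to a whole number of cache lines and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CACHE_LINE = 64 };  /* assumed cache-line size on x86-64 */

/* Round a length up to a whole number of cache lines. */
size_t round_up_cache_line(size_t len)
{
    return (len + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}

/* Hypothetical packet copy: forward copy of non-overlapping buffers,
   rounded up to whole cache lines. Both buffers must have room for
   round_up_cache_line(len) bytes. Returns the number of bytes copied. */
size_t packetcopy(void *restrict dst, const void *restrict src, size_t len)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t total = round_up_cache_line(len);

    /* Each iteration copies a constant 64 bytes, which a compiler can
       usually expand inline instead of calling the libc memcpy. */
    for (size_t off = 0; off < total; off += CACHE_LINE)
        memcpy(d + off, s + off, CACHE_LINE);
    return total;
}
```

Copying whole cache lines removes the tail-handling special cases that a general memcpy needs, at the cost of requiring padded buffers. Whether this actually beats a tuned libc memcpy for 32 to 1500 byte packets would have to be measured.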

I have browsed memcpy implementations around the internet (dpdk, libc, musl, etc). The shortest path to a relatively simple and efficient one that matches our needs seems to be to take Agner Fog's design and remove the special cases that are not relevant for us. He seems to have struck a balance between performance and simplicity (well, I say that not having benchmarked it for our use case).

See also memcpy odyssey threads on snabb-devel and #648 (Packet copies: Expensive or cheap?).

@lukego
Member

lukego commented Dec 30, 2015

Good news re: memcpy is that we do have CI coverage for this. A bad memcpy should impact the performance regression tests for forwarding packets through a DPDK VM. So if that test passes then we probably have a winner.

(One weakness of that test is that it is using fixed-size power-of-2 packets. To be more confident of the memcpy we would want to vary the packet sizes. I have done this manually a few times with glibc memcpy and it seems to behave well.)

@justincormack
Contributor Author

One thing that is different is that we write the tests for special cases in Lua, so they are inlineable and we don't need to run them at all in a tight loop if they are static. We can also use the built-in LuaJIT memcpy (which we could change if it is not optimal). So we need a slightly different test setup.

@lukego
Member

lukego commented Dec 30, 2015

LuaJIT only uses its own memcpy for fixed-size objects e.g. structs or constant-size arrays. Otherwise it will emit a call to libc memcpy. (This is much like GCC behavior.) The calls that we have to libc memcpy today are actually ffi.copy() with variable size data in the Lua sources.
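The GCC side of that comparison can be sketched in C. This is only an illustration under assumptions: `struct header`, `copy_fixed` and `copy_variable` are hypothetical names, and the comments describe what an optimizing compiler typically does, not something guaranteed by the language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size object, standing in for a struct or
   constant-size array on the Lua/FFI side. */
struct header {
    unsigned char bytes[64];
};

/* Size known at compile time: an optimizing compiler typically expands
   this into a few inline moves rather than calling memcpy. */
void copy_fixed(struct header *dst, const struct header *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Size known only at run time: this typically becomes a call to the
   libc memcpy, which is the case that ffi.copy() with variable-size
   data corresponds to. */
void copy_variable(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}
```

Only the second kind of copy is sensitive to how well the libc memcpy is tuned, which is why the glibc versus musl difference shows up in variable-size packet copies.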

@eugeneia eugeneia merged commit e09f698 into snabbco:master Jan 5, 2016

3 participants