First Things First
The `paczfs` script creates an ArchLinux operating system image focused on developing and running containers in a very secure and fully customizable way. It provides functionality equivalent to docker (using `systemd-nspawn`) and CoreOS (it provides a minimal hardened OS to run containers on a VPS).
The `pacarch` scripts are used to build containers (similar to Dockerfiles).
Both of these scripts use the `bash` language, and are meant to be fully customizable using a combination of environment variables and custom function definitions.
To leverage the full potential of `paczfs` and `pacarch` it is important to understand some `bash` scripting, and specifically some of the `bash` conventions we use in the scripts. This tutorial is to `pacarch` what the Dockerfile documentation is to `docker`. The examples below are written with respect to `pacarch`, but the same principles and conventions apply to `paczfs`.
Even if you are already familiar with `bash` scripting, you should read the following. If something is discussed here, it is because it is important to understand. There are lots of ways to do things in `bash`, and the tutorial below is focused on the specific conventions we use in the `paczfs` and `pacarch` scripts and on `bash` version 4.4.
This tutorial is not specific to `paczfs` and `pacarch`, but it is specific to `bash` v4.4+. If you understand this tutorial, you will be able to write very powerful `bash` scripts and to fully customize the use of `paczfs` and `pacarch`.
You will NOT necessarily be able to interpret other `bash` scripts though, because `bash` is a very old scripting language and is backward compatible in lots of ways that we don't fully understand. For example, `bash` is happy to perform tests using single brackets (i.e. `[ "me" == "you" ]`), but we don't discuss this in the tutorial below. The reason is that single-bracket tests exist only for backward compatibility, to allow a script to run with older versions of `bash`. One reason `bash` can be so intimidating is that so much of the online documentation discusses all of the different ways to do something.
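For the record, here is the single-bracket form next to the double-bracket form this tutorial sticks to; for a simple comparison like this the two behave the same, but only the double-bracket keyword is covered here:

```bash
# Old-style single-bracket test, kept only for backward compatibility:
[ "me" == "you" ] && echo "equal"

# The bash v4.4+ double-bracket keyword used throughout this tutorial:
[[ "me" == "you" ]] && echo "equal"
```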
This tutorial strives to define conventions representing the idiomatic way to do things in `bash` v4.4+ with no regard for backward compatibility. This is a feature, not a bug! It doesn't try to explain multiple ways to do things; it tries to explain the most robust way to do them in `bash` v4.4+, which will allow you to write powerful `bash` scripts and fully understand the `paczfs` and `pacarch` scripts. It won't necessarily help you to interpret arbitrary `bash` scripts though, because it doesn't discuss the methods script authors use to make scripts more portable. If portability is important to you, this will still provide a good basis to better understand the "base" case that you want a "portable" alternative to.
This tutorial will feel like a bit of a "jump down the rabbit hole" at times, focusing on details that may not seem important or relevant. But trust us; we are lazy, so it is important and relevant. Like most things, `bash` operates on some pretty fundamental underlying principles, and understanding them is the key to unlocking its power, but this may require you to reconsider some things you thought you knew.
We have "tinkered" in bash
ever since the late 1990's, but it wasn't until we fell in love with the ZFS filesystem and started creating more and more powerful bash
scripts to get it onto the variety of servers that we use that we really started to understand the power of bash
. We realize that there are other powerful scripting languages (ie. zsh, csh, sh, ruby, python, etc.). For better or worse, we chose bash
for three reasons:
- it is already installed on most systems
- it is so closely integrated into the kernel, it is almost like interacting directly with the kernel
- it is designed to do the things you would use a `bash` script for
To cut down our learning curve, we decided to try and find a "sub-set" of `bash` that would represent our idiomatic way to do things, and we settled on whatever is the `bash` v4.4+ idiomatic way. We figured this would make our scripts "portable enough". Turns out we were wrong; macOS still ships `bash` v3. So we decided that in some cases we would need to make some exceptions (such as using `ash` scripts during early boot loading), but we wouldn't make exceptions for modern operating systems that are just behind the times.
Points 2 and 3 above just mean that `bash` is really good at doing things at the operating system level such as mounting drives, altering files, or changing partition tables. But it can also execute any other arbitrary program or script installed on the system. A great way to make your application extensible/customizable by the biggest audience possible is to allow your users to provide `bash` scripts that your application will execute at pre-determined times. For example, if you have a ruby script that is expecting a directory to be prepared in a specific way, you could program that into the ruby script. To be more flexible you would probably refactor it out into a separate file and include some logic to make it more robust. To be even more flexible, you may provide a configuration variable that allows the user to use a custom ruby script. But for the most flexibility, you would let the user supply their own script that prepares the directory however they see fit, and simply execute it at the right moment (see the sketch below).
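Here is a minimal sketch of that hook pattern in `bash`; the variable name `MYAPP_PREPARE_HOOK` and the default file name are placeholders we made up for the example:

```bash
#!/usr/bin/env bash
# Run a user-supplied hook script (if one exists) before doing our own work.
# MYAPP_PREPARE_HOOK and ./prepare-hook.sh are hypothetical names.
hook="${MYAPP_PREPARE_HOOK:-./prepare-hook.sh}"

if [[ -x "${hook}" ]]; then
    # The user's script prepares the directory however they see fit.
    "${hook}" "$@"
fi

# ... the application's own work continues here ...
```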
Typically when booting a *nix computer, some bootloader software launches the system kernel, and the system kernel launches the program at /sbin/init. The reason "the computer" can't just launch the /sbin/init binary directly is that, until something tells it which partition to look on, it can't find it.
Grub is an example of a bootloader and is typically configured to provide a selection of available kernels. One of the other things grub is responsible for is passing a command line argument to the linux kernel identifying how to mount the root filesystem.
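To make that concrete, a grub menu entry has roughly the following shape; the kernel path, initramfs path, and UUID below are placeholders, not values from a real machine:

```
menuentry 'Arch Linux' {
    # "root=" is the command line argument that tells the kernel where the
    # root filesystem lives; "rw" asks for it to be mounted read-write.
    linux  /vmlinuz-linux root=UUID=0123-abcd-example rw quiet
    initrd /initramfs-linux.img
}
```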
This may seem like a strange way to start a *nix `bash` tutorial, but it is one of the most fundamental principles involved in virtualization and containerization, and also a perfect example of why `*sh` (by which we mean the family of sh, bash, csh, zsh, etc.) is so powerful in some situations.
And this is related to how it handles "arguments" and "variables". Tutorials about `bash` variables tend to be technically concise, because technically it is concise, but there are some very profound and nuanced implications.
We are going to use this boot process to launch into an explanation of variables.
So you press the power button on a modern EFI based computer (not BIOS) and the firmware will search for partitions that have some specific characteristics (to identify them as EFI partitions). Next it will mount them and see what's on them. For our example, it finds the `grub` EFI executable and runs it. Apparently EFI programs are used during early boot because they have some magic property of being able to choose or move which physical locations they use in RAM.
This is typical, and typically the next step is that the user is presented with a grub (or lilo, etc.) menu and can select which kernel to use from whatever options are available. The reason this works is because the package manager for a typical *nix distro automatically runs the `grub-mkconfig` script, which re-creates the grub configuration on the EFI partition anytime it installs a new linux kernel package.
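On an Arch-based system that regeneration step typically looks like the following; the output path is the Arch default and may live elsewhere (for example on the EFI partition) on other distros:

```bash
# Re-create the grub configuration after installing a new kernel package:
grub-mkconfig -o /boot/grub/grub.cfg
```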
Grub is then responsible for launching the kernel with the correct command line arguments. The first important thing to note here is that command line arguments ARE the fundamental concept on which *nix operates.
Let us explain why this is profound: this is how absolutely everything in the *nix world works, unless people build wrappers which obscure it.
In the beginning there was a kernel, but to do anything with it you had to feed it long strings of 1's and 0's on punchcards. Eventually this wasn't exciting anymore, so people started to think about better ways and devised the concept of "the shell". We think the first was probably `sh`, but eventually people started forking their own implementations and so now we have `bash`, `csh`, `zsh`, etc., each with their own strengths and weaknesses.
We think about "shell" scripting languages (ie. the *.sh
family) as being a highly optimized linux kernel APIs. And this is what makes them different from other scripting languages such as ruby
, python
, etc. which may have libraries to access the kernel (otherwise they wouldn't be able to save a file), but are typically focused on providing libraries for complex data structures, complex calculations or graphical user interfaces.
At this point, we are going to stop using the term `*sh`, with one final comment. Part of the reason there is a family of `*sh` shells is because there are legitimately different use cases, but they all generally strive to provide some sort of mode which operates according to the POSIX standard (which we think is mostly based on the original `sh` implementation). The "initramfs" discussed below uses `ash` because it is very small, and lots of people really enjoy the interactive user experience of `zsh`. It is common these days that /usr/bin/sh is actually a symlink to bash, but when called via that symlink it automagically loads a special POSIX compatibility mode.
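You can check this on your own machine, and you can also ask `bash` for POSIX mode explicitly:

```bash
# See what sh points to (on some distros the symlink lives at /bin/sh instead):
ls -l /usr/bin/sh

# bash also accepts an explicit flag to run in its POSIX compatibility mode:
bash --posix -c 'echo "running in POSIX mode"'
```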
Which brings up the second profound point, which is that `bash` has a form of polymorphism. Object oriented languages implement polymorphism by selecting a pre-defined implementation according to an object's declared type through inheritance and overriding. `bash` implements polymorphism by inspecting its environment, most importantly the name it was invoked with. We will explain what we mean by discussing the `busybox` application, which is included in the "initramfs".
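As a quick illustration (assuming `busybox` is installed; the /tmp paths are just for the demo):

```bash
# busybox decides which applet to behave as from the name it was invoked with.
# Invoked directly, the applet name is passed as the first argument:
busybox ls /tmp

# Invoked through a symlink, the link's own name selects the applet:
ln -s "$(command -v busybox)" /tmp/ls
/tmp/ls /tmp
```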
Now we will put some pieces together and try to explain why we say it is profound to understand command line arguments, by continuing with our booting example. When you select one of the options from the grub menu, grub will check its configuration file for the arguments that should be passed to the linux kernel. One of these command line arguments tells the kernel what it should mount as the root filesystem. If you only have one kernel and only one possible root filesystem, and the root filesystem is on a partition the kernel can read, then technically you don't really need a bootloader (such as grub): you can partition the drive differently so that the computer firmware launches the linux kernel directly and completely skips the bootloader step. (If you really know what you are doing, then please help us to understand dynamic EFI stub loading, because we want to evaluate it as a replacement for grub in `paczfs`.)
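On any running linux machine you can see exactly which arguments the bootloader handed to the kernel:

```bash
# The kernel exposes the command line it was booted with:
cat /proc/cmdline
# Example output (values are machine specific):
#   BOOT_IMAGE=/vmlinuz-linux root=UUID=0123-abcd-example rw quiet
```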
Okay, so now grub is ready to launch the kernel using command line options from its configuration file. But what happens if the root filesystem requires kernel modules? The answer is to use an "initramfs".
Let us explain with an example. You start out with a computer that has ubuntu installed on an ext4 partition, then you install centos on another ext4 partition. Through some package manager magic, whichever system you installed last updated the grub configuration on the EFI partition to include grub menu entries for both the ubuntu and centos system.
In this case an "initramfs" is commonly used, but not necessarily required because grub could just mount the filesystem and start the kernel. This is possible, because the "firmware" running the grub ELF binary has ext4 drivers built-in. But what if the root filesystem is on a partition that the firmware doesn't know how to read such as a ZFS partition (which is what paczfs
uses).
This is where an "initramfs" is needed, which is just a compressed file that contains an "ultra-minimal" linux operating system, but most importantly, it can also contain kernel modules and startup scripts (or "hooks") to execute very early in the boot process (even before the kernel is loaded). If you extracted an "initramfs" you would see the typical /bin /tmp, etc. folders (but a lot fewer).
So in summary, the "initramfs" is a minimal linux operating system that provides just enough functionality to mount the real root filesystem. In the case of paczfs
(which uses a ZFS root filesystem) the "initramfs" includes the zfs kernel modules and an ash
script to import the root filesystem pool and mount the correct dataset.
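If you are curious, you can look inside the image yourself; on an Arch-based system built with mkinitcpio it looks roughly like this (the image path may differ on your machine):

```bash
# List everything packed into the initramfs image (Arch/mkinitcpio tooling):
lsinitcpio /boot/initramfs-linux.img
```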
You may not realize it yet, but you just learned how to completely hi-jack a server on reboot as a way to "inject" your own server images (we think). The reason we say "we think" is because we haven't actually implemented this yet. Admittedly, we don't fully understand how the "initramfs" works yet, but what came as an epiphany to us was that the early boot process just executes `ash` shell scripts, so to the extent that you can understand `ash`, you can "inject" `ash` scripts into the early boot process to do whatever you want.
If you install `bash` on ArchLinux it requires 7.17 MB, but according to Wikipedia, `ash` is "a lightweight (92K) Bourne compatible shell. Great for machines with low memory, but does not provide all the extras of shells like bash, tcsh, and zsh".
Get it? `ash` requires 92kB, whereas `bash` requires 7.17MB. Since the "initramfs" is loaded into ram, nobody wants to waste space with bells and whistles, so it uses `ash` instead of `bash`. We don't have the human resources to learn `ash` and write a tutorial, but our experience is that after we learned `bash` scripting (as outlined in this tutorial), we had enough of a background to also tweak existing `ash` scripts. In fact, the `paczfs` utility actually does tweak the upstream zfs early boot `ash` script.
Docker published an article explaining why they used the 'go' programming language to write Docker. One of the main purposes of that article is to communicate that Docker is basically just a wrapper (or CLI) around capabilities that are built into the linux kernel. Docker does not actually perform process isolation; the Linux kernel performs the process isolation and Docker just tells it how. Docker provides some extras (such as swarm), but with respect to running containers, there is nothing `docker` can do that `systemd-nspawn` can't, since they are just two different Command Line Interfaces (CLI) to capabilities which are built into the kernel.
The `systemd-nspawn` application is part of "systemd", which is a modern linux init system and the successor to the SysV init system. Right, wrong, or indifferent, "systemd" has become the de-facto standard for linux init systems. A quick check of online question/answer sites shows they are full of questions about how to launch Docker containers as systemd services. More frustrating, though, is the number of questions regarding how to run containers with more than one process. Unfortunately, Docker has propagated the idea that containers should only run a single process. This is a completely artificial limitation based on an assumption that Docker made when initially developing their system. Get this idea out of your mind right now: a container can run anything you want, as long as it has enough permissions and the host has the right kernel drivers (remember, the difference between running a "container" and running a "virtual machine" is that a "container" shares the host's kernel).
On the other hand, `systemd-nspawn` is also not without its quirks, because it expects (by default) that the container is also running a systemd system. The difference, though, is that `systemd-nspawn` only expects this as a default and is happy to accommodate a single-process system, whereas Docker makes a fundamental assumption that a container will only run a single process, so anything else is fundamentally a hack since it explicitly goes against the design philosophy. We guess not everyone will see it this way, but when we see discussions on stackoverflow about running more than one process in a docker container, we roll our eyes (and ultimately got motivated to create this documentation, for better or for worse).
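As a rough sketch of both styles with `systemd-nspawn` (assuming a container root filesystem has already been prepared; the path /var/lib/machines/mycontainer is a placeholder and both commands need root):

```bash
# Run a single process inside the container, docker-style:
systemd-nspawn -D /var/lib/machines/mycontainer /usr/bin/echo "hello from the container"

# Or boot the container's own init system, which is systemd-nspawn's default expectation:
systemd-nspawn -D /var/lib/machines/mycontainer --boot
```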
When we last discussed the boot process we were looking into the "initramfs": a compressed file containing a very minimal linux operating system that the bootloader loads into RAM, after which the /sbin/init file inside it is executed to do anything special that the computer's firmware may not know how to do (like mounting a zfs filesystem as the root filesystem).
This may seem like a lot of complexity to do something that should be simple. Hopefully, the following explanation will convince you that while it may make it a little more difficult to boot our machines, it helps us to keep our beer stocked, so it is worthwhile. The logic works like this:
- In the beginning there was the first OS
- Then there was a second OS (but the two couldn't coexist)
- Then there were a lot of *nix's
- Then somebody figured out how to get multiple *nix systems to coexist (i.e. grub, lilo, etc.)
- Then somebody figured out how to get *nix and windows systems to coexist (bootcamp???)
- Then somebody realized that eventually people will need to bootload the OS in the refrigerator that automatically reorders the beer (but doesn't have a persistent hard drive)
Obviously, the final point is exaggerated for effect, but it is valid and important. Technology works in a hopscotch fashion, and the critical moment was the realization that since there is no way to know what the future will hold, the next best thing is to provide it a predictable starting environment and let it do what it needs to.
Historically, most of us have considered the BIOS to just be the thing that knows how to load the operating system. And when people only booted one operating system per physical machine using well known filesystems (i.e. ntfs OR ext3), this worked great. Eventually this system started to crack as people wanted both *nix and MS filesystems to coexist on the same physical machine, and started using esoteric (at the time) filesystems such as ext3, ext4, btrfs, zfs, etc. How mad would you be if you just paid for a new copy of MS Windows 10 and found out that it won't boot on your machine (apparently this has happened)?
Oh yeah, and some people will be working on refrigerators (as part of the IoT) that don't have any hard drive at all (see, we are not just rambling; we actually close our loops).
We doubt that the historical re-creation above is accurate, but it is a fun way to make the point that the concept of an "initramfs" is a sort of compromise that says: "we don't know what the future holds, so as a common starting point for every machine that agrees to follow this convention (whether it is a server in the cloud or a refrigerator in your home), let's agree that the only thing it can rely on is that if the physical device has enough ram it can load something into ram to figure the rest out".
We use refrigerators and beer to try and inject some levity into the discussion, but a more common example is the WiFi router in your house. Somewhere in that router is a flash rom (read-only memory) card with a compressed image of a filesystem that gets decompressed into ram every time the router starts. Somewhere else inside your router, there is some kind of internal usb memory stick that your WiFi configuration is saved to. In this case, the "initramfs" is the actual operating system (we think).
So, here is how the following operations actually work on your WiFi router:
- unplug and replug the power cable - a fresh copy of the compressed initramfs image is loaded into memory at every boot (ie. the system is reset), but the configuration on the internal usb is unchanged (ie. a soft reset)
- factory reset - the internal usb is erased but the initramfs is unchanged (ie. it boots as normal, but has no user configuration yet)
- firmware upgrade - the "initramfs" compressed file is replaced so the next reboot will decompress and use the new initramfs, but the internal usb is still the same (if the configuration on the internal usb is compatible with the new initramfs there are no problems, but if not, there can be some strange results; which is why conventional wisdom is to reconfigure devices after flashing the rom)
This article is not about WiFi routers though, but when we get frustrated because the boot process seems complicated, we think it is helpful to remember that the architects of this system were forward thinking enough to generalize the boot process (i.e. accommodating an initramfs) to be able to boot things like WiFi routers (which don't have hard drives). It is this generalization (or abstraction) which we take advantage of to import the root filesystem with `paczfs`.
In summary, the EFI boot process is:
- firmware finds a fat32 partition with the bootable flag and executes grub
- grub loads its configuration and presents a menu
- user selects an option from the grub menu
- grub loads the kernel and the initramfs (decompressed into ram) and passes the kernel its command line arguments
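If you want to see step 1 from the firmware's point of view, the `efibootmgr` utility lists the boot entries the firmware knows about:

```bash
# List the firmware's EFI boot entries (requires root; output is machine specific):
efibootmgr -v
```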
Wow, we are a long way into the section titled "variables" without actually discussing variables yet.
But now we have the "context" to have a good discussion about variables and we are going to use the paczfs
script as an example. It is okay if you dont know what
paczfs` is. The important thing for this example is that it is a script to create a server image that uses a ZFS root filesystem. The reason this is a good example is because is is a real world example of something that can't be done without an initramfs (since most firmware can't load zfs filesystems), and because it is an excellent example of something that should be done in a "shell" scripting language.
As one final example of this "polymorphism", consider `busybox`, which has the interesting characteristic that it can replace over 170 standard linux utilities, although of course it is not as full featured as the normal utilities.
The reason the "initramfs" uses `busybox` is because each time the computer boots, a fresh copy of the compressed "initramfs" is decompressed into RAM, providing a predictable system state.
Which brings us, at last, to variables themselves, and actually understanding `declare --help` is the first step.
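The `help` builtin is the quickest way to read that documentation, and a few declarations in the style used throughout these scripts look like this (the variable names are only illustrative):

```bash
# Built-in documentation for the declare builtin:
help declare

# A taste of what declare can do:
declare -r  GREETING="hello"      # read-only string
declare -i  COUNT=3               # integer
declare -a  LIST=(one two three)  # indexed array
declare -A  MAP=([key]="value")   # associative array (bash 4+)
```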
As you read this, open a shell and run `bash --version` to make sure you are using v4.4+.
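For example:

```bash
bash --version                                   # prints the full version banner
echo "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"   # or check the major.minor programmatically
```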