Multithread application sharing struct nl_cache objects using NL_AUTO_PROVIDE crashes in libnl3-3.4.0 #273

amritpaldhillon · 2021-03-15T02:27:26Z

The test app is a simple application that launches multiple threads (2 by default). Each thread creates its own nl_cache_mngr object with nl_cache_mngr_alloc() API. Then each thread creates 2 caches, one for "route/link" and another for "route/addr". Then each thread polls the Netlink Cache Manager, using nl_cache_mngr_poll() API. The callback functions previously registered in the nl_cache_mngr_add() API are triggered when the corresponding events happen.

The callback functions don't do much except print the fact callback has been called every 100 multiples, along with the Thread ID for the callback.

The system information:

$ cat /etc/debian_version
10.3

$ uname -a
Linux <hostname-removed> 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

$ dpkg -l libnl* | grep "^ii"
ii  libnl-3-200:amd64       3.4.0-1      amd64        library for dealing with netlink sockets
ii  libnl-3-dev:amd64       3.4.0-1      amd64        development library and headers for libnl-3
ii  libnl-genl-3-200:amd64  3.4.0-1      amd64        library for dealing with netlink sockets - generic netlink
ii  libnl-route-3-200:amd64 3.4.0-1      amd64        library for dealing with netlink sockets - route interface
ii  libnl-route-3-dev:amd64 3.4.0-1      amd64        development library and headers for libnl-route-3

$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Command to compile the source code into test application

$ gcc -Wall -I/usr/include/libnl3 -O0 -ggdb  -lnl-3 -lnl-route-3 -lpthread  netlinkThreads.c   -o netlinkThreads

In a terminal (normal user) run the test app (below command launches 10 threads). I have found running 10 threads cause the crash to happen quickly as compared to when running 2 threads.

$ ./netlinkThreads 10

Now in a separate shell (root) create 1024 VLAN interfaces and constantly bring them up and down.

Command to create VLAN on eth0 (run as root)

# for f in {1..1024} ; do ip link add link eth0 name eth0.$f type vlan id $f; done

Command to bring VLAN interfaces up and down (also run as root)

# while true; do for f in {1..1024} ; do ip link set dev eth0.$f down; done; for f in {1..1024} ; do ip link set dev eth0.$f up; done; sleep 1;  done

Command to delete VLAN interfaces (run as root), once the test is complete.

# for f in {1..1024} ; do ip link set dev eth0.$f down; done
# for f in {1..1024} ; do ip link del dev eth0.$f; done

Sample output of the test.

$ ./netlinkThreads 10
main: Launching 10 threads
main: created 1 thread
main: created 2 thread
main: created 3 thread
main: created 4 thread
main: created 5 thread
main: created 6 thread
main: created 7 thread
main: created 8 thread
main: created 9 thread
main: created 10 thread
netlinkQuery: Thread #3 is ready
netlinkQuery: Thread #10 is ready
netlinkQuery: Thread #6 is ready
netlinkQuery: Thread #5 is ready
netlinkQuery: Thread #7 is ready
netlinkQuery: Thread #8 is ready
netlinkQuery: Thread #1 is ready
netlinkQuery: Thread #2 is ready
netlinkQuery: Thread #9 is ready
netlinkQuery: Thread #4 is ready
addrCallback: Thread #3 100
addrCallback: Thread #6 100
addrCallback: Thread #10 100
addrCallback: Thread #7 100
addrCallback: Thread #5 100
addrCallback: Thread #2 100
addrCallback: Thread #8 100
addrCallback: Thread #1 100
addrCallback: Thread #9 100
addrCallback: Thread #4 100
addrCallback: Thread #5 200
addrCallback: Thread #9 200
addrCallback: Thread #2 200
addrCallback: Thread #6 200
addrCallback: Thread #8 200
addrCallback: Thread #3 200
addrCallback: Thread #10 200
addrCallback: Thread #7 200
addrCallback: Thread #1 200
addrCallback: Thread #4 200
addrCallback: Thread #10 300
addrCallback: Thread #3 300
addrCallback: Thread #6 300
addrCallback: Thread #5 300
addrCallback: Thread #7 300
addrCallback: Thread #2 300
addrCallback: Thread #4 300
addrCallback: Thread #1 300
addrCallback: Thread #9 300
addrCallback: Thread #8 300
addrCallback: Thread #10 400
addrCallback: Thread #3 400
addrCallback: Thread #4 400
addrCallback: Thread #2 400
addrCallback: Thread #6 400
addrCallback: Thread #7 400
addrCallback: Thread #8 400
addrCallback: Thread #1 400
addrCallback: Thread #9 400
addrCallback: Thread #5 400
BUG at file position /build/libnl3-3.4.0/./lib/object.c:225:nl_object_put
BUG at file position /build/libnl3-3.4.0/./lib/object.c:225:nl_object_put
netlinkThreads: /build/libnl3-3.4.0/./lib/object.c:225: nl_object_put: Assertion '0' failed.
BUG at file position /build/libnl3-3.4.0/./lib/object.c:225:nl_object_put
netlinkThreads: /build/libnl3-3.4.0/./lib/object.c:225: nl_object_put: Assertion '0' failed.
Aborted

The crash goes away if NL_AUTO_PROVIDE is not used, i.e. replaced by 0. See commented code in the APP. Any thread can use the NL_AUTO_PROVIDE flag and the bug will be produced. All threads need to NOT use the NL_AUTO_PROVIDE in order to suppress the bug.

netlinkThreads.zip

The text was updated successfully, but these errors were encountered:

The nl_object_get() and nl_object_put() APIs were not updating the ce_refcnt in thread safe manner. This change added the ce_lock member to the struct nl_object and makes sure ce_refcnt is accessed in thread safe manner. Fixes thom311#273

thom311 · 2023-07-25T12:04:48Z

As you found out, the objects and data structures of libnl are not thread-safe.

What is supposed to work, is that you (for example) create two caches on two threads, and each thread handles the objects that belong to it. That implies, that no global data is used without locking. If you are careful, you can also pass an object from one thread to another, but make sure that at not time both threads access the objects. But you cannot access objects from multiple threads (unless you add your own locking on top of it).

Maybe (just maybe), the basic reference counting of objects should be thread-safe too. Then at least, you could easier pass one object to another thread (although, still two threads could not do anything with those objects except ref/get and unref/put). Still, it's not clear to me that that is desirable. What are you going to do with that? You pass a cache/object to another thread, and then what? I think applications should either do their own locking on top of libnl object, or avoid accessing data from multiple threads.

Also, if it were implemented, then I think this should be done using atomic counters (compare+exchange), and not a pthread lock. <pthreads> is a very popular library, so it probably would be OK to add that as an unconditional dependency, however, you don't need that for atomic operations (we do depend on gcc/clang already).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multithread application sharing struct nl_cache objects using NL_AUTO_PROVIDE crashes in libnl3-3.4.0 #273

Multithread application sharing struct nl_cache objects using NL_AUTO_PROVIDE crashes in libnl3-3.4.0 #273

amritpaldhillon commented Mar 15, 2021

thom311 commented Jul 25, 2023

Multithread application sharing struct nl_cache objects using NL_AUTO_PROVIDE crashes in libnl3-3.4.0 #273

Multithread application sharing struct nl_cache objects using NL_AUTO_PROVIDE crashes in libnl3-3.4.0 #273

Comments

amritpaldhillon commented Mar 15, 2021

thom311 commented Jul 25, 2023