GWPSan was co-designed along with necessary compiler and OS kernel support. Development of the runtime informed new compiler and kernel support, which we developed and upstreamed to LLVM and the Linux kernel.
Platforms that do not meet the requirements below are unsupported.
To perform certain detailed runtime binary analysis on an otherwise unmodified binary, semantic metadata is required that is normally lost when generating machine code. For example, data race detection requires knowledge of atomic accesses to avoid false positives. For deployment in production, however, this metadata needs to be stored in the binary and must be efficiently accessible at runtime: the presence of the metadata should not affect performance of the binary unless it is accessed, and overall binary size should be minimally impacted. We implemented the necessary support in the LLVM Compiler Infrastructure.
The implementation consists of a new middle-end pass and a backend pass, which rely on PC Sections Metadata (LLVM RFC) to emit the metadata into separate ELF binary sections.
Currently we store PC-keyed metadata for atomic operations and for functions suitable for use-after-return detection. Other types of metadata can be added in the future if required (for example, signed integer operations subject to overflow checking, or fixed-size array indexing for out-of-bounds checking).
Also see the runtime implementation, which parses the PC-keyed metadata and makes it queryable.
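To illustrate why this metadata is needed, consider the following sketch (illustrative C, not GWPSan code): after optimization, a relaxed atomic load and a plain load typically compile to the same machine instruction, so a binary-only analysis cannot tell them apart without PC-keyed metadata recording which instruction addresses originate from source-level atomics.

```c
#include <stdatomic.h>

/* Illustration only: both loads below usually lower to an identical
 * plain load instruction. PC-keyed metadata records that the load in
 * ready() was atomic in the source, so the runtime can suppress a
 * false-positive race report for it. */
static _Atomic int flag;
static int data;

int ready(void) {
  /* Source-level atomic: concurrent access is intended, not a race. */
  return atomic_load_explicit(&flag, memory_order_relaxed);
}

int value(void) {
  /* Plain access: a concurrent write here is a genuine data race. */
  return data;
}
```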
Clang 18 or later includes all the above changes, which can be enabled with `-fexperimental-sanitize-metadata`. Some earlier versions of Clang already support `-fexperimental-sanitize-metadata`, but do not include optimizations and necessary fixes. Since the compiler support is still marked experimental, the runtime does not support earlier versions of the metadata (but we detect if an older version is present and fail initialization).
Several new features, performance optimizations, and fixes were contributed to the Linux kernel to support GWPSan and similar use cases.
Efficient process-wide hardware breakpoint and watchpoint support. Prior to these changes, each thread would have had to create its own `PERF_TYPE_BREAKPOINT` perf event. Manually managing perf events of all running threads would have been too slow and complex in heavily multi-threaded applications. See the runtime implementation here.
- perf: Rework perf_event_exit_event()
- perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
- perf: Support only inheriting events if cloned with CLONE_THREAD
- perf: Add support for event removal on exec
- signal: Introduce TRAP_PERF si_code and si_perf to siginfo
- perf: Add support for SIGTRAP on perf events
- selftests/perf_events: Add kselftest for process-wide sigtrap handling
- selftests/perf_events: Add kselftest for remove_on_exec
- signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
- signal, perf: Add missing TRAP_PERF case in siginfo_layout()
- signal: Factor force_sig_perf out of perf_sigtrap
- signal: Deliver all of the siginfo perf data in _perf
- perf: Fix required permissions if sigtrap is requested
- perf: Ignore sigtrap for tracepoints destined for other tasks
- perf test sigtrap: Add basic stress test for sigtrap handling
- perf: Copy perf_event_attr::sig_data on modification
- signal: Deliver SIGTRAP on perf event asynchronously if blocked
- perf: Fix missing SIGTRAPs
- perf: Improve missing SIGTRAP checking
- perf: Fix perf_pending_task() UaF
Optimized breakpoint accounting in the kernel. Prior to these changes, enabling and disabling breakpoints had a noticeable performance impact on systems with high CPU counts. These changes did not alter the kernel ABI, but we do not recommend enabling GWPSan on kernels without them.
- perf/hw_breakpoint: Add KUnit test for constraints accounting
- perf/hw_breakpoint: Provide hw_breakpoint_is_used() and use in test
- perf/hw_breakpoint: Clean up headers
- perf/hw_breakpoint: Optimize list of per-task breakpoints
- perf/hw_breakpoint: Mark data __ro_after_init
- perf/hw_breakpoint: Optimize constant number of breakpoint slots
- perf/hw_breakpoint: Make hw_breakpoint_weight() inlinable
- perf/hw_breakpoint: Remove useless code related to flexible breakpoints
- powerpc/hw_breakpoint: Avoid relying on caller synchronization
- locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked()
- perf/hw_breakpoint: Reduce contention with large number of tasks
- perf/hw_breakpoint: Introduce bp_slots_histogram
- perf/hw_breakpoint: Optimize max_bp_pinned_slots() for CPU-independent task targets
- perf/hw_breakpoint: Optimize toggle_bp_slot() for CPU-independent task targets
- perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
- perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
Low-overhead POSIX timer based sampling. Prior to this change, the kernel would prefer delivering the timer signal to the main thread, in which case the runtime would fall back to a slightly more expensive manual signal distribution algorithm. See the runtime implementation here.
Linux kernel 6.4 or later includes all the above changes.