Navigating Restartable Sequences: A Technical Guide to API Compliance and Hyrum's Law in Kernel Development
Overview
Restartable sequences (rseq) are a powerful Linux kernel mechanism that allows user-space threads to perform per-CPU operations with minimal overhead. Introduced in Linux 4.18, they enable lock-free, efficient memory allocation and other critical operations by providing atomic regions that can be safely aborted if the thread is preempted or migrated. However, a recent incident involving Google's TCMalloc memory allocator and the kernel's no-regressions policy has highlighted a classic software engineering principle: Hyrum's Law. This law states that any observable behavior of a system, even if undocumented, will eventually be depended upon by someone. In this case, a performance improvement to rseq in Linux 6.19 broke TCMalloc because TCMalloc relied on undocumented behavior of the rseq API. This guide explores the technical details of restartable sequences, the TCMalloc violation, and lessons for developers.
Prerequisites
Before diving into the details, ensure you have a solid understanding of:
- Linux kernel development concepts: system calls, scheduling, preemption, CPU affinity.
- Memory allocators: how allocators like TCMalloc use per-thread and per-CPU caching for performance.
- The rseq system call: its API, critical sections, abort handlers.
- Hyrum's Law: the principle of implicit dependencies on observable behavior.
- Kernel no-regressions rule: the policy that new kernel versions must not break existing userspace applications.
Step-by-Step: Understanding the Rseq API and TCMalloc's Violation
What are Restartable Sequences?
Restartable sequences allow a thread to execute a sequence of instructions atomically with respect to preemption and migration. The typical usage is:
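- Describe the critical section with a descriptor (struct rseq_cs) that records its start address, its length (post_commit_offset), and the address of an abort handler, and publish that descriptor through the per-thread area registered with the rseq system call.
- If the thread is preempted, migrated, or interrupted by a signal while inside the critical section, the kernel diverts it to the abort handler, which typically retries the sequence from the beginning.
- The user-space code must be safe to abort and re-execute. In practice the sequence ends with a single commit (typically one store), and nothing before that commit may be visible to other code.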
The documented API guarantees that the kernel will abort the sequence (by diverting execution to the abort handler) if the thread is preempted, migrated, or signaled while inside it. Importantly, the abort takes effect before the thread resumes executing user-space code, so a critical section never continues partway through on another CPU.
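The abort rule can be modeled in a few lines of user-space C. This is a sketch, not kernel code: struct model_rseq_cs mirrors the fields of the kernel's struct rseq_cs from <linux/rseq.h> but is redefined under a hypothetical name so it builds without kernel headers, and resume_ip is a hypothetical helper that mimics the range check applied when the thread returns to user space. It omits the signature check the kernel performs on the four bytes preceding abort_ip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of the kernel's struct rseq_cs (<linux/rseq.h>), redefined
   under a hypothetical name so this model builds without kernel headers. */
struct model_rseq_cs {
    uint32_t version;            /* must be 0 */
    uint32_t flags;
    uint64_t start_ip;           /* first instruction of the critical section */
    uint64_t post_commit_offset; /* start_ip + offset is the first address past the commit */
    uint64_t abort_ip;           /* abort handler, which must lie outside the section */
};

/* Hypothetical helper: where a thread interrupted at interrupted_ip resumes
   in user space, given the descriptor it published (NULL if none). */
static uint64_t resume_ip(const struct model_rseq_cs *cs, uint64_t interrupted_ip)
{
    if (cs == NULL)
        return interrupted_ip;   /* no critical section in progress */

    /* The kernel rejects malformed descriptors; the model just asserts. */
    assert(cs->version == 0);
    assert(cs->abort_ip - cs->start_ip >= cs->post_commit_offset);

    /* Unsigned arithmetic: an address below start_ip wraps to a huge value,
       so one comparison tests start_ip <= ip < start_ip + post_commit_offset. */
    if (interrupted_ip - cs->start_ip < cs->post_commit_offset)
        return cs->abort_ip;     /* inside the section: divert to the abort handler */
    return interrupted_ip;       /* outside the section: resume where it stopped */
}
```

For a descriptor with start_ip 0x1000 and post_commit_offset 0x20, an interruption anywhere from 0x1000 through 0x101f resumes at abort_ip, while one at 0x0ff0 or at 0x1020 (just past the commit) resumes in place. The unsigned subtraction folds both boundary tests into a single comparison.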
The Documented API vs. TCMalloc's Usage
According to the kernel documentation (and the rseq man page), a restartable sequence must be designed to handle arbitrary aborts: it must not depend on the kernel leaving it alone, and any number of aborts can occur. TCMalloc, however, violated this contract. It used rseq for fast per-CPU cache operations, but its critical sections were not fully restartable. Specifically, TCMalloc relied on the absence of aborts in certain common code paths: if a thread was neither migrated nor preempted, rseq would never abort, and TCMalloc depended on this behavior to avoid restart logic in those paths.
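To make the contract concrete, here is a small single-threaded model. The cache layout and the functions push_restartable, push_fragile, aborted, and survives_abort are hypothetical and are not TCMalloc's code. Each call to aborted stands in for a point where the kernel might divert the thread to its abort handler, and the handler simply retries. Real rseq code is typically written in assembly and also checks the CPU number at the start of the section; both are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define CAPACITY 8
#define NO_ABORT (-1)

/* Hypothetical per-CPU cache: slots[0..top) hold valid entries. */
struct cpu_cache {
    int top;
    int slots[CAPACITY];
};

/* Stands in for the kernel diverting the thread to its abort handler
   just before `step` runs. */
static bool aborted(int step, int abort_at)
{
    return step == abort_at;
}

/* Restartable push: the only store other code can observe is the final
   write to top (the commit). Returns false if aborted before committing. */
static bool push_restartable(struct cpu_cache *c, int item, int abort_at)
{
    if (aborted(0, abort_at)) return false;
    int t = c->top;                 /* step 0: load the current top */
    assert(t < CAPACITY);
    if (aborted(1, abort_at)) return false;
    c->slots[t] = item;             /* step 1: fill a slot above top (not yet visible) */
    if (aborted(2, abort_at)) return false;
    c->top = t + 1;                 /* step 2: single-store commit */
    return true;
}

/* Fragile push: publishes the new top before the slot is filled, so an
   abort between the two stores leaves a published slot that was never filled. */
static bool push_fragile(struct cpu_cache *c, int item, int abort_at)
{
    if (aborted(0, abort_at)) return false;
    int t = c->top;                 /* step 0: load the current top */
    assert(t < CAPACITY);
    if (aborted(1, abort_at)) return false;
    c->top = t + 1;                 /* step 1: visible too early */
    if (aborted(2, abort_at)) return false;
    c->slots[t] = item;             /* step 2: fill the slot */
    return true;
}

typedef bool (*push_fn)(struct cpu_cache *, int, int);

/* Pushes 7 onto a cache that already holds 42, injecting one abort before
   step `abort_at` (or none for NO_ABORT); the "abort handler" then retries
   with no further aborts. Returns true if the cache ends up holding {42, 7}. */
static bool survives_abort(push_fn push, int abort_at)
{
    struct cpu_cache c = { .top = 1 };
    c.slots[0] = 42;
    for (int i = 1; i < CAPACITY; i++)
        c.slots[i] = -1;            /* -1 marks a slot that was never filled */

    bool done = push(&c, 7, abort_at);
    if (!done)
        done = push(&c, 7, NO_ABORT);   /* abort handler: restart from the top */
    assert(done);
    return c.top == 2 && c.slots[0] == 42 && c.slots[1] == 7;
}
```

Injecting a single abort before each step in turn, push_restartable leaves the cache consistent in every case, because its only visible store is the final commit to top. push_fragile survives every case except an abort between its two stores, which is exactly the kind of window code can get away with until the kernel's abort behavior changes.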
In Linux 6.19, the kernel introduced a performance optimization to reduce rseq's overhead, and it changed the conditions under which rseq aborts occur. The documented API ("aborts may happen at any time") was unchanged, but the observable behavior was not: code paths that had never seen an abort in practice could now be aborted. TCMalloc, which had not implemented proper abort handling for all paths, began to fail. The library's internal state became inconsistent because its critical sections were not designed to be restarted in those new scenarios.
How the Kernel's No-Regressions Rule Applies
The Linux kernel has a strict no-regressions rule: a new kernel version must not break userspace programs that worked on a previous version. TCMalloc is a critical dependency for many production applications, so breaking it is unacceptable. Despite TCMalloc's violation of the documented API, the kernel developers had to find a way to accommodate its behavior. Two approaches were considered:
- Reverting the performance change, which would restore the previous abort behavior but lose the optimization.
- Adding a compatibility flag to the rseq system call that allows userspace to opt into a more predictable abort behavior, such as a flag indicating "I rely on the old behavior".
Ultimately, the kernel community opted for a middle ground: a compatibility mechanism that keeps a slower but more compatible path available for libraries like TCMalloc while allowing optimized paths for well-behaved users. For existing TCMalloc binaries to keep working, that compatible behavior has to apply without requiring them to be rebuilt or changed. This decision respects the no-regressions rule while still permitting progress for compliant code.
Common Mistakes
- Assuming rseq never aborts in practice: Even if aborts are rare, they can become common after kernel updates. Always handle aborts correctly.
- Ignoring the restartable requirement: Your critical section must be safe to re-execute. If you modify shared state (like a memory pool), make sure nothing is visible to other code until the final commit, which must be the last instruction of the sequence.
- Relying on undocumented kernel behavior: TCMalloc's assumption that aborts would not happen in certain circumstances was a violation of the API. Always code to the documented contract, not observed behavior.
- Not testing with edge-case scheduling: Use tools like stress to force preemption and migration while your code runs rseq critical sections.
- Forgetting the abort handler: You must provide an abort handler (a jump target) that restores any necessary state before restarting the sequence. The kernel also requires the four bytes immediately before the abort handler to match the signature passed when the rseq area was registered.
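Background load from a tool like stress mostly produces preemption; a test can also move a thread between CPUs deliberately. The helper below is a sketch built on the glibc calls sched_getaffinity, sched_setaffinity, and sched_getcpu (which need _GNU_SOURCE); the name hop_cpus and its callback interface are hypothetical. It migrates the thread between calls to the code under test, which exercises per-CPU state across operations. Landing a migration inside a critical section still requires a concurrent disruptor running alongside.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical test helper: pins the calling thread to each allowed CPU in
   turn, `rounds` times over, calling `work` (if non-NULL) after every move.
   Restores the original affinity mask. Returns the number of CPU changes
   observed via sched_getcpu(), or -1 if the mask could not be read or restored. */
static int hop_cpus(int rounds, void (*work)(void))
{
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    int changes = 0;
    int last = sched_getcpu();
    for (int r = 0; r < rounds; r++) {
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (!CPU_ISSET(cpu, &allowed))
                continue;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                continue;            /* e.g. the CPU went offline; skip it */
            if (work != NULL)
                work();              /* code under test, now running on `cpu` */
            int now = sched_getcpu();
            if (now != last)
                changes++;
            last = now;
        }
    }

    assert(CPU_COUNT(&allowed) > 0);
    if (sched_setaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    return changes;
}
```

With a single allowed CPU the helper returns 0, since no migration is possible; on a multi-core machine each pass runs work on every allowed CPU.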
Summary
The TCMalloc incident is a vivid reminder of Hyrum's Law in system programming. Even a popular library can inadvertently depend on undocumented behavior, leading to breakage when the kernel evolves. For developers using restartable sequences, the lesson is clear: always adhere strictly to the documented API. Ensure your critical sections are fully restartable and never rely on the absence of aborts. The kernel's no-regressions rule protects userspace, but it also forces kernel developers to accommodate misbehaving code—an expensive and suboptimal solution. By writing compliant code, you help maintain the health of the entire ecosystem. For more details, see the kernel documentation on rseq and the LWN article on this issue.