Live Patching Meltdown – SUSE Engineer’s research project (Part 3)
Building on Part 1 (Key technical obstacles for Live Patching Meltdown) and Part 2 (Virtual address mappings and the Meltdown vulnerability), let’s now address the changes needed to the TLB flushing primitives.
In order to resolve virtual to physical addresses, a CPU must traverse the page table tree. This is costly to do for every single memory access, and for this reason the CPU keeps the results of these translations in a special cache, the Translation Lookaside Buffer (TLB). It is the kernel’s responsibility to instruct the CPU to flush any stale TLB entries whenever it has modified the page tables.
One action causing a complete TLB invalidation is writing a pointer to a new page table root to the special CR3 register. For example, upon scheduling in a process, the kernel would install its associated memory map’s page tables at CR3 as part of the context switch and thus implicitly invalidate any translation cached from the previously running process.
It should be emphasized though that unnecessary TLB flushes are a bad thing to do from a performance perspective: not only do they consume time by themselves, they also throw away all the precious work done by previous page table walks.
Now recall that the mappings concerning the kernel space region, i.e. the upper range of the virtual address space, are always kept synchronized between all processes’ page tables. It would be good if cached translations from this region could somehow outlive the writes to CR3 at context switches. Indeed, x86_64 CPUs offer a feature, “Global Pages”, which, if enabled, allows for marking certain mappings as global in the page tables. Translations tagged as global in this way are exempted from flushes due to writes to CR3. Before the advent of KPTI, the Linux kernel used to mark all translations for the kernel space region as global, thus avoiding unnecessary invalidations of these at context switches.
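The effect of global pages on a CR3 write can be illustrated with a toy model. The following is only a sketch for illustration: the `toy_tlb` structure and all function names are hypothetical, the "TLB" is a tiny array rather than real hardware, and nothing here is taken from the kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SLOTS 8

/* One cached translation: virtual page number -> physical frame number. */
struct toy_tlb_entry {
    bool valid;
    bool global;   /* models the "global" bit from the page table entry */
    uint64_t vpn;
    uint64_t pfn;
};

struct toy_tlb {
    struct toy_tlb_entry slot[TOY_TLB_SLOTS];
};

/* Cache the result of a (not modeled) page table walk. */
static bool toy_tlb_insert(struct toy_tlb *tlb, uint64_t vpn, uint64_t pfn,
                           bool global)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (!tlb->slot[i].valid) {
            tlb->slot[i] = (struct toy_tlb_entry){ true, global, vpn, pfn };
            return true;
        }
    }
    return false; /* toy model: no eviction policy */
}

/* Returns true on a TLB hit and stores the frame number in *pfn. */
static bool toy_tlb_lookup(const struct toy_tlb *tlb, uint64_t vpn,
                           uint64_t *pfn)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (tlb->slot[i].valid && tlb->slot[i].vpn == vpn) {
            *pfn = tlb->slot[i].pfn;
            return true;
        }
    }
    return false;
}

/* Models a write to CR3 with global pages enabled:
 * every non-global entry is dropped, global entries survive. */
static void toy_tlb_write_cr3(struct toy_tlb *tlb)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (!tlb->slot[i].global)
            tlb->slot[i].valid = false;
    }
}
```

In this model, a user space translation inserted with `global == false` is gone after `toy_tlb_write_cr3()`, while a kernel space translation inserted with `global == true` still hits. The second half of that is the behavior that conflicts with KPTI.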
However, this behavior would thwart the whole point of KPTI, which is to hide the kernel address space mappings from user space. Thus, a live patch for Meltdown must switch the CPUs’ “Global Page” feature off. In principle that would be as easy as writing a zero to a certain bit position within the special CR4 register, commonly referred to as CR4.PGE. There is one complication though: the Linux kernel’s TLB flushing primitives take advantage of the fact that flipping CR4.PGE from one to zero has the architecturally defined effect of invalidating all TLB entries, including the global ones. Their implementation, based on the assumption that CR4.PGE is always set, simply clears that bit and sets it again. Disabling global pages from a live patch would render this assumption invalid and break the running kernel’s TLB invalidation mechanisms.
Fortunately, with the global kGraft patch state at hand as explained earlier in this series, a way out is straightforward: kGraft-patch the TLB flushing primitives to make them compatible with either setting of CR4.PGE, wait for the live patch transition to finish globally and clear CR4.PGE only afterwards.
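The broken assumption and one possible way around it can be sketched in a small user space model. Everything below is an illustrative assumption: `toy_cpu` and the function names are hypothetical, the TLB is reduced to two counters, and the "either setting" variant is just one plausible approach, not necessarily what the actual live patch does. Only the bit position of CR4.PGE (bit 7) is an architectural fact.

```c
#include <stdint.h>

#define TOY_CR4_PGE (UINT64_C(1) << 7)  /* CR4.PGE is bit 7 of CR4 */

/* Hypothetical toy CPU: the TLB is reduced to two entry counters. */
struct toy_cpu {
    uint64_t cr3;
    uint64_t cr4;
    unsigned int cached;         /* non-global translations */
    unsigned int cached_global;  /* global translations */
};

/* Changing CR4.PGE invalidates all TLB entries, global ones included.
 * Writing an unchanged value invalidates nothing in this model. */
static void toy_write_cr4(struct toy_cpu *cpu, uint64_t val)
{
    if ((cpu->cr4 ^ val) & TOY_CR4_PGE) {
        cpu->cached = 0;
        cpu->cached_global = 0;
    }
    cpu->cr4 = val;
}

/* A CR3 write invalidates the non-global entries only. */
static void toy_write_cr3(struct toy_cpu *cpu, uint64_t val)
{
    cpu->cr3 = val;
    cpu->cached = 0;
}

/* Flush written under the assumption that CR4.PGE is always set:
 * clear the bit, then restore the old value. */
static void toy_flush_all_assuming_pge(struct toy_cpu *cpu)
{
    uint64_t cr4 = cpu->cr4;

    toy_write_cr4(cpu, cr4 & ~TOY_CR4_PGE);
    toy_write_cr4(cpu, cr4);
}

/* Sketch of a flush that copes with either setting of CR4.PGE. */
static void toy_flush_all_either(struct toy_cpu *cpu)
{
    uint64_t cr4 = cpu->cr4;

    if (cr4 & TOY_CR4_PGE) {
        /* Global pages enabled: toggling PGE flushes everything. */
        toy_write_cr4(cpu, cr4 & ~TOY_CR4_PGE);
        toy_write_cr4(cpu, cr4);
    } else {
        /* Global pages disabled: no entry is global, so reloading
         * CR3 is sufficient. */
        toy_write_cr3(cpu, cpu->cr3);
    }
}
```

With CR4.PGE clear, `toy_flush_all_assuming_pge()` writes the same value to CR4 twice and leaves stale entries behind, whereas `toy_flush_all_either()` still empties the toy TLB. This is why the patched primitives have to be active on every task before CR4.PGE may be cleared.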
Of course, disabling global pages comes at the price of the additional performance overhead they were originally meant to mitigate. But KPTI has the potential for making matters even worse: at kernel entry, the stripped-down user space shadow page table must be replaced by its fully populated counterpart. Similarly, the shadow copy has to be restored again at exit to user space. This means that there will be a lot of additional writes to CR3, always accompanied by a costly TLB invalidation. Reasonably recent Intel CPUs are equipped with a feature, “Process Context Identifiers” (PCID), which allows for making the TLB invalidations due to writes to CR3 more selective. KPTI-enabled kernels make use of it, whereby they are able to avoid all flushes at entry to kernel space and many of those at exit to user space. Although PCID is disabled on kernels of interest to live patching, its semantics are such that it can be easily enabled during operation. Any actual use requires the kGraft-patched TLB flushing primitives to be in place, but this has been solved already.
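To make the selectivity of PCID a bit more concrete: once CR4.PCIDE (bit 17 of CR4) is set, the low 12 bits of the value written to CR3 carry the PCID, and setting bit 63 of that value asks the CPU not to invalidate the cached translations tagged with this PCID. The helper below is a minimal sketch of how such a value could be composed. The macro and function names are hypothetical and the example values are arbitrary. How a kernel assigns PCIDs to the user space and kernel space page tables is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit positions follow the x86_64 architecture. */
#define TOY_CR4_PCIDE     (UINT64_C(1) << 17)  /* enables PCID in CR4 */
#define TOY_CR3_PCID_MASK UINT64_C(0xFFF)      /* CR3 bits 11:0 hold the PCID */
#define TOY_CR3_NOFLUSH   (UINT64_C(1) << 63)  /* "do not flush" request */

/*
 * Compose the value to be written to CR3 from the page-aligned physical
 * address of the page table root, a PCID and the no-flush request.
 */
static uint64_t toy_build_cr3(uint64_t pgd_phys, uint16_t pcid, bool noflush)
{
    uint64_t cr3 = (pgd_phys & ~TOY_CR3_PCID_MASK) |
                   (pcid & TOY_CR3_PCID_MASK);

    if (noflush)
        cr3 |= TOY_CR3_NOFLUSH;

    return cr3;
}
```

For example, `toy_build_cr3(0x1234000, 5, true)` yields `0x8000000001234005`: the page table root at `0x1234000`, tagged with PCID 5, with the no-flush bit set.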
Read on in Part 4. In this final part, I will share the conclusion of this very interesting project.