An aws engineer discovered a 50% regression in postgres throughput while testing the new Linux 7.0 kernel. The cause turns out to be massive TLB and page faults exacerbated by Postgres process-based design. In this backend engineering show episode I dive deep into how this was discovered, the root cause and the possible fixes and workarounds. Intermediate and Advanced Backend Engineering Course Bundlehttps://courses.husseinnasser.com/bundleMy Book, Root Cause: Stories and Lessons from Two Decades of Backend Engineering Bugs https://amzn.to/4cKfZhe 0:00 Intro2:30 The Discovery6:30 Spinlocks9:25 Preemption 13:00 Root Cause17:00 How Postgres Processes exacerbated the problem 22:30 Is the fix easy?25:50 Summary