Modern datacenter applications struggle with the need to access thousands of servers while still providing a fast response time to the user. In these situations, the user’s overall request is not complete until the slowest of the subrequests has completed, making it important to design network services that offer not just low latency but predictable latency. We are developing techniques for building systems that offer predictable response time.
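To illustrate why the slowest subrequest dominates at scale, here is a minimal sketch of the fan-out arithmetic. It assumes that subrequests are independent and that each server is slow with some fixed probability. The 1% figure and the function name `prob_request_hits_tail` are illustrative assumptions, not values or names from our measurements.

```python
def prob_request_hits_tail(fanout: int, per_server_tail_prob: float) -> float:
    """Probability that at least one of `fanout` subrequests is slow.

    Assumes subrequests are independent and each is slow with the same
    probability `per_server_tail_prob` (a simplifying assumption).
    """
    # The request is fast only if every subrequest is fast.
    prob_all_fast = (1.0 - per_server_tail_prob) ** fanout
    return 1.0 - prob_all_fast


if __name__ == "__main__":
    # Hypothetical 1% chance that any single server responds slowly.
    for fanout in (1, 10, 100, 1000):
        p = prob_request_hits_tail(fanout, 0.01)
        print(f"fan-out {fanout:>4}: {p:.1%} of requests wait on a slow subrequest")
```

Under these assumptions, a rare per-server delay becomes the common case for the user: at a fan-out of 100, roughly 63% of requests wait on at least one slow subrequest, and at a fan-out of 1000 nearly all of them do.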
At the operating system level, we have conducted an extensive measurement study to identify the factors that can cause even a completely deterministic application to occasionally serve requests several orders of magnitude slower than expected. With a set of modifications to the kernel scheduler, network stack, and application architecture, we can reduce tail latency to within a few percent of optimal.