Treadmill

Tail Latency Measurement at Microsecond-Level Precision

Tutorial at The 22nd International Conference on Architectural Support
for Programming Languages and Operating Systems (ASPLOS 2017)

8:30 A.M. - 12:00 P.M. April 8th, 2017 at Xi'an, China

Managing the tail latency of requests has become one of the primary challenges for large-scale Internet services. Data centers are evolving quickly, and service operators frequently want to change the deployed software and production hardware configurations. Such changes demand a confident understanding of their impact on the service, in particular on tail latency (e.g., the 95th- or 99th-percentile response latency of the service). Evaluating the impact on the tail is challenging because of its inherent variability. Existing tools and methodologies for measuring these effects suffer from a number of deficiencies, including poor load tester design, statistically inaccurate aggregation, and improper attribution of effects, which can often lead to misleading conclusions.
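To make "statistically inaccurate aggregation" concrete, here is a minimal Python sketch of one way it can arise: averaging per-client 99th percentiles instead of computing the percentile over the pooled samples. This is an illustration under stated assumptions, not Treadmill's implementation. The nearest-rank percentile definition, the two hypothetical clients, and their latency values in microseconds are all made up for the example.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. (One common definition; an assumption here.)"""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latency samples in microseconds (illustrative data only).
# Client A never hits the slow path; 4% of client B's requests do.
client_a = [100.0] * 100
client_b = [100.0] * 96 + [1000.0] * 4

# Questionable aggregation: average the per-client 99th percentiles.
p99_a = percentile(client_a, 99)
p99_b = percentile(client_b, 99)
mean_of_p99 = (p99_a + p99_b) / 2

# Sounder aggregation: pool the raw samples, then take the percentile.
pooled_p99 = percentile(client_a + client_b, 99)

print(f"per-client p99: A={p99_a} us, B={p99_b} us")
print(f"mean of per-client p99: {mean_of_p99} us")
print(f"p99 of pooled samples:  {pooled_p99} us")
```

In this toy data set, 2% of all requests take 1000 us, so the pooled 99th percentile falls on the slow path, while the averaged figure reports a latency that no request actually experienced.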

In this tutorial, we will first survey these common deficiencies in existing methodologies, analyze how they can impact tail latency measurements, and present techniques to overcome them. Then we will introduce Treadmill, an open-source software infrastructure that achieves microsecond-level precision in tail latency measurement. This part will cover the design of Treadmill, the standard procedure for using Treadmill to measure tail latency, and the process for extending Treadmill to support your own workloads. At the end of the tutorial, we will present several applications that are only possible with such high-precision tail latency measurements, including performance A/B testing in continuous integration, capacity planning for data centers, and attribution of the source of tail latency.

Publications

Yunqi Zhang, David Meisner, Jason Mars, Lingjia Tang. Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference. Proceedings of the 43rd ACM/IEEE International Symposium on Computer Architecture (ISCA 2016).

Organizers

Yunqi Zhang

Ph.D. Candidate

University of Michigan

Johann Hauswald

Ph.D. Candidate

University of Michigan

David Meisner

Software Engineer

Facebook Inc

Jason Mars

Assistant Professor

University of Michigan

Lingjia Tang

Assistant Professor

University of Michigan

Schedule

08:30 A.M. ~ 09:00 A.M. -- Overview of tail latency measurement for data center applications

09:00 A.M. ~ 09:30 A.M. -- Common pitfalls in existing tail latency measurement methodologies

09:30 A.M. ~ 10:00 A.M. -- Coffee break

10:00 A.M. ~ 10:20 A.M. -- Treadmill: a load tester to achieve microsecond-level precision measurement

10:20 A.M. ~ 10:40 A.M. -- Robust tail latency measurement procedure using Treadmill

10:40 A.M. ~ 11:00 A.M. -- Extending Treadmill to support your own workloads

11:00 A.M. ~ 11:30 A.M. -- Applications enabled by high precision tail latency measurements