Performance clearly matters to users. For example, the most common software update on the App Store is “Bug fixes and performance enhancements.” Now that Moore’s Law has ended, programmers have to work hard to get high performance for their applications. But why is performance hard to deliver? I will first explain why current approaches to evaluating and optimizing performance don’t work. I’ll show how complicated performance has become on modern systems, and how compiler optimizations have essentially run out of steam. Next, I’ll introduce two radically new performance profilers that guide programmers directly to the code they need to change to improve application performance.
The first is Coz, a new “causal profiler” for C/C++/Rust that lets programmers optimize for throughput or latency, and which pinpoints and accurately predicts the impact of optimizations via what we call “virtual speedup” experiments. Coz’s approach unlocks previously unknown optimization opportunities. Guided by Coz, we improved the performance of applications by as much as 68%; in most cases, this involved modifying fewer than 10 lines of code and took under half an hour (without any prior understanding of the programs!). Coz now ships as part of standard Linux distros.
The second is Scalene, a “scripting-language aware” profiler for Python. Scalene runs orders of magnitude faster than other profilers while delivering far more detailed information, information that is especially valuable to Python programmers. Via a combination of sampling, inference, and disassembly of bytecodes, Scalene efficiently and precisely attributes execution time and memory usage either to Python code, which developers can optimize, or to library code, which they cannot. Its novel sampling memory allocator reports line-level memory consumption and trends with low overhead, helping developers reduce footprints and identify leaks. Finally, Scalene reports a new metric, copy volume, that helps developers root out insidious copying costs across the Python/library boundary, which can drastically degrade performance. Scalene is available on PyPI.
The discussion and AMA following this talk will be moderated by Ben Zorn.
Emery is a Professor at the University of Massachusetts Amherst. He researches languages, runtime systems, and operating systems, with a particular focus on systems that transparently improve reliability, security, and performance. Emery and friends have created Hoard, the first scalable memory manager (malloc), on which the Mac OS X memory manager is based; DieHard, an error-avoiding memory manager that directly influenced the design of the Windows 7 Fault-Tolerant Heap; DieHarder, a secure memory manager that was an inspiration for hardening changes made to the Windows 8 heap; the Coz profiler, which ships with modern Linux distros; and more. He was named an ACM Fellow in 2019.
Fri 20 Nov (displayed time zone: Central Time, US & Canada)
09:00 - 09:40
Performance Really Matters (AMA)
Emery D. Berger, University of Massachusetts at Amherst