Learning how to write performant code is hard. Here are a few simple laws that I hope will convey the core of the matter. I’m calling them…
Crista’s Five Laws of Performant Software
- Programming language << Programmers’ awareness of performance. The programming language doesn’t matter as much as the programmers’ awareness of the implementation of that language and its libraries. These days, all mainstream programming languages and their standard libraries are pretty optimized, and can be used to write performant code in a large range of application domains. They can also be used to write horribly performing code. For better or for worse, high-level languages provide a large surface area of candy features and libraries that are really awesome to use… until you realize they require huge amounts of memory, or have super-linear behavior with the size of the input. It is critical that programmers ask “how does this magic actually work?”, go search for the answer, and figure out the best way of scaling things if the convenient candy is not as good as needed. There is usually another, better-performing way of doing it, even in high-level languages. (The main reason C/C++ programmers don’t run into this as often is that there is an appalling lack of candy in the C/C++ ecosystem… performance isn’t hidden – nothing is!)
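To make the hidden cost of candy concrete, here is a minimal Java sketch (class and method names are mine): deduplicating a list with `List.contains` reads nicely but is quadratic, while the tiny switch to a `HashSet` membership test makes it linear.

```java
import java.util.*;

public class CandyCost {
    // Deduplicate using List.contains: O(n^2) — convenient, scales badly.
    static List<Integer> dedupList(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : input) {
            if (!out.contains(x)) out.add(x);   // linear scan on every element
        }
        return out;
    }

    // Same result with a HashSet membership test: O(n) expected.
    static List<Integer> dedupSet(List<Integer> input) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> out = new ArrayList<>();
        for (Integer x : input) {
            if (seen.add(x)) out.add(x);        // add() returns false on duplicates
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) data.add(i % 10_000);
        long t0 = System.nanoTime();
        List<Integer> a = dedupList(data);
        long t1 = System.nanoTime();
        List<Integer> b = dedupSet(data);
        long t2 = System.nanoTime();
        System.out.println(a.equals(b));        // same answer, very different cost
        System.out.printf("contains: %d ms, set: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both versions are “the same program” functionally; only the awareness of what `contains` does under the hood separates them.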
- d(fτ(x), fτ(y)) > ε, for even small d(x, y) – or: small design details matter. Here, x and y denote two versions of the same code, and d(x, y) is the difference between them. fτ(x) and fτ(y) are the effects of running that code after a certain time τ, and d(fτ(x), fτ(y)) is the difference between them. What this law says is that small code differences can have huge effect differences. Some may recognize this law from chaotic systems (or just follow the link!). Are you producing too much garbage unnecessarily? Are you caching things properly? Are you getting rid of useless network packets/requests as early as possible? Are you overusing serialization, or using a slow serializer? Are your dictionaries so big that they become inefficient? In high-level languages, unless your application design is seriously broken, you don’t need big code changes to make your program perform better; very small code changes can have huge consequences for performance. For example, a cache can be added in 10 LOCs, string concats can be replaced by string builders in about the same number of lines, filters can be added in a couple of lines, etc. Important design decisions, small code changes, big effects!
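The string-concat example above can be sketched in a few lines of Java (names are mine): the edit from `+=` to `StringBuilder` is three lines of diff, yet it turns quadratic copying into linear appends.

```java
public class SmallChangeBigEffect {
    // Building a large string with += copies the whole string each step: O(n^2).
    static String joinConcat(String[] parts) {
        String s = "";
        for (String p : parts) s += p;   // each += allocates and copies everything so far
        return s;
    }

    // The same loop with StringBuilder: O(n) total work.
    static String joinBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = new String[50_000];
        java.util.Arrays.fill(parts, "x");
        long t0 = System.nanoTime();
        String a = joinConcat(parts);
        long t1 = System.nanoTime();
        String b = joinBuilder(parts);
        long t2 = System.nanoTime();
        System.out.println(a.equals(b));  // identical result, very different timing
        System.out.printf("concat: %d ms, builder: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```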
- corr(performance degradation, unbounded resource usage) > 0.9. This law says that there is a very high correlation between performance degradation, including crashes, and the unbounded use of resources. The slope from writing small apps to writing robust programs that survive all sorts of abuse from uncontrolled input is very steep, and requires a mindset focused on operation rather than on function. When you don’t limit the use of resources, chances are they will be exhausted. Does your program start ad-hoc threads? Use a threadpool with a fixed size. Do you need in-memory dictionaries for fast lookups? Use a bounded cache. Do threads pass data among themselves? Use blocking queues whose capacity is a function of the maximum amount of data that can potentially be waiting without exhausting the memory. Does your program read text data from the file system? Unless you are 100% sure the lines are always of reasonable size, do not use readline. Limit, limit, limit! Don’t use resources in an unbounded manner, or the operation of your program will degrade very quickly after a certain threshold.
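One concrete way to get the bounded cache mentioned above, as a sketch (the class name is mine): Java’s `LinkedHashMap` becomes a small LRU cache in about ten lines by constructing it in access order and overriding `removeEldestEntry`.

```java
import java.util.*;

// A bounded LRU cache in ~10 lines: LinkedHashMap in access order,
// evicting the least-recently-used entry once capacity is exceeded.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedCache(int capacity) {
        super(16, 0.75f, true);   // true = iterate in access order (LRU)
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // returning true evicts the eldest entry
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);        // capacity 2, so "a" is evicted
        System.out.println(cache.containsKey("a"));
        System.out.println(cache.size());
    }
}
```

The same limiting mindset applies to the other resources the law lists: `Executors.newFixedThreadPool` for ad-hoc threads, `ArrayBlockingQueue` for data passed between threads.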
- Performance improvements = log (controlled experiments). It seems pretty obvious, but it’s amazing how many times I’ve seen people not being able to answer simple questions about their programs, such as “how long does this function actually take to run?” or “how much memory does this data structure take?” Not knowing is not a problem; being oblivious of the need to find out is a big problem! It is imperative to build small, controlled benchmarks with the purpose of measuring things, so as to make estimates from there and to identify and fix performance bottlenecks. This can be done with the help of profilers, too, if you don’t know where the bottlenecks are. But there’s a law of diminishing returns on these experiments: a few parts of the code are bottlenecks and need to be tuned carefully; most parts don’t contribute much to performance, so there’s not much value in measuring those. A few, meaningfully placed performance measurements can get you 90% of the value of measuring things.
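A minimal sketch of such a controlled experiment in Java (names are mine; for serious JVM benchmarking a harness like JMH is the better tool, since JIT compilation and dead-code elimination can distort naive timings). The point is simply having a habitual way to answer “how long does this actually take?”:

```java
import java.util.function.Supplier;

public class MicroBench {
    // Crude controlled experiment: warm up, then time many runs and average.
    static double avgMillis(Supplier<?> task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) task.get();   // let the JIT settle first
        long t0 = System.nanoTime();
        for (int i = 0; i < runs; i++) task.get();
        return (System.nanoTime() - t0) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        double ms = avgMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i);
            return sb.toString();                      // return so work isn't optimized away
        }, 100, 1_000);
        System.out.printf("avg: %.3f ms per run%n", ms);
    }
}
```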
…and last, but not least:
- N*bad ≠ good. What this says is that, with sufficiently large data, no amount of nodes, cores, or memory will save you from code written with performance-related design flaws. This law comes from seeing people (students and professionals) using multi-core servers, map-reduce clusters, HPCs, and servers with large RAM in order to make their software run faster… without first doing a proper job of making the sequential version run as well as it can on a small memory footprint. I have seen systems that took days to finish on a multi-core server with large allocated RAM be reduced to hours on just one core and smaller memory, simply by rewriting the code with performance in mind. Code that doesn’t perform, or that crashes, on a single core with just a few GB of RAM will be equally bad on more powerful hardware.