House of C(ards)
Don’t get me wrong, I like hacking on C. C was my first real programming language (Pascal didn’t count), and I remember fondly the year I spent hacking on MINIX. That year was my coming of age in programming; that’s when I really fell in love with it, all because of that beautiful body of C code. If you are writing an operating system, a compiler, or any other piece of software that needs to be very close to the machine, then C is the language you want. You want the power to control memory allocation and the flow of instructions. You want nothing to get in the way of your programming that machine.
However, most applications these days don’t need that. At all. The Linux kernel is pretty good already, thanks to the wonderful engineers who followed in the footsteps of Andrew Tanenbaum. Seriously, a Web server doesn’t need to be written for the machine; a security library doesn’t need to be written for the machine, for Christ’s sake! Let a high-level language processor do its job, and chances are that all of the bugs that result from accidental mismanagement of memory and control flow will simply disappear.
Back in 1992, at the dawn of the global Internet, there weren’t that many alternatives to C when developing for Linux. The safer programming languages of the time were slow as snails. And so it is that we have the LAMP stack all written in C: Linux, Apache httpd (a descendant of NCSA httpd, written ~1993), MySQL (circa 1994), and the P scripting languages, all started at around the same time (early 1990s). This combo, especially Apache httpd, has been driving the development of the Internet’s infrastructure. As such, it’s understandable that when the developers of OpenSSL decided to write an open-source library implementing the SSL and TLS protocols around 1998, they chose C. After all, had they chosen Python or Java, for example, Apache httpd, MySQL, and lots of other C-based network applications of the time would have been unable to use it. Doing it in C would place the library almost at kernel level, so all applications (including network servers and language processors) could use it. And use it they did!
One can’t argue with this logic. It makes engineering sense. But it also shows that what we have now is a vulnerable House of C(ards) that can crumble at any time because… well, because of C.
C is simply too risky for library and application development. Its use needs to be confined to its natural habitat: the kernel and any other pieces that need to control the machine directly. Everything else needs to use a layer that isolates and abstracts the machine, steering people away from the dangerous cliffs of explicit memory management, unsafe casts and unstructured control flow. We’ve known this for years; people knew it even before the Web came about! The Web has been plagued by security flaws that are almost always a direct consequence of bugs in components written in C. The use of safe(r) languages is not just a necessity; it’s a duty that any responsible software engineer who is not a Linux kernel developer or a compiler writer must honor.
Where does that leave the existing infrastructure, such as the millions of web sites that use OpenSSL? I’m not sure. After this enormous punch in the eye, it’s possible that OpenSSL will get the love it deserves, and that it will be reviewed by many more people and tools, so that this particular bug doesn’t happen again. But does anyone believe that these C components will ever be free of bugs related to memory management, unsafe types and unstructured control flow? I don’t.
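For the record, the bug in question, Heartbleed (CVE-2014-0160), was a buffer over-read in OpenSSL’s implementation of the TLS heartbeat extension. A heartbeat request carries a payload plus a separate field stating how long that payload is; the server trusted the length field and copied that many bytes back, so a client that claimed up to 64 KB while sending only a few bytes got back whatever happened to sit next to the payload in memory. What follows is a toy model, not OpenSSL’s code: a flat bytearray stands in for C process memory, and every name in it (heap, heartbeat_unchecked, heartbeat_checked, heartbeat_safe) is made up for illustration. It contrasts the trusting copy, the bounds check that was missing, and what you get for free when the payload is its own bounds-checked object.

```python
# Toy model of the Heartbleed bug class. NOT OpenSSL code: a flat bytearray
# stands in for C process memory, and all names here are hypothetical.
from typing import Optional

# The 4-byte heartbeat payload happens to sit right before unrelated data.
heap = bytearray(b"PING" + b"-----SECRET KEY-----")
PAYLOAD_OFFSET = 0
ACTUAL_LEN = 4  # how many payload bytes the client really sent


def heartbeat_unchecked(memory: bytearray, offset: int, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes, trusting the client's length field
    (the C code did the equivalent with a raw memcpy)."""
    return bytes(memory[offset:offset + claimed_len])


def heartbeat_checked(memory: bytearray, offset: int,
                      actual_len: int, claimed_len: int) -> Optional[bytes]:
    """Same echo, but drop the request when the claimed length exceeds
    what was actually received: the bounds check that was missing."""
    if claimed_len > actual_len:
        return None
    return bytes(memory[offset:offset + claimed_len])


def heartbeat_safe(payload: bytes, claimed_len: int) -> bytes:
    """With a memory-safe language the payload is its own object, so
    slicing can never reach bytes that belong to something else."""
    return payload[:claimed_len]


if __name__ == "__main__":
    print(heartbeat_unchecked(heap, PAYLOAD_OFFSET, 24))             # leaks the neighbor
    print(heartbeat_checked(heap, PAYLOAD_OFFSET, ACTUAL_LEN, 24))   # None
    print(heartbeat_safe(b"PING", 24))                               # b'PING'
```

Note that the third function needs no check at all: the language simply never hands out bytes beyond the end of the object, which is exactly the kind of guarantee argued for above.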
The only way forward is for developers to avoid components manually written in C/C++ like the plague. Hopefully, Heartbleed will motivate people to rewrite OpenSSL in other languages. An OpenSSL in Python would already be an improvement, as the Python ecosystem is huge and can support everything currently being done on the Web out of the box (e.g., Tornado is a great alternative to Apache httpd). Using a safe high-level language that then generates C code would also be a viable way to go. Anything but components manually written in C/C++ by normal humans! (Security experts, in particular, should flat-out be prevented from writing a single line of C code!)