Destructive Nature of Compiler Optimizations

This is something that occurred to me while I was researching something else, and I thought people might be interested in it. It's a short essay / demonstration of the awesome things that happen when abstract programming languages meet "rubber on the road" running code. Consider the following two functions:

void copyBytes(char *dst, char *src, size_t len) {
    if( len == 0 ) {
    } else {
        *dst = *src;
        return copyBytes(++dst, ++src, --len);
    }
}

void copyBytes2(char *dst, char *src, size_t len) {
    size_t copied = len;
    char *d = dst;
    char *s = src;

    while( copied > 0 ) {
        *d = *s;
        d++;
        s++;
        copied--;
    }
}

The question is, what do these functions do? Consider them for a few minutes before continuing (please).

So, they both pretty obviously copy memory. copyBytes performs the copy recursively, while copyBytes2 performs it iteratively. So, let's think about "what do these functions do" in terms of side effects.

Both result in "len" bytes from "src" moving to "dst". Simple enough, case closed, right? They're the same! Wait a minute, though: copyBytes works recursively, which means an additional side effect is that for every byte copied, the stack pointer is decremented enough to create a new frame. Stack space is a limited resource, so if you were copying a lot of data, copyBytes has the potential to exhaust stack space, resulting in the termination of that thread.
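To convince ourselves the two really do have the same visible effect, here is a minimal self-contained check (the function bodies completed the obvious way, with my own test buffer):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Recursive copy: one new stack frame per byte, absent optimization. */
void copyBytes(char *dst, char *src, size_t len) {
    if (len == 0) {
        return;
    }
    *dst = *src;
    copyBytes(++dst, ++src, --len);
}

/* Iterative copy: constant stack usage by construction. */
void copyBytes2(char *dst, char *src, size_t len) {
    size_t copied = len;
    char *d = dst;
    char *s = src;
    while (copied > 0) {
        *d++ = *s++;
        copied--;
    }
}
```

For small "len" both behave identically; the interesting divergence only shows up when "len" is large enough to matter for the stack.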

So, they're not the same?

Alright, compile it, test it... wait a minute, they are both capable of copying large values of "len". What's up?

So let's feed these two functions to LLVM, at two different optimization levels, to REALLY dig into this:

copyBytes with no opts:

copyBytes2 with no opts:

copyBytes with -O1 LLVM opts:

copyBytes2 with -O1 LLVM opts:

(LLVM IR is documented here: but this is short enough and non-weird enough that a lot of it should be pretty clear on its own)

So with no opts, copyBytes demonstrates the recursive behavior we expect. When we look at the IR for the "slightly" optimized version, we see the optimizer has recognized the tail call and rewritten the recursion into iterative logic, and that the resulting iterative logic is in fact EXACTLY the same as the source code that is explicitly iterative. The kind of interesting thing is that you can line up the two LLVM IR control flow graphs for the optimized functions and prove they are the same by comparing their side effects: the order, width, and count of loads and stores are identical.
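In C terms, what the tail-call pass does to copyBytes is roughly this (my sketch of the transformation, not the compiler's literal output):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* copyBytes after tail-call elimination: the recursive call becomes a
 * branch back to the top of the function, reusing one stack frame
 * instead of stacking new ones. The "arguments" of the next call
 * become plain variable updates. */
void copyBytes_tco(char *dst, char *src, size_t len) {
    for (;;) {
        if (len == 0) {
            return;
        }
        *dst = *src;
        ++dst;
        ++src;
        --len;
    }
}
```

Which is, modulo variable names, exactly copyBytes2.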

So ... what controls whether or not these two functions express different behavior? Compiler optimization levels. So ... people who audit source code ... what exactly are you auditing? Usually, people are auditing what the compiled code will ACTUALLY DO, not the code as written, because what you really care about is what the code actually does. This tells us, pretty conclusively, that to understand what code does you also need a very clear understanding of how it is compiled and what goes into its compilation.

So ... everyone who is (or was) focused on automated ways to verify C code ... have fun! When you're evaluating the sanity of some code, you're talking about something more meta-level than C...