- I learned about DFA (Data Flow Analysis) optimizations back in the early 1980s. I eagerly implemented them in my C compiler, which was released as "Optimum C". Then came the C compiler roundup benchmarks in the programming magazines. I breathlessly opened the issue, only to find the reviewer's verdict that Optimum C was a bad compiler because it deleted the code in the benchmarks. (The reviewer wrote that Optimum C was cheating by recognizing the specific benchmark code and deleting it.)
I was really, really angry that the reviewer had not attempted to contact me about this.
But the other compiler vendors knew what I'd done, and the competition implemented DFA as well by the next year, and the benchmarks were updated.
The benchmarks were things like:
void foo() { int i,x = 1; for (i = 0; i < 1000; ++i) x += 1; }
- > Why compilers are hard – the IR data structure

  > Compilers do have an essential complexity that makes them "hard" [...waffle waffle waffle...] The primary data [...waffle...] represents the computation that the compiler needs to preserve all the way to the output program. This data structure is usually called an IR (intermediate representation). The primary way that compilers work is by taking an IR that represents the input program, and applying a series of small transformations all of which have been individually verified to not change the meaning of the program (i.e. not miscompile). In doing so, we decompose one large translation problem into many smaller ones, making it manageable.

  There we go. The section header should be updated to: "Why compilers are manageable – the IR data structure". If you claim an IR makes things harder, just skip it.

- This resonates with how compiler work looks outside textbooks. Most of the hard problems aren’t about inventing new optimizations, but about making existing ones interact safely, predictably, and debuggably. Engineering effort often goes into tooling, invariants, and diagnostics rather than the optimizations themselves.
- > A miscompile of an AI program can cause bad medical advice
I think the AI program is plenty capable of causing bad medical advice on its own without being miscompiled.
- This is a tangent on language semantics (of the English kind) instead of engaging with the (interesting!) narrative on miscompiles. Feel free to skip.
> A compiler is a translator that translates between two different languages.
I lament the word "compile" subsuming "translate" for most of tech. Computers either interpret instructions and execute them now, or translate instructions from one code to another that is later interpreted or translated again.
The English word "compile" means "bring together" and compiler systems usually have lots of interpreting and translating and linking to make some kind of artifact file, to wit:
> taking a step back, a compiler is simply a program that reads a file and writes a file
But not necessarily! You don't even need that much! Just source and target codes.
Forth systems (which I enjoy) for example have a single global STATE variable to switch between _execute this word now_ and _compile a call to this word for later_ directly into memory (plus metadata on words that allow them to execute anyway, extending the compiler, but I digress). You could snapshot the memory of the Forth process with its built target program and reload that way, but the usual Forth way is to just store the source and recompile to memory when needed.
Traditional threaded Forths compile calls to a list of routine addresses for a virtual machine interpreter routine (load address then jump to code, not much more work than the processor already does). I prefer subroutine threading, though, where calls are bona fide CALL instructions and the inner interpreter is the processor itself, because it's easier to understand.
Nowadays even the processor translates the instructions you give it to its own microcode for interpreting. It's code all the way down.
(I'm still reading the article.)
- While the reduction of a compiler to a deterministic function is theoretically sound, modern engineering practice necessitates addressing "Compiler-as-a-Service" architectures (LSP, incrementalism), where persistent state management complicates the purported simplicity of debugging. Furthermore, the article overlooks the non-deterministic nature of JIT compilation and parallel builds, which fundamentally challenges the comparison to stateless CLI utilities like grep.
- > What is a compiler?
Might be worth skipping to the interesting parts that aren’t in textbooks
- I don't feel his overflow miscompilation example is a good one. A 64-bit multiplication truncated back to 32 bits has the same overflow behavior as if the computation had been done in 32 bits (assuming nobody depends on the overflow indication, which is rare). And in high-level programming languages you typically can't tell the difference.
- The only way to understand compilers is to have already written one.
- Always interested in compiler testing, so I look forward to what he has to say on that.
- Skimmed through the article; the structure hints at it not being written by a human.
- The compiler part of a language is actually a piece of cake compared to designing a concurrent garbage collector.
- “Compiler Engineering in Practice” is a blog series intended to pass on wisdom that seemingly every seasoned compiler developer knows, but is not systematically written down in any textbook or online resource. Some (but not much) prior experience with compilers is needed.