
How to Train Your Compiler Dragon: An Introduction to LLVM
Why LLVM? Meet Your Compiler Dragon
So, you want to build your own programming language. Ambitious, dangerous, slightly unhinged, but in the best way. The challenge is this: turning your shiny new syntax into something the machine can actually run.
You could write your own backend from scratch, hand-crafting assembly like a medieval scribe copying spells by candlelight. Or… you could recruit a dragon.
That dragon is LLVM. It’s big, it’s powerful, and it already knows how to breathe fire in every dialect a CPU understands. Your job isn’t to reinvent fire; it’s to learn how to ride it.
This article will give you a tour of LLVM’s core concepts: contexts, modules, builders, engines, and the mystical LLVM IR, showing how they all fit together when you’re creating your own language. Think of it less as a tutorial and more as a map for beginners: a friendly guide to the landscape of compiler dragons.
LLVM is the dragon you didn’t have to hatch yourself. It already speaks the dialects you care about: x86, ARM, WebAssembly, RISC-V… you name it. Instead of learning every detail of every architecture, you hand LLVM a portable, assembly-like description of your program (called LLVM IR), and it takes care of the roaring.
Why should you care? Because LLVM brings you:
- Portability – Target one IR, and your code can fly across different platforms without rewriting backends.
- Optimizations – LLVM knows secret dragon tricks: strength reduction, inlining, loop unrolling, and all the things you’d rather not implement by hand.
- Battle-tested tools – Rust, Swift, Julia, Zig, and Clang all ride the same beast. If it’s good enough for them, it’s good enough for your fledgling language.
Think of the compiler pipeline like a quest map:
Your Code → Lexer/Parser → AST → LLVM IR → Optimization → Machine Code
Everything before LLVM is your story to tell: your syntax, your semantics, your choices. Everything after is LLVM’s domain. That’s the beauty of it: you get to focus on inventing the language you want, while LLVM makes sure it runs like a champ on actual hardware.
By the time we’re done here, LLVM won’t feel like a scary monster hiding in the backend. It’ll feel more like a loyal, fire-breathing companion. The kind of dragon you can call on when you need your language to come alive.
1. The Dragon’s Anatomy: Core Pieces of LLVM
Every mighty dragon has organs and bones that make it tick. LLVM is no different. If you’re going to ride this beast into battle (or at least into “Hello, World”), you need to know what its major parts are and how they fit together.
Let’s break down the anatomy:
Contexts: The Dragon’s World
A context is like the dimension your dragon lives in. It owns all the unique types, constants, and metadata that your compiler creates. Think of it as the dragon’s personal universe. You usually need only one, but LLVM lets you create more; a common reason is compiling on several threads at once, because a single context must not be used from multiple threads simultaneously.
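#include <memory>
#include "llvm/IR/LLVMContext.h"
// Owns the unique types, constants, and metadata for everything built inside it.
auto _context = std::make_unique<llvm::LLVMContext>();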
In practice, almost everything starts with a context. No context, no dragon.
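To make “holds all the unique types” concrete, here is a toy model rather than LLVM’s API: a hypothetical ToyContext that interns type objects, so asking for the same type twice returns the very same pointer. Real LLVM contexts behave this way (llvm::Type::getInt32Ty(context) hands back one shared object per context), which is why types from one context can be compared by pointer and must not be mixed with types from another.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a type object; LLVM's real class is llvm::Type.
struct ToyType {
    std::string name;  // e.g. "i32"
};

// Hypothetical toy context: it owns every type and hands out one shared
// instance per distinct name, so types can be compared by pointer.
class ToyContext {
public:
    const ToyType* getIntTy(unsigned bits) {
        std::string name = "i" + std::to_string(bits);
        auto& slot = types_[name];
        if (!slot) {
            // First request for this type: create it and keep it forever.
            slot = std::make_unique<ToyType>(ToyType{name});
        }
        return slot.get();
    }

private:
    std::map<std::string, std::unique_ptr<ToyType>> types_;
};
```

Two calls to getIntTy(32) on the same ToyContext return the same pointer, while the same call on a second ToyContext returns a different object: the toy version of “types don’t cross universes.”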
Modules: The Spellbook
If a context is the world, then a module is the dragon’s spellbook. Each module contains functions, global variables, and all the definitions that make up your program. It’s where your AST’s grand ideas (like “I want a function called add()”) finally get written down as LLVM IR.
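#include "llvm/IR/Module.h"
// A module is created inside a context; "my_module" is just an identifier.
auto _module = std::make_unique<llvm::Module>("my_module", *_context);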
Modules can be saved to disk, shipped off to other tools, or linked together. One module might represent a single source file, or your entire program, depending on how you want to organize things.
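As a rough illustration of “a module is a container of definitions,” the sketch below uses a hypothetical ToyModule (not llvm::Module) that collects function definitions as IR text and prints them in the layout of a textual .ll file. The real class does far more, but the shape is similar: a named container you add functions to, then print or write out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy module: a named list of function definitions kept as IR text.
class ToyModule {
public:
    explicit ToyModule(std::string id) : id_(std::move(id)) {}

    void addFunction(std::string irText) { functions_.push_back(std::move(irText)); }

    std::size_t functionCount() const { return functions_.size(); }

    // Prints a header comment naming the module, then each definition.
    std::string print() const {
        std::string out = "; ModuleID = '" + id_ + "'\n";
        for (const auto& fn : functions_) {
            out += "\n" + fn;
        }
        return out;
    }

private:
    std::string id_;
    std::vector<std::string> functions_;
};
```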
Builders: The Dragon’s Hands
How do you actually write the IR inside a module? That’s where the builder comes in. A builder is your dragon’s hand holding a quill, scribbling out LLVM instructions one at a time.
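#include "llvm/IR/IRBuilder.h"
// A builder is tied to a context and writes wherever its insertion point is.
auto _builder = std::make_unique<llvm::IRBuilder<>>(*_context);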
You tell the builder: “Place an addition instruction here, in this block,” and it obediently writes it down. Builders are picky about location: before emitting anything, you point the builder at a basic block inside a function (IRBuilder’s SetInsertPoint does this), or else you’ll end up with instructions floating in the void (a rookie dragon-rider mistake).
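Here is a toy version of that rule. ToyBlock and ToyBuilder are hypothetical classes, not LLVM’s: the builder appends instructions to whichever block it was last pointed at and refuses to emit anything before that. In the real API the matching calls are SetInsertPoint, CreateAdd, and CreateRet; the toy only mimics the idea.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical basic block: a label plus a list of instruction lines.
struct ToyBlock {
    std::string label;
    std::vector<std::string> instructions;
};

// Hypothetical toy builder: it writes instructions into its current block only.
class ToyBuilder {
public:
    void setInsertPoint(ToyBlock* block) { block_ = block; }

    // Emits "%name = add i32 lhs, rhs" and returns the new value's name.
    std::string createAdd(const std::string& lhs, const std::string& rhs,
                          const std::string& name) {
        std::string result = "%" + name;
        emit(result + " = add i32 " + lhs + ", " + rhs);
        return result;
    }

    void createRet(const std::string& value) { emit("ret i32 " + value); }

private:
    void emit(const std::string& line) {
        if (block_ == nullptr) {
            // The rookie mistake: no block chosen, so there is nowhere to write.
            throw std::logic_error("no insertion point set");
        }
        block_->instructions.push_back(line);
    }

    ToyBlock* block_ = nullptr;
};
```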
Engines: The Fire in Its Belly
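Finally, once you’ve built your module full of IR, how do you actually make it run? That’s where the execution engine comes in. This is the dragon’s fire, turning your carefully written IR into machine code that runs on real hardware. In current LLVM versions the recommended JIT is ORC (for example, its LLJIT class); the older ExecutionEngine interface is still available.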
Putting It Together
So the anatomy looks something like this:
- Context: the dragon’s world
- Module: the spellbook
- Builder: the hands writing spells
- Engine: the fire that brings it all to life
With just these pieces, you can already imagine how your source code transforms into something a CPU can execute. Each part plays a role, and once you know them, the LLVM dragon starts looking less like a monster and more like a trusty companion.
2. Dragon Speak: LLVM IR in Plain English
Every dragon has its language: strange, powerful, and a little scary the first time you hear it. For LLVM, that language is LLVM IR (Intermediate Representation).
LLVM IR sits between your shiny high-level syntax and the gritty assembly the machine actually understands. It’s human-readable enough that you can work with it, but low-level enough that LLVM can turn it into efficient machine code.
Think of LLVM IR as the dragon’s roar: not pretty, but it gets the job done.
A First Roar: “Hello World” in LLVM IR
Here’s a tiny snippet of IR that defines a function adding two integers:
define i32 @add(i32 %a, i32 %b) {
entry:
%sum = add i32 %a, %b
ret i32 %sum
}
Let’s break this down:
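- define i32 @add(...) – Define a function named add that returns a 32-bit integer (i32). The @ prefix marks a global name, such as a function.
- %a, %b – The dragon’s placeholders for the function arguments. The % prefix marks local values; each is assigned exactly once, so they behave more like named results than mutable variables.
- entry: – A label that starts the function’s first basic block.
- %sum = add i32 %a, %b – An instruction that literally says “add two 32-bit integers and call the result %sum.”
- ret i32 %sum – Return the result.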
That’s LLVM IR in action: straightforward, verbose, and explicit. No magic.
Mapping High-Level Code to IR
Let’s see how your code in a toy language might turn into IR.
Your source code:
fn add(a: int, b: int) int {
return a + b;
}
LLVM’s roar:
define i32 @add(i32 %a, i32 %b) {
entry:
%sum = add i32 %a, %b
ret i32 %sum
}
It’s like translating from English to Dragon:
- “Function named add” → define ... @add
- “Takes two integers” → i32 %a, i32 %b
- “Return a + b” → an add instruction followed by ret.
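One common way to automate that translation is a recursive walk over the AST that returns the IR value holding each subexpression’s result. The sketch below assumes a hypothetical, minimal AST (only variables and +) and produces IR text directly instead of calling LLVM’s C++ API, so it runs without LLVM installed. lowerFunction follows the usual recipe: signature, entry block, instructions, return.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST: an expression is a variable or the sum of two expressions.
struct Expr {
    enum class Kind { Var, Add };
    Kind kind = Kind::Var;
    std::string name;                // set when kind == Var
    std::unique_ptr<Expr> lhs, rhs;  // set when kind == Add
};

std::unique_ptr<Expr> makeVar(const std::string& name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Var;
    e->name = name;
    return e;
}

std::unique_ptr<Expr> makeAdd(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Lowers an expression to i32 instructions, naming temporaries %t0, %t1, ...
class Lowering {
public:
    // Returns the IR operand that holds the expression's value.
    std::string lower(const Expr& e) {
        if (e.kind == Expr::Kind::Var) {
            return "%" + e.name;  // parameters are already IR values
        }
        std::string l = lower(*e.lhs);
        std::string r = lower(*e.rhs);
        std::string t = "%t" + std::to_string(next_++);
        body_.push_back("  " + t + " = add i32 " + l + ", " + r);
        return t;
    }

    const std::vector<std::string>& body() const { return body_; }

private:
    int next_ = 0;
    std::vector<std::string> body_;
};

// Signature, entry block, lowered instructions, then the return.
std::string lowerFunction(const std::string& fname,
                          const std::vector<std::string>& params,
                          const Expr& result) {
    std::string ir = "define i32 @" + fname + "(";
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i > 0) ir += ", ";
        ir += "i32 %" + params[i];
    }
    ir += ") {\nentry:\n";
    Lowering lw;
    std::string value = lw.lower(result);
    for (const auto& line : lw.body()) ir += line + "\n";
    ir += "  ret i32 " + value + "\n}\n";
    return ir;
}
```

The generated temporary is %t0 rather than the hand-written %sum; LLVM doesn’t mind what local values are called, as long as each name is defined only once.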
Why IR Matters
LLVM IR is the sweet spot between high-level ideas and low-level execution:
- Readable enough for humans to debug.
- Flexible enough to optimize.
- Portable enough to compile down to any architecture LLVM supports.
Think of it as the dragon’s “common tongue.” You speak your fancy custom syntax, LLVM translates it into IR, and from there it can be spoken fluently in x86, ARM, WebAssembly, or whatever battlefield you’re headed for.
To put it visually, the process looks like this:
Your Code (fn add) → LLVM IR (define) → Machine Code (mov, add, ret)
IR is the middle step, the dragon’s roar echoing between your high-level spell and the CPU’s cold, unfeeling assembly.
3. Bringing It to Life: From IR to Execution
So far, you’ve seen LLVM IR as the dragon’s language: structured, strange, but still just words on a page. Now comes the fun part: making the dragon actually roar.
That’s the job of LLVM’s execution engines.
The Two Ways a Dragon Breathes Fire
LLVM can run your code in two main ways:
- JIT (Just-In-Time) Compilation
  - Like a dragon breathing fire instantly when commanded.
  - Your IR is turned into machine code on the spot and executed right away.
  - Great for REPLs, scripting languages, and anything interactive.
- AOT (Ahead-Of-Time) Compilation
  - More like forging dragonfire into a blade.
  - Your IR is compiled into a binary you can run later, no LLVM required.
  - Great for shipping executables, libraries, or production builds.
A Tiny Example: Making IR Roar
Remember our add function from earlier? On its own, IR is just an incantation. To hear it roar, you hand it to an execution engine.
LLVM IR (unchanged):
define i32 @add(i32 %a, i32 %b) {
entry:
%sum = add i32 %a, %b
ret i32 %sum
}
With JIT:
- You load this IR into an engine.
- The engine translates it into the CPU’s native instructions (say, x86).
- You call add(2, 3).
- The CPU dutifully returns 5.
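If you want to watch this happen before writing any JIT code, LLVM’s lli tool can run a textual IR file directly. The sketch below is plain C++ with no LLVM headers, and its function names are hypothetical: it builds a complete module containing add plus a main that calls add(2, 3) and returns the result, and can write it to disk.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Builds a complete textual IR module: @add plus a @main that calls add(2, 3).
std::string buildAddModule() {
    return "define i32 @add(i32 %a, i32 %b) {\n"
           "entry:\n"
           "  %sum = add i32 %a, %b\n"
           "  ret i32 %sum\n"
           "}\n"
           "\n"
           "define i32 @main() {\n"
           "entry:\n"
           "  %r = call i32 @add(i32 2, i32 3)\n"
           "  ret i32 %r\n"
           "}\n";
}

// Writes the module to disk so a tool such as lli can load it.
bool writeModule(const std::string& path, const std::string& ir) {
    std::ofstream out(path);
    out << ir;
    return static_cast<bool>(out);
}
```

Assuming you save it as add.ll with writeModule("add.ll", buildAddModule()), running lli add.ll and then echo $? in a shell should print 5, since lli reports main’s return value as the exit status.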
Behind the scenes, LLVM has just taken your dragon’s roar and projected it into the real world, one flame-burst at a time.
Why This Matters
When you’re designing your own language, this is the point where ideas stop being theoretical. Your parser and AST define the shape of your language. LLVM IR captures its essence. But the execution engine is what lets you show off:
- Your toy language can now evaluate code on the fly.
- You can test functions without building a full binary.
- You can imagine a REPL prompt where your dragon roars back instant answers.
Execution engines are the fire in the dragon’s belly. Without them, you’ve got a pet lizard scribbling runes. With them, you’ve got a fire-breathing beast that makes your language come alive.
4. Your First Spell: Building Blocks of a Language
Now that you know how to wake the dragon and hear its roar, it’s time to teach it a few tricks. Every language, no matter how grand or humble, rests on the same foundations: variables, functions, and control flow. These are your first spells, the bread and butter of programming magic.
And the best part? LLVM already knows how to handle all of them.
Variables: Giving Names to Sparks
In your source language, a variable might look innocent:
let x = 42;
But LLVM doesn’t deal in “let” or “const.” For the dragon, this is about allocating memory, storing a value, and loading it later.
In IR, a simple variable might unfold like this:
%x = alloca i32               ; allocate stack space for a 32-bit int
store i32 42, ptr %x          ; store 42 in that space
%val = load i32, ptr %x       ; read it back into a register
; (LLVM versions before 15 spelled the pointer type as i32* instead of ptr)
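It’s wordier, yes, but explicit: allocate, store, load. LLVM likes everything spelled out. Don’t worry about the extra memory traffic: LLVM’s mem2reg pass, part of the standard optimization pipeline, promotes simple stack slots like this one back into registers.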
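A front end typically produces exactly that pattern for each local variable. Below is a minimal sketch, again emitting IR text with a hypothetical helper (LocalLowering) rather than the real CreateAlloca/CreateStore/CreateLoad builder calls: one method lowers `let name = value;` and another reads the variable back into a fresh value. It uses the opaque ptr type that current LLVM versions expect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper that lowers integer `let` bindings to alloca/store/load.
class LocalLowering {
public:
    // `let name = value;`: reserve a stack slot, then store the value into it.
    void declare(const std::string& name, int value) {
        lines_.push_back("%" + name + " = alloca i32");
        lines_.push_back("store i32 " + std::to_string(value) + ", ptr %" + name);
    }

    // Reading `name`: load from its slot into a fresh value and return that value.
    std::string read(const std::string& name) {
        std::string v = "%" + name + ".val" + std::to_string(loads_++);
        lines_.push_back(v + " = load i32, ptr %" + name);
        return v;
    }

    const std::vector<std::string>& lines() const { return lines_; }

private:
    int loads_ = 0;
    std::vector<std::string> lines_;
};
```

A real front end would also track scopes and avoid declaring the same name twice in one function; this sketch doesn’t.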
Functions: Spells That Can Be Cast Again
Functions are the heart of any language… reusable magic. You’ve already seen our add function:
define i32 @add(i32 %a, i32 %b) {
entry:
%sum = add i32 %a, %b
ret i32 %sum
}
When you build a function with LLVM, you’re really filling in a module’s spellbook:
- Declare the return type and arguments.
- Create a block (entry:).
- Place instructions with the builder.
- Return the final value.
From the outside, it feels like a neat little fn add(a, b) { return a + b; }. Inside, it’s the dragon carefully scripting every motion.
Control Flow: Teaching the Dragon to Choose
Of course, a language isn’t just straight-line code. You need decisions, loops and ways to control the flow of fire.
Take a simple if statement:
if x > 0 {
print("positive");
} else {
print("negative or zero");
}
LLVM doesn’t have if baked in. Instead, it builds basic blocks and branches between them. Conceptually, it looks like this:
[entry]
|
compare x > 0
/ \
[then] [else]
\ /
[merge]
In IR, this becomes:
; compare x > 0
%cmp = icmp sgt i32 %x, 0
br i1 %cmp, label %then, label %else
then:
; ... do positive stuff ...
br label %merge
else:
; ... do negative-or-zero stuff ...
br label %merge
merge:
; continue here
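It’s a bit more work, but this block-based structure is the dragon’s way of reasoning about choices. Loops reuse the same ingredients, adding a branch that jumps back to an earlier block, and switches get a dedicated switch instruction that still just picks which block runs next. Underneath, it’s always blocks and branches.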
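The compare-and-branch recipe is mechanical enough to generate. This sketch uses a hypothetical IfLowering class and emits IR text (the real builder calls would be along the lines of CreateICmpSGT, CreateCondBr, and CreateBr). It gives every if statement fresh labels (then0, else0, merge0, and so on) so two ifs in the same function never collide, and takes the branch bodies as caller-supplied instruction lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical emitter for `if value > 0 { ... } else { ... }` in block form.
class IfLowering {
public:
    // Emits the compare, the conditional branch, both arms, and the merge label.
    void lowerIfPositive(const std::string& value,
                         const std::vector<std::string>& thenBody,
                         const std::vector<std::string>& elseBody) {
        std::string n = std::to_string(counter_++);
        std::string cmp = "%cmp" + n;
        out_.push_back("  " + cmp + " = icmp sgt i32 " + value + ", 0");
        out_.push_back("  br i1 " + cmp + ", label %then" + n + ", label %else" + n);
        emitArm("then" + n, thenBody, "merge" + n);
        emitArm("else" + n, elseBody, "merge" + n);
        out_.push_back("merge" + n + ":");  // code after the if continues here
    }

    const std::vector<std::string>& lines() const { return out_; }

private:
    // Each arm is its own basic block and must end by jumping to the merge block.
    void emitArm(const std::string& label, const std::vector<std::string>& body,
                 const std::string& merge) {
        out_.push_back(label + ":");
        for (const auto& inst : body) out_.push_back("  " + inst);
        out_.push_back("  br label %" + merge);
    }

    int counter_ = 0;
    std::vector<std::string> out_;
};
```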
The Bigger Picture
So far, you’ve taught your dragon:
- Variables → allocate, store, load
- Functions → define, compute, return
- Control flow → compare, branch, merge
This is the essential toolkit of any language. With just these spells, you can build a working, if humble, toy language. And more importantly, you can see how LLVM’s context, modules, builders, and engines all come together to make those spells real.
5. Next Quests for the Adventurer
You’ve met the dragon, studied its anatomy, learned its language, and even cast your first spells. That’s enough to prove you can ride, but the journey doesn’t stop here. LLVM has whole mountain ranges left to explore.
Here are a few quests you might embark on next:
Sharpen Your Tools
- LLVM comes with command-line helpers like lli (to run IR directly) and opt (to apply optimization passes). They’re like dragon-smiths’ hammers: great for testing and refining your spells.
Learn Advanced Magic
- Once you’re comfortable, peek into SSA form and PHI nodes. In SSA form every value is assigned exactly once, and a PHI node is how LLVM merges values that arrive from different branches of code: the dragon’s way of remembering where each spark came from.
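For a taste of what a PHI node looks like, here is a hedged sketch that emits a max(a, b) function as IR text. The helper names are hypothetical (the real builder API has CreatePHI and addIncoming). The two branch blocks compute nothing new; a phi in the merge block picks %a or %b depending on which block control arrived from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Formats "%name = phi i32 [ v1, %block1 ], [ v2, %block2 ], ..."
std::string phiI32(const std::string& name,
                   const std::vector<std::pair<std::string, std::string>>& incoming) {
    std::string s = "%" + name + " = phi i32 ";
    for (std::size_t i = 0; i < incoming.size(); ++i) {
        if (i > 0) s += ", ";
        // Each entry pairs a value with the predecessor block it arrives from.
        s += "[ " + incoming[i].first + ", %" + incoming[i].second + " ]";
    }
    return s;
}

// Emits max(a, b): branch on a > b, then let a phi pick the winner.
std::string buildMax() {
    return "define i32 @max(i32 %a, i32 %b) {\n"
           "entry:\n"
           "  %cmp = icmp sgt i32 %a, %b\n"
           "  br i1 %cmp, label %pick_a, label %pick_b\n"
           "pick_a:\n"
           "  br label %merge\n"
           "pick_b:\n"
           "  br label %merge\n"
           "merge:\n"
           "  " + phiI32("result", {{"%a", "pick_a"}, {"%b", "pick_b"}}) + "\n"
           "  ret i32 %result\n"
           "}\n";
}
```

If your front end uses the alloca/store/load pattern from the variables section, you often don’t have to write phis by hand: the mem2reg pass inserts them when it promotes those stack slots to registers.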
Bring in Allies
- LLVM lets you link external libraries or even other IR modules. Want your language to call into C’s printf or math functions? Totally possible. Think of it as teaching your dragon to fight alongside knights and wizards.
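Calling into C is mostly a matter of declaring the external function with a matching signature and calling it; the linker (or the JIT) resolves the symbol. The sketch below emits a module whose main calls printf with a string constant. It assumes a platform where i32 matches C’s int and uses the opaque ptr type; irStringConstant and buildHelloModule are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Encodes text as an LLVM IR byte-array constant, e.g. [3 x i8] c"hi\00".
// Non-printable bytes, quotes, and backslashes become \XX hex escapes.
std::string irStringConstant(const std::string& text) {
    std::string body;
    for (unsigned char ch : text) {
        if (ch >= 0x20 && ch < 0x7F && ch != '"' && ch != '\\') {
            body += static_cast<char>(ch);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(ch));
            body += buf;
        }
    }
    body += "\\00";  // C strings need a terminating NUL byte
    return "[" + std::to_string(text.size() + 1) + " x i8] c\"" + body + "\"";
}

// Emits a module whose main prints `message` through C's printf.
// The message is used as printf's format string, so it should not contain '%'.
std::string buildHelloModule(const std::string& message) {
    return "@.str = private unnamed_addr constant " + irStringConstant(message) + "\n"
           "\n"
           "declare i32 @printf(ptr, ...)\n"
           "\n"
           "define i32 @main() {\n"
           "entry:\n"
           "  %n = call i32 (ptr, ...) @printf(ptr @.str)\n"
           "  ret i32 0\n"
           "}\n";
}
```

Note the explicit (ptr, ...) function type in the call: because printf is variadic, the call site has to spell out its signature.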
Dream Bigger
- Today you built functions and variables. Tomorrow, you might add arrays, objects, modules, or even coroutines. LLVM is flexible enough to support nearly anything you dream up.
Join the Guild
- Languages like Rust, Swift, Zig, and Julia all started their LLVM journeys once, just like you. Reading their stories, or even peeking at their source code, is like sitting by the fire with veteran dragon riders.
Closing Words
LLVM may look intimidating at first, but now you know better. It’s not a monster to fear; it’s a powerful companion waiting to be guided. With contexts, modules, builders, and execution engines at your side, you can shape your own language and set it loose in the world.
So go ahead: train your compiler dragon. Just remember to duck when it breathes fire.
If you’d like a path to start creating your own language with LLVM, check out these articles:
- TBD