
Why We Chose Rust to Build Keel (Honest Version)

2026-02-14 · Keel Team · 14 min read
rust · internals · language-design · tooling


There is a moment in every programming language project where you realize you have to pick a language to write the language in. It sounds like a zen riddle. It is not. It is a decision that will haunt you for years, because you will be living inside that implementation language every single day — debugging it at midnight, cursing its error messages, and slowly becoming the kind of person who has opinions about string interning strategies.

We picked Rust. This post is about why, what that choice gave us, and the parts where we quietly stared at the ceiling wondering if we had made a terrible mistake.

The Lineup (And Why We Said No)

Before writing a single line of Rust, there was a genuinely embarrassing period of indecision. Here is the short version.

C is what the serious people use. CPython, Lua, Ruby MRI — all C. The performance is unbeatable, the control is total, and the decades-long track record is undeniable. It also gives you manual memory management in a codebase that juggles syntax trees, closures, and a garbage-collected heap. We know ourselves. We would have shipped segfaults.

C++ offers more abstraction. It also offers templates, multiple inheritance, and a build system that, if you are lucky, only takes an afternoon to set up. We were not feeling lucky.

Go compiles fast and has a wonderful standard library. But the entire Keel project is about representing data as tagged unions — AST nodes, instructions, type representations, error variants. Go does not have sum types. Every place where Keel uses a Rust enum, we would have been writing interface dispatch and type assertions. That is a lot of switch val := v.(type) for a compiler with 46 instruction variants.

Haskell would have been thematically satisfying — a functional language building a functional language. GHC is a marvel. But lazy evaluation makes performance profiling an adventure, and deploying Haskell binaries to end users involves a conversation about the GHC runtime that nobody wants to have.

OCaml was a serious contender. Fast, has algebraic types, great pattern matching — the original Rust compiler was written in it. The ecosystem was the sticking point. WebAssembly support (needed for the web playground) and the available library ecosystem tipped the scales away from it. No disrespect to OCaml; it just was not the right fit for this project at this time.

Rust has algebraic data types, exhaustive pattern matching, no garbage collector overhead, native WebAssembly compilation, a built-in test framework, and cargo. We knew the borrow checker would cost us time. We decided it was worth paying.

(It was. Mostly.)

Exhaustive Matching, All the Way Down

If you have read the pattern matching post, you know that Keel promises its users exhaustive pattern matching: if you forget a variant in a case expression, the compiler tells you. What you might not know is that Rust gives us the exact same guarantee while we build the compiler itself.

Keel's VM instructions are a Rust enum with 46 variants:

#[derive(Debug, Clone, PartialEq, Serialize)]
pub enum Instruction {
    MovRegVal(Register, RegisterValue),
    AddRegRegReg(Register, Register, Register),
    JumpIfFalse(usize, usize),
    Closure(Register, usize, usize, usize, usize),
    Call(Register, Vec<Register>),
    MakeList(Register, Vec<Register>),
    AllocRecord(Register, usize),
    MakeEnum(Register, usize, usize, Option<Register>),
    // ... 38 more variants
}

When we add a new instruction — say, ListTail for extracting the tail of a list in cons pattern matching — Rust forces us to handle it in every match expression across the codebase. The VM dispatch loop, the bytecode pretty printer, the time machine debugger — every place that touches instructions must account for the new variant, or the build fails. Not a warning. A hard stop.
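The guarantee is easy to demonstrate with a trimmed-down stand-in for the enum above (the three variants and the `mnemonic` helper here are illustrative, not Keel's real code):

```rust
// Trimmed-down stand-in for Keel's Instruction enum, for illustration only.
#[derive(Debug)]
enum Instruction {
    MovRegVal(u8, i64),
    AddRegRegReg(u8, u8, u8),
    JumpIfFalse(usize, usize),
}

// A match with no wildcard arm must name every variant. Add a new variant
// (say, ListTail) and this function stops compiling until it is handled.
fn mnemonic(instr: &Instruction) -> &'static str {
    match instr {
        Instruction::MovRegVal(..) => "mov",
        Instruction::AddRegRegReg(..) => "add",
        Instruction::JumpIfFalse(..) => "jf",
    }
}

fn main() {
    assert_eq!(mnemonic(&Instruction::AddRegRegReg(0, 1, 2)), "add");
}
```

Delete one arm and the compiler names exactly which variant is uncovered. That is the hard stop described above.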

There is something recursive about this that still delights us: the exhaustiveness checking we are building for Keel users is the same exhaustiveness checking that protects us from ourselves. The feature and the tool that builds it are the same idea at different levels.

The Error System That Grows Without Breaking

Keel currently has around 120 error variants spread across five error enums — lexer errors, parser errors, scope errors, type errors, and compile errors. Every single one is a typed Rust enum variant with structured data:

#[derive(Debug, Clone, PartialEq, Error)]
pub enum CompileError {
    #[error("Type mismatch: expected {expected}, found {found}")]
    TypeMismatch { expected: String, found: String },

    #[error("Non-exhaustive pattern match: missing variants: {}",
            missing_variants.join(", "))]
    NonExhaustiveMatch { missing_variants: Vec<String> },

    #[error("Lambda parameter '{param}' requires a type annotation")]
    LambdaParameterUntyped { param: String },

    // ... many more
}

Each error carries exactly the data it needs — not a formatted string, but typed fields. When we need to generate a hint ("Did you mean userName?"), the suggestion list is a Vec<String> that we can process, not a sentence we have to parse back out. When the type checker reports a mismatch, it hands us the expected and found types as structured data.

The thiserror crate generates the Display implementation from the #[error(...)] attribute, and the ariadne crate renders it all with colored source spans in the terminal. Both integrate through Cargo with a single line in Cargo.toml. That is not nothing — Keel's error reporting gets a surprising amount of positive feedback, and most of the credit goes to those two libraries.

The important part: every error variant implements an ErrorHint trait with hint() and note() methods. When we add a new error, the trait implementation forces us to decide what suggestion to offer the user. We cannot forget, because it will not compile. That is how a project with 120 error variants maintains the promise of helpful messages. Not through discipline. Through the type system.
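A minimal sketch of what that trait contract could look like — the post only names hint() and note(), so the signatures and the example variant below are our assumptions:

```rust
// Sketch of an ErrorHint-style trait; exact signatures are assumed.
trait ErrorHint {
    fn hint(&self) -> Option<String>;
    fn note(&self) -> Option<String>;
}

// One hypothetical variant. A new variant added to this enum becomes a
// missing match arm in hint() until someone decides what suggestion it
// should carry, which is the "cannot forget" property described above.
enum CompileError {
    TypeMismatch { expected: String, found: String },
}

impl ErrorHint for CompileError {
    fn hint(&self) -> Option<String> {
        match self {
            CompileError::TypeMismatch { expected, .. } => {
                Some(format!("try converting the value to {expected}"))
            }
        }
    }

    fn note(&self) -> Option<String> {
        None
    }
}

fn main() {
    let err = CompileError::TypeMismatch {
        expected: "Int".into(),
        found: "String".into(),
    };
    assert_eq!(err.hint().unwrap(), "try converting the value to Int");
}
```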

Enums for Everything (Because a Compiler Is a Zoo)

We said this about instructions, but it applies everywhere. A programming language implementation is fundamentally a series of transformations between data representations — source text to tokens, tokens to AST, AST to bytecode, bytecode to values. Each representation is a tree of tagged variants. Rust enums are precisely this.

Keel's AST expression type has 26 variants:

#[derive(Clone, Debug, PartialEq, Serialize)]
pub enum Expr {
    Int(i64, Comments),
    Float(f64, Comments),
    String(String, Comments),
    Boolean(bool, Comments),
    Var(String, ScopeId, Comments),
    Binary { op: BinOp, left: Box<Spanned<Expr>>, right: Box<Spanned<Expr>>, comments: Comments },
    If { condition: Box<Spanned<Expr>>, then_branch: Box<Expr>, else_branch: Box<Expr>, ... },
    Case { expr: Box<Spanned<Expr>>, branches: Vec<Box<Spanned<CaseBranch>>>, ... },
    Lambda { params: Vec<Pattern>, body: Box<Expr>, scope_id: ScopeId, ... },
    FunctionCall { function: Box<Expr>, arguments: Box<Expr>, ... },
    Record(Vec<(String, Expr, Option<Spanned<Comment>>)>, Comments),
    EnumVariant { enum_name: String, variant_name: String, args: Vec<Spanned<Expr>>, ... },
    // ... 14 more variants
}

Each variant carries exactly the data it needs. No null fields. No optional fields that "should probably be set." No kind string that you parse at runtime and hope you typed correctly. When we add a new expression form to the parser, Rust tells us every function in the compiler that needs to handle it. It is like having a coworker who never forgets anything and is slightly annoying about it. We love that coworker.

In Go, we would use interfaces and type switches. In Python, classes and isinstance. In C, tagged unions with manual dispatch. None of those give us exhaustiveness checking. We would just... forget to handle a variant sometimes, discover it weeks later in a failing test, and quietly add the missing case. In Rust, that path does not exist.

Testing Without Ceremony

Rust's built-in test framework does not get enough credit. There is no test runner to install, no configuration file to maintain, no test discovery magic to debug. You write #[test], you run cargo test, you go make coffee.

Keel has a keel_test! macro that turns a single declaration into up to four tests — one for each compiler phase:

keel_test! {
    name: cons_pattern_sum_function,
    code: r#"
module TestMath exposing (sum)
    fn sum: List Int -> Int
    fn sum list =
        case list of
            x :: xs -> x + sum xs
            [] -> 0
TestMath.sum [1, 2, 3, 4, 5]
"#,
    expected: [OutputValue::Int(15)],
}

This generates cons_pattern_sum_function__lexer, __parser, __compile, and __output — four isolated tests from three lines of macro invocation. If the lexer regresses, we know it is a lexer problem. If parsing breaks, the parser test fails and the compile test is skipped. No ambiguity about which phase went wrong.

For tests that should fail, the macro accepts a fails_at: parameter:

keel_test! {
    name: type_mismatch_in_addition,
    code: r#"
fn add : Int -> Int -> Int
fn add x y = x + y
add "hello" 5
"#,
    fails_at: compiler,
}

This generates a lexer test (should pass), a parser test (should pass), and a compile test that asserts failure. The entire error testing story is: write three lines, get targeted phase-level assertions. The test suite has 16 feature test files covering everything from basic literals through pattern matching, modules, Maybe and Result types, and imports.
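We will not reproduce the real macro here, but the fan-out pattern itself fits in a few lines of macro_rules!. This sketch uses stand-in lex and parse functions (not Keel's actual API) and nests the per-phase tests in a module, since stable macro_rules! cannot concatenate identifiers into names like __lexer:

```rust
// Stand-in phase functions; Keel's actual API is an assumption here.
fn lex(src: &str) -> Result<Vec<String>, String> {
    Ok(src.split_whitespace().map(str::to_string).collect())
}

fn parse(src: &str) -> Result<usize, String> {
    lex(src).map(|tokens| tokens.len())
}

// One declaration expands into one test per compiler phase.
macro_rules! keel_test {
    (name: $name:ident, code: $code:expr $(,)?) => {
        mod $name {
            #[test]
            fn lexer() {
                assert!(super::lex($code).is_ok());
            }

            #[test]
            fn parser() {
                assert!(super::parse($code).is_ok());
            }
        }
    };
}

keel_test! {
    name: smoke,
    code: "1 + 2",
}

fn main() {
    // Outside of `cargo test`, the phase functions still work directly.
    assert_eq!(parse("1 + 2"), Ok(3));
}
```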

Benchmarking That Actually Helps

The benchmark story is similar. Keel uses Criterion for benchmarking, with five groups that isolate each phase of the pipeline:

fn bench_full_pipeline(c: &mut Criterion) {
    let mut group = c.benchmark_group("full_pipeline");
    for case in BENCH_CASES {
        group.throughput(Throughput::Bytes(case.code.len() as u64));
        group.bench_with_input(
            BenchmarkId::new(case.category, case.name),
            &case.code,
            |b, code| b.iter(|| VM::compile(black_box(code))),
        );
    }
    group.finish();
}

There are 61 benchmark cases across seven categories (literals, arithmetic, functions, pattern matching, data structures, control flow, end-to-end), and each one is benchmarked independently for lexing, parsing, compilation, execution, and the full pipeline. When we optimize something — say, the arena allocator in the parser — we can see exactly which phase improved and by how much.

cargo bench generates HTML reports with statistical analysis. No setup. No YAML. Just cargo bench and a few minutes of waiting.

Performance You Get for Free (Almost)

Keel's VM dispatches instructions, manipulates registers, allocates heap objects, and calls native standard library functions millions of times per second. This is the kind of code where performance matters — not because any single operation is expensive, but because they add up.

In Rust, we write straightforward code and get native performance. The string interner is a HashMap with a Vec backing store — O(1) lookup and O(1) index-to-string resolution:

pub struct StringInterner {
    map: std::collections::HashMap<String, usize>,
    vec: Vec<String>,
}

The parser's symbol arena stores all symbols in a contiguous Vec<Symbol> with 4-byte SymbolId references (using NonZeroU32 so that Option<SymbolId> is the same size). Compare that to a Vec<Symbol> reference at 24 bytes. The resulting cache locality makes parsing measurably faster, and we did not have to think very hard to get there — just arrange data contiguously and let the hardware do its thing.
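The niche optimization is worth seeing concretely. Here is a sketch of a NonZeroU32-backed id; the real SymbolId's internals are our assumption, but the size guarantee is checkable:

```rust
use std::num::NonZeroU32;

// Index stored as value + 1, so index 0 is representable and the
// all-zero bit pattern stays free for Option's None.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub struct SymbolId(NonZeroU32);

impl SymbolId {
    pub fn new(index: usize) -> Self {
        SymbolId(NonZeroU32::new(index as u32 + 1).expect("symbol index overflow"))
    }

    pub fn index(self) -> usize {
        (self.0.get() - 1) as usize
    }
}

fn main() {
    // The NonZeroU32 niche means the Option costs no extra bytes.
    assert_eq!(std::mem::size_of::<SymbolId>(), 4);
    assert_eq!(std::mem::size_of::<Option<SymbolId>>(), 4);
    assert_eq!(SymbolId::new(7).index(), 7);
}
```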

When the web playground runs Keel code, it compiles to WebAssembly — the same Rust source, the same compiler, the same VM, running in the browser at near-native speed. One codebase, two targets, no rewrite. That still feels slightly magical.

Six Crates, One Command

The Keel workspace has six crates: keel-core (the language itself), keel-lsp (the language server), keel-repl (the interactive REPL), keel-web (the website with playground), keel-fmt (the formatter), and keel-tree-sitter (the grammar for editor highlighting). They share dependencies through workspace-level Cargo.toml, and cargo build builds exactly what is needed.

Feature flags handle optional functionality. The time machine debugger — a step-by-step replay system that captures state across all compiler phases — compiles only when you pass --features time-machine. The rest of the time, it is zero cost. Not "low cost." Zero.
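The pattern, assuming a feature literally named time-machine declared in keel-core's Cargo.toml, looks roughly like this build-configuration sketch:

```rust
// Assumed Cargo.toml declaration:
//
//     [features]
//     time-machine = []
//
// The module below is only compiled when the feature is enabled, so in a
// default build it contributes zero code, not a branch that is never taken.
#[cfg(feature = "time-machine")]
mod time_machine {
    pub fn record_step(phase: &str) {
        println!("recorded phase: {phase}");
    }
}

fn main() {
    // cfg! evaluates at compile time; without --features time-machine
    // this is the constant false.
    let enabled = cfg!(feature = "time-machine");
    println!("time machine enabled: {enabled}");
}
```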

If you have ever maintained a multi-project build with CMake, or juggled setuptools and pyproject.toml and tox and nox, you understand why we find Cargo moving. It is not perfect (the lockfile diffs are noisy, and workspace dependency resolution has quirks), but it solves the boring infrastructure problems so thoroughly that we almost forget they exist.

The Ecosystem Did the Heavy Lifting

Building a programming language means building a lexer, parser, compiler, VM, type checker, error reporter, string interner, garbage collector, standard library, REPL, language server, website, and documentation system. If we had built all of that from scratch, we would still be working on the lexer.

Here is what we did not have to build:

  • chumsky — Parser combinator library. Both the lexer and the parser are built with it. Error recovery for LSP diagnostics comes essentially for free.
  • ariadne — Terminal error reporting with colored source spans. This one crate is responsible for a disproportionate share of "nice error messages" feedback.
  • thiserror — Derive macros for error types. All 120 error variants get formatted messages without boilerplate.
  • serde — Serialization. The AST, tokens, and bytecode all serialize to JSON for debugging and the time machine.
  • criterion — Statistical benchmarking with HTML reports, as mentioned above.

Each of these integrated through a single line in Cargo.toml. In another ecosystem, that sentence would be a paragraph about compatibility matrices.
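For the curious, "a single line" is close to literal. A workspace dependency table along these lines would do it — the version numbers here are illustrative, not Keel's pinned versions:

```toml
# Hypothetical [workspace.dependencies] table; versions are illustrative.
[workspace.dependencies]
chumsky   = "0.9"
ariadne   = "0.4"
thiserror = "1"
serde     = { version = "1", features = ["derive"] }
criterion = "0.5"
```

Each member crate then opts in with a one-liner like `thiserror.workspace = true`.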

The Trade-Offs (The Honest Part)

We promised an honest version. Here it is.

Compile times are genuinely slow. A clean build of the workspace is a coffee break, not a sip. Incremental builds are fine for development, but CI runs feel every second. This is the cost of monomorphization and Rust's deep type checking. We have accepted it. We have not stopped muttering about it under our breath.

The learning curve is real. Lifetimes, borrowing, ownership — these are concepts you have to internalize, not just understand intellectually. The first month was rough. The borrow checker is not a suggestion engine; it is a wall. By month six, things clicked. Two years in, we cannot imagine going back. But we will not pretend it was painless getting here.

Some things require more ceremony than feels necessary. Cloning values to satisfy the borrow checker when you know it is fine. String handling that requires more .to_string() and &str negotiation than a garbage-collected language. Error propagation with Result is explicit — which is a feature right up until you are five levels deep and your code is 30% question marks.

Async Rust is a different language. We did not need much of it for Keel core, but the LSP server touches async boundaries, and every time it does, we are reminded that async Rust has its own learning curve stacked on top of the regular one.

These are real costs. For this project — a compiler, a VM, a language server, a web playground — the benefits outweigh them. For a CRUD web app, we would probably reach for something else. Context matters.

Two Unsafe Blocks

In the entire keel-core crate, there are exactly two unsafe blocks. Both are in the IO standard library module, both are for environment variable access (std::env::set_var), and both exist because Rust's standard library correctly marks concurrent environment variable mutation as unsafe.
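The shape of those blocks is unexciting, which is rather the point. Since the Rust 2024 edition, std::env::set_var is an unsafe function, so the call site must carry the acknowledgment explicitly (the variable name here is made up):

```rust
use std::env;

fn main() {
    // SAFETY: assumes no other thread is reading or writing the process
    // environment concurrently. Concurrent environment mutation is
    // undefined behavior on most platforms, which is why the standard
    // library marks set_var unsafe.
    unsafe {
        env::set_var("KEEL_EXAMPLE", "1");
    }
    assert_eq!(env::var("KEEL_EXAMPLE").unwrap(), "1");
}
```

On editions before 2024 the unsafe block is merely an unused-unsafe warning; under the 2024 edition it is required.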

We mention this not to brag (two is not zero, after all) but because it says something about what Rust makes possible. A compiler, a VM with a register file and garbage-collected heap, a string interner, a symbol arena — all of it runs in safe Rust. The type system, the borrow checker, and the ownership model together eliminate the classes of memory bugs that haunt systems code in other languages. When your entire project is about providing safety guarantees to users, building on a foundation that provides the same guarantees to you is not just convenient. It is honest.

Would We Do It Again?

Yes. Without much hesitation.

Not because Rust is the best language for everything — it is not, and pretending otherwise would be exactly the kind of preachy nonsense we try to avoid. But for this specific project — a strongly-typed functional language with a register-based VM, a type checker that handles inference and exhaustiveness, and a web playground that runs the same code in the browser — Rust fits like it was designed for the job. Algebraic data types for representing every intermediate form. Exhaustive matching for never forgetting a variant. Native performance for a VM that dispatches millions of instructions. WebAssembly for the playground. Cargo for the workspace.

Keel promises its users compile-time safety: strong types, exhaustive pattern matching, Maybe and Result instead of null. Building that promise in a language that provides the same guarantees is not a coincidence. The tools reinforce each other. The ideas rhyme.

Two years, six crates, 46 instructions, 120 error variants, two unsafe blocks, and we still get a small thrill when cargo test passes clean. Even when it takes a while to compile first.