WebAssembly semantics
The amazing thing about WebAssembly is not just that it’s a neat low-level programming language in your browser. It’s that WebAssembly is the most popular programming language ever (how many people have a browser, after all?) to have a complete, verified formal specification. Standard ML is notable for this amongst functional languages, but WebAssembly represents part of a new wave of PL-theory-driven formalization in the development of widely adopted programming languages. Similar efforts are also underway to formalize the underpinnings of Rust.
WebAssembly’s spec isn’t just because one of the browser vendors happened to have a crazy PL theorist on the team—the PL metalanguage is the means of communication for WebAssembly semantics. Recall from the first day that a common issue in programming languages is having a consistent way to describe how a language works. For most languages, there is only ever one major compiler/interpreter, so this isn’t as big an issue. But for web languages, every browser implements them differently! Just look at the insane matrix of Javascript with browser features. For Google/Microsoft/Apple/Mozilla, the PL metalanguage is critical for clearly and consistently specifying how WebAssembly works independent of the browser.
Moreover, WebAssembly is a provably type-safe language. That means that it has a type-checker (not common for assembly languages) and a corresponding proof of progress and preservation. WebAssembly has no undefined behavior and no stuck states. That WebAssembly is provably type-safe is also evident in its security-first design. Unlike x86, the call stack is inaccessible from the program (overwriting the stack or memory can’t change the instruction pointer). All accesses to memory are required to be bounds-checked.
Today, our goal is to formalize a subset of WebAssembly using the same theoretical framework of grammars and operational semantics that we used for the lambda calculus. We’ll see two new core concepts: mutable runtime structures that exist separate from the program, and non-local control flow. Before we dive in, a few caveats to mention. First, we will be looking at a moderately reduced subset of the full WebAssembly language, for example only considering i32
numbers (no other sizes of integers or any floats). Moreover, the semantics presented here are different from those in the formal spec for purposes of clarity and familiarity to what you’ve seen so far. Lastly, although the typing rules are an important part of WebAssembly, we will only look at the dynamic semantics in this class.
The value stack
Important caveat: each rule presented here will be correct in the context of the formalisms seen so far in the lecture. Go to the bottom of the page for a complete language specification that has the final form of each rule.
The first main idea is to understand how we formally represent the WebAssembly value stack. For now, we will consider a subset of WebAssembly where a program is just a sequence of instructions, no modules or functions. It has the following syntax:
To describe how a program executes, we are going to need a new idea of a runtime configuration. This data structure represents all of the runtime state of the program. Previously, the program was the only runtime state, but now we will introduce a value stack.
We will be using the convention to mean “a sequence of 0 or more numbers ” (similarly for ). The symbol will denote an empty sequence. This configuration then is a record consisting of a value stack and a sequence of instructions. For example, if our program is just , then we should be able to prove that:
We can write that semantic rule as:
This reads as: given some value stack , when the first instruction in our sequence is followed by some , then put that value on the stack and remove the instruction from the sequence. While this is maximally explicit, it’s also maximally verbose. By convention, we are going to write our rules not on the entire configuration, but just as changes to the configuration. For example, a simplified const rule:
Be super careful to understand the convention here, as we will heavily rely on it to avoid drowning in syntax. Because the instruction does not appear on the right side, it’s assumed to have been removed. Although the stack does not appear on the left side, the is being pushed onto whatever existing values are on the stack. As another example, here’s the rule for binary operators:
Using these rules, we can prove the correctness of the following trace:
Memory
While the value stack represents a separate data store from our program, the program can still only modify it indirectly through instructions like const and binop. By contrast, memory can be accessed/modified directly through the load and store insructions. In our simplified model of WebAssembly, you can think of memory as a sequence of 32-bit integers, initially of length zero. The memory can be grown in 16384-sized chunks, and will always fit a 32-bit address space (i.e. is also indexable by 32-bit integers). Specifically, we add four instructions:
We will also need to extend our configuration with memory:
To concisely write down semantics on integer sequences, we will use the syntax to represent the -th number in the sequence , and to represent the sequence with the -th number changed to . indicates the length of the sequence . For example:
This rule says that if the requested index is within the bounds of memory, then a load instruction from an index (gotten from the stack) pushes to the stack with the memory unchanged. The remaining semantics are as follows:
In the invalid rules, represents an error condition, discussed in greater detail below. Note that represents a sequence of 16384 zeroes.
Control flow
At this point, our semantics allow us to perform basic arithmetic operations and read/write memory. Now this brings us to the most complex part of WebAssembly semantics: control flow. Before we dive in, take a second to recall from last lecture how control flow works in WebAssembly. Using blocks and loops, we can create nested instruction sequences that can break to defined labels (jump points). For example, this C program:
void inf_incr() {
int x = 0;
while (true) {
x += 1;
if (x == 10) { break; }
}
}
Is equivalent to this WebAssembly program:
(func $inf_incr (local $x i32)
;; x = 0
(i32.const 0)
(set_local $x)
(block $incr_loop_break
(loop $incr_loop
;; x++
(get_local $x)
(i32.const 1)
(i32.add)
(set_local $x)
;; if (x == 10) { break; }
(get_local $x)
(i32.const 10)
(i32.eq)
(br_if $incr_loop_break);
(br $incr_loop))))
In our WebAssembly subset, we’re going to introduce the following instructions:
$$ \begin{alignat*}{2} \msf{Expression}~e ::= \qamp \ldots \\ \mid \qamp \wblock{\star{e}} \qqamp \text{label at end} \\ \mid \qamp \wloop{\star{e}} \qqamp \text{label at start} \\ \mid \qamp \wbr{i} \qqamp \text{branch} \\ \mid \qamp \wbrif{i} \qqamp \text{conditional branch} \\ \end{alignat*} $$Like in the full WebAssembly language, blocks and loops are the only constructs that permit nested instructions. Additionally, to simplify the semantics, we are going to adopt the nameless/anonymous branching semantics. Here, quantifies a number just like (just used to qualitatively indicate the presence of an index instead of an arbitrary number).
To formally encode the semantics of jumps, we need to add another language construct: labels, or jump/branch points. We will express labels as an “administrative” instruction, i.e. an expression that is part of the language, but not meant to be written by the user, only used by the runtime to encode the state of the interpreter.
$$ \begin{alignat*}{2} \msf{Expression}~e ::= \qamp \ldots \\ \mid \qamp \wlabel{\star{e_{\msf{cont}}}}{(\star{n};~\star{e_{\msf{body}}})} \qqamp \text{branch point} \end{alignat*} $$A label defines a local stack , which is a value stack only accessible in code executing inside the scope of the label. This comes with two sequences of instructions: which is the current set of instructions being executed, and which is the “continuation” of label, or the sequence of instructions that will be run if the label is jumped to. For example, a program with nested labels should look like:
We can encode this behavior in four new semantic rules:
The first two rules say that when the first instruction under a label is a break, then check what number is in the branch. If it’s greater than zero, we toss away the current label and continue breaking up the label stack. When we reach the label we’re breaking to (number equal zero), then we step into the continuation. When a label is executing any other instruction besides a break, the third rule says to step the inside of the label, but replacing the stack with the label’s custom private stack. The last rule says that if we execute all the instructions in a label without breaking, then we throw away the label.
Lastly, we define our translation of loops/blocks/conditional breaks into these core primitives:
Look carefully at the D-Loop rule. If you squint, it should look suspiciously like a fixpoint to you! When we encounter a loop, that steps to a label where the continuation is the loop itself, meaning jumping back to a label generated by a loop permits infinite recursion.
Call stack
The last key concept in the WebAssembly semantics is how we represent modules, functions, and function calls. As you saw last time, a WebAssembly module is a sequence of functions, where each function has a sequence of inputs, locals, and a single output. We add these to our syntax:
$$ \begin{alignat*}{2} \msf{Expression}~e ::= \qamp \ldots \\ \mid \qamp \wgetlocal{i} \qqamp \text{get frame local} \\ \mid \qamp \wsetlocal{i} \qqamp \text{set frame local} \\ \mid \qamp \wcall{i} \qqamp \text{function call} \\ \mid \qamp \wreturn \qqamp \text{function return} \\[1.2em] \msf{Function}~f ::= \qamp \wfunc{i}{j}{\star{e}} \\[1.2em] \msf{Module}~m ::= \qamp \wmodule{\star{f}} \end{alignat*} $$A module is a sequence of functions, and a function has some number of parameters and locals along with a function body, which is a sequence of instructions. Just like we have a persistent memory in our runtime configuration, we also need to have function-local values in our configuration as well. This leads to our full, final configuration:
$$ \begin{align*} \msf{Config}~c ::= \qamp \wconfig{m}{\star{n_{\msf{mem}}}}{\star{n_{\msf{locals}}}}{\star{n_{\msf{stack}}}}{\star{e}} \end{align*} $$At runtime, we have a module (set of functions), a memory (sequence of numbers), some locals (also a sequence of numbers), a value stack, and lastly the current instruction sequence. Lastly, in the same way we introduced a label construct to represent a stack of labels at runtime, we are also going to introduce two administrative instruction for dealing with functions:
$$ \begin{alignat*}{2} \msf{Expression}~e ::= \qamp \ldots \\ \mid \qamp \wframe{c} \qqamp \text{Function call frame} \\ \mid \qamp \wreturning{n} \qqamp \text{Active return} \end{alignat*} $$To create a function frame, we start with the intimidating semantics for the call instruction:
Effectively, this says: when we want to call a function , look up the corresponding function definition in the module, . Then create a new configuration with the same module and memory, but create a new set of locals: parameters gotten from the stack, and 0-values for local variables. (Note: we reverse the arguments, indicated by “rev”, from the stack, as the top element of the stack is the last argument of the function.) Start with an empty value stack in this new configuration, and a set of instructions taken from the function definition. Lastly, place this configuration into a call frame instruction.
We have two rules defining how to use a function call frame:
These rules say, respectively: when we’re calling a function, if the function body still has work to do (it can step), then step its configuration. Once the function has no more work to do, i.e. the instruction sequence is empty, then take the return value at the top of the inner configuration, and put it on the stack of the outer configuration. Also copy over any changes to memory (but not locals!).
Lastly, we have semantics for early returns:
We need separate “return” and “returning” instructions because the returning instruction has the return value packaged in the instruction, and it can hold onto it across many stacks that it might propagate across. We define two rules for returns that propagate across labels, and then stop at a function call boundary.
Difference from specification
At this point, we’ve covered enough semantics to implement all of the core of WebAssembly. That’s no small feat! However, in the interest of time and to focus on concepts over details, the semantics presented both exclude some features and present others in a different way than the standard specification. Features we excluded:
- A lot of the spec deals with numerics, e.g. bitwise operations, truncation, rounding, and the wonderful world of floating point. We just use 32-bit integers and a small subset of arithmetic/relational operators.
- WebAssembly has tables used for indirect function calls, which we omit.
- Modules have a lot more than just functions: imports, exports, multiple memories, global variables, data segments.
- WebAssembly has more conditional instructions including if, drop, and select.
- Loading/storing to memory gets complicated with various custom arguments and alignments.
- Labels (including blocks and loops) can actually have return values.
In terms of alternate presentation of the semantics, ours differs in two main ways.
- The value stack. The WebAssembly spec actually just uses the instruction sequence as the value stack. While this is maximally simple (fewer runtime structures needed), I think it’s a little confusing, and simpler from a pedagogical standpoint to have an explicit value stack.
- Labels. The spec uses this formalism of a block context to express the reduction of branches. This reduces the total ink in the spec, but I think is less clear than incremental branch reductions (breaking one label at a time). The semantics presented in this lecture were more inspired by the official WebAssembly interpreter (written in OCaml!).
Complete specification
Grammar
$$ \begin{align*} \msf{Expression}~e ::= \qamp \msf{Const}(n) \qqamp \wconst{n} \qqamp \text{constant} \\ \qamp \msf{Binop}(binop) \qqamp \wbinop{binop} \qqamp \text{binary operators} \\ \qamp \msf{Relop}(relop) \qqamp \msf{i32}.relop \qqamp \text{relational operators} \\ \qamp \msf{Load} \qqamp \wload \qqamp \text{load from memory} \\ \qamp \msf{Store} \qqamp \wstore \qqamp \text{store to memory} \\ \qamp \msf{Size} \qqamp \wsize \qqamp \text{get memory size} \\ \qamp \msf{Grow} \qqamp \wgrow \qqamp \text{increase memory size} \\ \qamp \msf{Unreachable} \qqamp \msf{unreachable} \qqamp \text{unreachable} \\ \qamp \msf{Block}(\star{e}) \qqamp \wblock{\star{e}} \qqamp \text{block} \\ \qamp \msf{Loop}(\star{e}) \qqamp \wloop{\star{e}} \qqamp \text{loop} \\ \qamp \msf{Br}(i) \qqamp \wbr{i} \qqamp \text{branch} \\ \qamp \msf{BrIf}(i) \qqamp \wbrif{i} \qqamp \text{conditional branch} \\ \qamp \msf{Call}(i) \qqamp \wcall{i} \qqamp \text{function call} \\ \qamp \msf{Return} \qqamp \wreturn \qqamp \text{function return} \\ \qamp \msf{GetLocal}(i) \qqamp \wgetlocal{i} \qqamp \text{get frame local} \\ \qamp \msf{SetLocal}(i) \qqamp \wsetlocal{i} \qqamp \text{set frame local} \\ \\[1.2em] \msf{Binop}~binop ::= \qamp \msf{Add} \qqamp \msf{add} \qqamp \text{addition} \\ \qamp \msf{Sub} \qqamp \msf{sub} \qqamp \text{subtraction} \\ \qamp \msf{Mul} \qqamp \msf{mul} \qqamp \text{multiplication} \\ \qamp \msf{Div} \qqamp \msf{div\_s} \qqamp \text{division} \\[1.2em] \msf{Relop}~relop ::= \qamp \msf{Eq} \qqamp \msf{eq} \qqamp \text{equals} \\ \qamp \msf{Lt} \qqamp \msf{lt\_s} \qqamp \text{less than} \\ \qamp \msf{Gt} \qqamp \msf{gt\_s} \qqamp \text{greater than} \\[1.2em] \msf{Function}~f ::= \qamp \wfunc{i}{j}{\star{e}} \\[1.2em] \msf{Module}~m ::= \qamp \wmodule{\star{f}} \end{align*} $$Configuration: $$ \begin{align*} \msf{Config}~c ::= \qamp \wconfig{m}{\star{n_{\msf{mem}}}}{\star{n_{\msf{locals}}}}{\star{n_{\msf{stack}}}}{\star{e}} \end{align*} $$
Administrative instructions:
$$ \begin{align*} \msf{Expression}~e ::= \qamp \ldots \\ \qamp \msf{Label}(\star{e_{\msf{cont}}}, \star{n}, \star{e_{\msf{body}}}) \qqamp \wlabel{\star{e_{\msf{cont}}}}{(\star{n};~\star{e_{\msf{body}}})} \qqamp \text{Branch point} \\ \qamp \msf{Frame}(c) \qqamp \wframe{c} \qqamp \text{Call point} \\ \qamp \msf{Returning}(n) \qqamp \wreturning{n} \qqamp \text{Active return} \\ \qamp \msf{Trapping} \qqamp \wtrapping \qqamp \text{Active trap} \end{align*} $$