CS 242: Mutability

To this point, we have discussed a particular model of computation, the many variants of lambda calculus. Lambda calculus is useful as a computational model for its simplicity: it has very few moving parts, no separate runtime stack/heap, and so can be formalized without significant complexity. However, simplicity in the language doesn’t mean simplicity in usage, just simplicity in analysis.

Moreover, mathematics isn’t what makes programming languages run in the real world—physics does. By the laws of electricity and subsequently analog and digital circuit design, there are certain kinds of computations that are easier than others to represent with a physical computing apparatus. This is why assembly (e.g. x86, ARM, etc.) looks so different from lambda calculus, because assembly was designed with hardware in mind, while lambda calculus was designed with mathematics in mind. Assembly is a computer-oriented language, while lambda calculus is a computation-oriented language.

This means that bringing our PL toolkit to assembly languages, or more broadly low-level languages, will require us to formally model systems that look closer to how a modern computer actually works. At this point in the course, we’re shifting gears to look at how we can apply the ideas you learned in the last four weeks to systems programming. We’ll do that in two ways: first, you’ll learn about WebAssembly and how we can model typed assembly languages. Then you’ll learn (at a higher/less-formal level) about Rust, and how it uses novel type theories and functional programming patterns to improve correctness in systems code.

WebAssembly

WebAssembly (WASM) is a new assembly programming language intended to execute high-performance code in the browser. WASM is used to power games, design tools, serverless apps, and lightweight containers. For this class, WASM is notable because its language design is formally specified using the same metalanguage we’ve covered (inference rules, typing judgments, operational semantics). You can even find a formal proof of progress and preservation for WASM. This fact makes WASM the most popular language in use today with a formal spec!

The origin and motivation of WebAssembly is described well in the introduction of Bringing the Web up to Speed with WebAssembly, a PLDI’17 paper jointly authored by employees at Google, Microsoft, Mozilla, and Apple that laid out the design principles for the language. I will reproduce a section of it here:

The Web began as a simple document exchange network but has now become the most ubiquitous application platform ever, accessible across a vast array of operating systems and device types. By historical accident, JavaScript is the only natively supported programming language on the Web, its widespread usage unmatched by other technologies available only via plugins like ActiveX, Java or Flash. Because of JavaScript’s ubiquity, rapid performance improvements in modern VMs, and perhaps through sheer necessity, it has become a compilation target for other languages. Through Emscripten [43], even C and C++ programs can be compiled to a stylized low-level subset of JavaScript called asm.js [4]. Yet JavaScript has inconsistent performance and a number of other pitfalls, especially as a compilation target. WebAssembly addresses the problem of safe, fast, portable low-level code on the Web. Previous attempts at solving it, from ActiveX to Native Client to asm.js, have fallen short of properties that a low-level compilation target should have:

Safe, fast, and portable semantics:

safe to execute

fast to execute

language-, hardware-, and platform-independent

deterministic and easy to reason about

simple interoperability with the Web platform

Safe and efficient representation:

compact and easy to decode

easy to validate and compile

easy to generate for producers

streamable and parallelizable

The history here is quite fascinating. Essentially, as websites became increasingly complex (moving from web “sites” to web “apps”), browsers needed faster and faster execution for Javascript. The Javascript V8 engine is probably the most optimized scripting language runtime in history, considering Google has poured millions of people-hours into tweaking every last corner of Javascript.

Still, scripting languages are fundamentally slow for a number of reasons (dynamic type checks, garbage collections, unpredictable interactions with the JIT compiler, …), so in 2011-12 a group of devs at Mozilla carved out a subset of Javascript called asm.js that could be statically typed and compiled/executed similar to C/C++. This led to the development of Emscripten, the world’s premier C++ to Javascript compiler. In 2013, Mozilla partnered with Epic to compile the Unreal Engine into the asm.js-subset of Javascript. Sadly, the original demo seems to have been lost to time, but you can still find other demos from around then. However, realizing the inanity of these solutions, the browser vendors managed to join forces and collaboratively design a “portable low-level bytecode”, WebAssembly.

In designing a language together, the browser vendors wanted to build WASM such that it could be implemented the same way in every browser. This required a language specification that was precise, so there would be no disagreement on how WASM works. For most languages, there is only ever one major compiler/interpreter, so this isn’t as big an issue. But for web languages, every browser implements them differently! Just look at the insane matrix of Javascript with browser features.

Stack machines

We will use WebAssembly as a motivating example for how to formally specify two key kinds of language features:

Mutability: mutable local variables, mutable persistent memory
Control flow: stack frames, function returns, loop breaks

Like for lambda calculus, we will iteratively build the language’s specification by looking at the syntax, static semantics, and dynamic semantics in turn. And where we could execute the lambda calculus with OCaml, now we can execute our formal WASM with a browser’s WASM compiler.

Example

At a high-level, WASM embodies a stack-machine model of computation. The state of the program is modeled as a stack of values, and the program can read/push/pop the stack. For example, here is the concrete syntax of a WebAssembly function that computes the quantity $f(x) = 2x^2 + 1$ .

(func $f (param $x i32) (result i32)
                 ;; stack: []
  (get_local $x) ;; stack: [x]
  (get_local $x) ;; stack: [x, x]
  (i32.mul)      ;; stack: [x*x]

  (i32.const 2)  ;; stack: [2, x*x]
  (i32.mul)      ;; stack: [2*x*x]

  (i32.const 1)  ;; stack: [1, 2*x*x]
  (i32.add))     ;; stack: [2*x*x+1]

A number of things to note about our first WebAssembly program. First, the syntax: this is the WebAssembly text format, which is derived from S-expressions, the same format commonly used for the Lisp programming language and its many flavors. An S-expression is either an “atom” (e.g. get_local, 2, or $x), or a list of S-expressions surrounded by parentheses (e.g. (get_local $x), (param $x i32)). In composite expressions, usually the first element is an atom that indicates the kind of thing being represented, and the remaining elements are parameters of that thing. For example, get_local and i32.mul are S-expressions representing instructions.

Most instructions in WebAssembly modify the value stack in some way. In the function above, get_local pushes the parameter x onto the stack. i32.mul pops two values from the stack, multiplies them, and pushes the result back onto the stack. i32.const N pushes the value N onto the stack. The function implicitly returns the value at the top of the stack. Notationally, when writing the stack left to right, the rightmost value is the top of the stack. For operations that take multiple arguments, they are in reverse order as taken off the stack. For example:

(i32.const 1)
(i32.const 2)
(i32.sub)

This program computes $1 - 2$ . Note that while we’re consistently using i32, WebAssembly understands numeric types of different sizes, both integers (i8, i16, i32, i64, i128) and floats (f32, f64).

The assemblies you’ve seen before are likely register-based, i.e. they use a (probably small, fixed) set of registers which represent independent storage locations to hold data. For example, the above program in x86 would be:

f:
	movl	%edi, -4(%rbp) ; *(rbp-4) = x
	shll	$1, %edi       ; edi = edi << 1 (= edi * 2)
	imull	-4(%rbp), %edi ; edi = edi * *(rbp-4)
	addl	$1, %edi       ; edi = edi + 1
	movl	%edi, %eax     ; eax = edi
	retq                   ; return eax

Of course, x86 still has a stack (e.g. -4(%rbp)), but it uses that in tandem with registers.

Dynamic semantics

Now we will ask: how can we formalize the semantics of this stack behavior? Think back to our operational semantics for lambda calculus. One initially counterintuitive idea is that we can define an interpreter for lambda calculus without a runtime, meaning without any auxiliary structures besides the program. The program is the runtime. By contrast, on assignment 1, we defined an operational semantics for JSON objects and accessors where the runtime state was a pair of accessor/object $(a, j)$ .

Similarly for WASM, we need a more complex notion of a runtime state, which we will call a configuration. Like any other assembly language, our program is not defined by a single expression, but a sequence of expressions that modify state. We codify these ideas into a grammar:

$\begin{alignat*}{2} \msf{Expression}~e ::= \qamp \wconst{n} \qqamp \text{constant} \\ \mid \qamp \wbinop{\oplus} \qqamp \text{binary operators} \\[1.2em] \msf{Binop}~\oplus ::= \qamp \msf{add} \qqamp \text{addition} \\ \mid \qamp \msf{sub} \qqamp \text{subtraction} \\ \mid \qamp \msf{mul} \qqamp \text{multiplication} \\ \mid \qamp \msf{div\_s} \qqamp \text{Signed division} \\[1.2em] \msf{Config}~C ::= \qamp \{\wcs{:}~\star{n};~\wci{:}~\star{e}\} \end{alignat*}$

Like on assignment 1, $\star{n}$ means “a sequence of 0 or more numbers $n$ ” where $\eps$ is the empty sequence. We assume that our metalanguage now has a theory of records (like in OCaml), which we can use, e.g. $c.\wcs$ accesses the value stack on a configuration $c$ .

For example, consider the program $e^* = [\wconst{3}]$ . Assuming the initial state of the program has an empty value stack, then the semantics of executing this program should be:

$\steps{\{\wcs{:}~\eps;~\wci{:}~\wconst{3}\}}{\{\wcs{:}~3;~\wci{:}~\eps\}}$

This says: execute the constant instruction by pushing 3 onto the stack, and removing the instruction. This example is generalized into the following rule:

$\ir{D-Const} {} {\steps {\{\wcs{:}~\star{n};~\wci{:}~(\wconst{n'}),\star{e}\}} {\{\wcs{:}~\star{n},n';~\wci{:}~\star{e}\}}}$

This reads as: given some value stack $\star{n}$ , when the first instruction in our sequence is $\wconst{n'}$ followed by some $\star{e}$ , then put that value on the stack and remove the instruction from the sequence. While this is maximally explicit, it’s also maximally verbose. By convention, we are going to write our rules not on the entire configuration, but just as changes to the configuration. For example, a simplified const rule:

$\ir{D-Const} {} {\wsteps {\wci{:}~(\wconst{n})} {\wcs{:}~n}}$

Be super careful to understand the convention here, as we will heavily rely on it to avoid drowning in syntax. Because the $\wconst{n}$ instruction does not appear on the right side, it’s assumed to have been removed. Although the stack does not appear on the left side, the $n$ is being pushed onto whatever existing values are on the stack. As another example, here’s the rule for binary operators:

$\ir{D-Binop} {n' = n_L~\oplus~n_R} {\wsteps {\wcs{:}~n_L,n_R; ~\wci{:}~\wbinop{\oplus}} {\wcs{:}~n'}}$

Using these rules, we can prove the correctness of the following trace:

$\begin{align*} &\{\wcs{:}~\eps;~\wci{:}~\wconst{3},~\wconst{5},~\wbinop{\msf{sub}}\} \\ \mapsto\ &\{\wcs{:}~3;~\wci{:}~\wconst{5},~\wbinop{\msf{sub}}\} \tag*{(D-Const)}\\ \mapsto\ &\{\wcs{:}~3, 5;~\wci{:}~\wbinop{\msf{sub}}\} \tag*{(D-Const)}\\ \mapsto\ &\{\wcs{:}~{-2};~\wci{:}~\eps\} \tag*{(D-Binop)} \end{align*}$

And we can define a termination condition for a WASM program as executing the instruction stream to an empty list.

$\ir{D-Val-Done} {} {\val{\{\wci{:}~\eps\}}}$

Addressable memory

A stack is useful for representing ephemeral storage, a place for temporary values, the short-term memory of the computer. Addressable memory is the prevailing storage paradigm for long-term memory, values that persist across scopes and function calls. Just as in C, addressable memory is a giant array of bytes that you can arbitrarily read and write any location by its index.

Example

WebAssembly has two instructions $\wload$ and $\wstore$ for reading and writing memory. For example, consider C program:

int f() {
  int* x = (int*) 1234;
  *x = 1;
  return *x;
}

This program is equivalent to the WebAssembly program:

(func $f (result i32)
                   ;; stack: []
  (i32.const 1234) ;; stack: [1234]
  (i32.const 1)    ;; stack: [1, 1234]
  (i32.store)      ;; stack: []

  (i32.const 1234) ;; stack: [1234]
  (i32.load))      ;; stack: [1]

Unlike the value stack, WASM programs can directly index and manipulate the addressable memory. In our simplified model of WASM, you can think of memory as a sequence of 32-bit integers. Unlike a C program where we treat memory as fixed size, WASM semantics use a growable memory that initially is size 0. The memory can be grown in 16384-sized chunks, and will always fit a 32-bit address space (i.e. is also indexable by 32-bit integers).

Dynamic semantics

Encompassing these ideas, we have four memory-related instructions:

$\begin{alignat*}{2} \msf{Expression}~e ::= \qamp \ldots \\ \mid \qamp \wload \qqamp \text{load from memory} \\ \mid \qamp \wstore \qqamp \text{store to memory} \\ \mid \qamp \wsize \qqamp \text{get memory size} \\ \mid \qamp \wgrow \qqamp \text{increase memory size} \end{alignat*}$

We will also need to extend our configuration with memory:

$\newcommand{\nmem}{\star{n_\msf{mem}}} \begin{align*} \msf{Config}~C ::= \{\msf{mem}~\nmem;~\wcs{:}~\star{n_\wcs};~\wci{:}~\star{e}\} \end{align*}$

To concisely write down semantics on integer sequences, we will use the syntax $m[i]$ to represent the $i$ -th number in the sequence $m$ , and $(m~\msf{with}~i=n)$ to represent the sequence $m$ with the $i$ -th number changed to $n$ . $\mag{m}$ indicates the length of the sequence $m$ . For example:

$\ir{D-Load-Valid} {0 \leq i < |\nmem|} {\wsteps {\wcm{:}~\nmem;~\wcs{:}~i;~\wci{:}~\wload} {\wcm{:}~\nmem;~\wcs{:}~\nmem[i]}}$

This rule says that if the requested index $i$ is within the bounds of memory, then a load instruction from an index $i$ (gotten from the stack) pushes $m[i]$ to the stack with the memory unchanged. The remaining semantics are as follows:

$\ir{D-Load-Invalid} {i \geq |\nmem|} {\wsteps {\wcm{:}~\nmem;~\wcs{:}~i;~\wci{:}~\wload} {\wcm{:}~\nmem;~\wci{:}~\wtrapping}} \nl \ir{D-Store-Valid} {0 \leq i < |\nmem|} {\wsteps {\wcm{:}~\nmem;~\wcs{:}~n,i;~\wci{:}~\wstore} {\wcm{:}~\nmem~\with~i=n}} \nl \ir{D-Store-Invalid} {i \geq |\nmem|} {\wsteps {\wcm{:}~\nmem;~\wcs{:}~n,i;~\wci{:}~\wstore} {\wcm{:}~\nmem;\wci{:}~\wtrapping}} \nl \ir{D-Size} {} {\wsteps {\wcm{:}~\nmem;~\wci{:}~\wsize} {\wcm{:}~\nmem;~\wcs{:}~|\nmem|}} \nl \ir{D-Grow} {} {\wsteps {\wcm{:}~\nmem;~\wci{:}~\wgrow} {\wcm{:}~\nmem,0^{16384}}}$

Note that $0^{16384}$ represents a sequence of 16384 zeroes.

In the invalid rules, $\msf{trapping}$ represents an error condition. If a program raises an error (a trap), then we should immediately stop execution, which we can encode by saying that a trap is a value.

$\ir{D-Trap} {} {\val{\{\wci{:}~\wtrapping\}}}$

Static semantics

We’ve now formalized a WASM subset containing arithmetic and memory. But how do we know our specification is correct? Just as with the lambda calculus, we would like to at least prove the absence of undefined behavior. For example, in the program:

(i32.const 1)
(i32.add)

The add operator only has one argument on the stack when it is called, so by our operational semantics, this program would enter a stuck state. This issue raises the tricky question: is it even possible to avoid undefined behavior in assembly? If you look at x86 or ARM, it’s part of life that a misplaced instruction or logical error can cause undefined behavior. You could read a junk byte from the wrong place in memory, or overwrite the return pointer in a C function call, or so on. And there is no logical system that, for an arbitrary assembly program, can ensure the absence of these behaviors.

However, it turns out that a few careful design decisions will permit us to create a language that is simultaneously low-level and type-checkable. For example, most assemblies permit indreict branches to arbitrary instructions at any time, which makes it impossible to statically analyze the control flow of the program. By only having structured control flow (next lecture), we can eliminate this class of errors.

In the meantime, let’s explore the basics of how we formulate a typing discipline for WASM. The main idea is to treat an instruction as a function that inputs from and outputs to the value stack. For example, i32.const takes zero inputs and produces one output, an i32. The a binary operator like i32.add takes two inputs and produces one output. We’ll write this as:

$\begin{alignat*}{2} \msf{Type}~\tau ::= \qamp \mathsf{i32} \\[1.2em] \msf{InstrType}~\rho ::= \qamp \tau_1^* \rightarrow \tau_2^* \end{alignat*}$

$\boxed{\typeJC{e^*}{\rho}}$

$\ir{T-Const} {} {\typeJC{\wconst{n}}{\tfun{\eps}{\msf{i32}}}} \s \ir{T-Binop} {} {\typeJC{\wbinop{\oplus}}{\tfun{\msf{i32}, \msf{i32}}{\msf{i32}}}} \nl \ir{T-Load} {} {\typeJC{\wload}{\tfun{\msf{i32}}{\msf{i32}}}} \s \ir{T-Store} {} {\typeJC{\wstore}{\tfun{\msf{i32}, \msf{i32}}{\eps}}} \nl \ir{T-Trap} {} {\typeJC{\wtrapping}{\tfun{\tau_1^*}{\tau_2^*}}}$

A common convention in PL theory notation is to put the “shape” of the judgment you’re proving in a box. For example, this judgment says that the typing rule for WASM is defined over sequences of instructions on the left, and function types on the right. See Figure 2 on page 7 of the WebAssembly paper for another example.

That’s not too bad! If we make the simplifying assumption that our only primitive datatype is a 32-bit integer, then each instruction naturally corresponds to a simple function type. A load maps an address to a memory value, and a store maps an address and value to nothing (having a side effect).

There’s a deep observation embedded here: imperative programming becomes functional in the right light! Recall that one key aspect of functional programming is using explicit state objects, e.g. like in the KarelVM step function. If you look at, for example, x86 or C and reify memory as explicit state, then an instruction is a pure function $\mathsf{instr} : \mathsf{Args} \times \mathsf{Memory} \rightarrow \mathsf{Memory}$ .

In this light, if each instruction is a function, then type-checking a sequence of instructions is like type-checking a sequence of function compositions. Specifically, we introduce the T-Seq rule:

$\ir{T-Seq} {\typeJC{e_1^*}{\tfun{\tau_1^*}{\tau_2^*}} \s \typeJC{e_2^*}{\tfun{\tau_2^*}{\tau_3^*}}} {\typeJC{e_1^* ~ e_2^*}{\tfun{\tau_1^*}{\tau_3^*}}}$

The sequence rule says that if a sequence of instructions $e_1^*$ goes from some input types $\tau_1^*$ to some output types $\tau_2^*$ , then that composes with a second instruction sequence $e_2^*$ if it expects $\tau_2^*$ as input. Then the final expression type is the input of $e_1^*$ to the output of $e_2^*$ .

One subtle point: the typing rule is now being defined on sequences of instructions, i.e. $e^* : \tfun{\star{\tau_1}}{\star{\tau_2}}$ as oppposed to individual expressions like with the lambda calculus.

Let’s try to use thse rules to derive the type of 1 + 2.

$\ir{T-Seq} {\ir{?} {?} {\hasType{(\wconst{1})~(\wconst{2})}{\tfun{\eps}{\msf{i32}, \msf{i32}}}} \s \ir{T-Binop} {} {\hasType{\wbinop{\msf{add}}}{\tfun{\msf{i32},\msf{i32}}{\msf{i32}}}}} {\hasType{(\wconst{1})~(\wconst{2})~\wbinop{\msf{add}}}{\tfun{\eps}{\msf{i32}}}}$

Here we have a problem. According to T-Const, $\wconst{n} : \tfun{\eps}{\msf{i32}}$ . So we can’t trivially compose our rules together with T-Seq. Instead, we can introduce another rule that allows us to “weaken” the type of an instruction sequence, essentially saying: if an instruction operates on the top of the stack, it’s ok if there are values down the stack that remain untouched. Formally:

$\ir{T-Weaken} {\typeJC{\star{e}}{\tfun{\star{\tau_1}}{\star{\tau_2}}}} {\typeJC{\star{e}}{\tfun{\star{\tau'}, \star{\tau_1}}{\star{\tau'}, \star{\tau_2}}}}$

This rule says that any $\star{\tau'}$ beneath the stack values touched by $\star{e}$ remain untouched. With this rule, we can now finish our proof:

$\ir{T-Seq} {\ir{T-Const} {} {\hasType{(\wconst{1})}{\tfun{\eps}{\msf{i32}}}} \s \ir{T-Weaken} {\ir{T-Const} {} {\hasType{(\wconst{2})}{\tfun{\eps}{\msf{i32}}}}} {\hasType{(\wconst{2})}{\tfun{\msf{i32}}{\msf{i32}, \msf{i32}}}}} {\hasType{(\wconst{1})~(\wconst{2})}{\tfun{\eps}{\msf{i32}, \msf{i32}}}}$

Progress and preservation

As with the lambda calculus, we now face a tricky question: are our typing rules correct? Our solution is the same, to define progress and preservation theorems to prove the absence of undefined behavior. Now that our runtime and type system are more complicated, defining these theorems will also be more involved.

For example, consider naively translating the lambda calculus theorems into WASM. You would arrive at the theorems, for all configurations $C$ :

Progress (wrong): if $\typeJ{}{C.\wci}{\rho}$ , then either $\val{C}$ or there exists a configuration $C'$ such that $\steps{C}{C'}$ .
Preservation (wrong): if $\typeJ{}{C.\wci}{\rho}$ and $\steps{C}{C'}$ , then $\typeJ{}{C'.\wci}{\rho}$ .

To understand why this doesn’t work, consider the example program from above:

$e^* = (\wconst{1})~(\wconst{2})~\wbinop{\msf{add}}$

As before, we demonstrated that $e^* : \tfun{\eps}{\msf{i32}}$ . However, by D-Const, our step rule says:

$\steps {\{\wci{:}~(\wconst{1})~(\wconst{2})~\wbinop{\msf{add}}; \wcs{:}~\eps\}} {\{\wci{:}~(\wconst{2})~\wbinop{\msf{add}}; \wcs{:}~1\}}$

However, the program $(\wconst{2})~\wbinop{\msf{add}}$ does not have a type, violating preservation. The issue is that at runtime, the correctness of our intermediate state depends on the shape of the stack, which isn’t captured in our typing judgment. Instead, we need to introduce a new configuration typing judgment that can check whether a configuration, as opposed to a program, is well-typed.

$\boxed{\typeJC{C}{\tau^*}}$

$\ir{T-Config} {\typeJ{}{(\wconst{n_1}, \ldots, \wconst{n_{|n|}}, e^*)}{\tfun{\eps}{\star{\tau}}}} {\typeJ{}{\{\wci{:}~e^*; \wcs{:}~n^*\}}{\star{\tau}}}$

The judgment says, if we turned the runtime stack into a series of const instructions, then we can say the configuration is well-typed if the generated program is also well-typed. With this judgment, now we can define modified definitions of progress and preservation:

Progress (right): if $\typeJ{}{C}{\star{\tau}}$ then either $\val{C}$ or $\steps{C}{C'}$ .
Preservation (right): if $\typeJ{}{C}{\star{\tau}}$ and $\steps{C}{C'}$ then $\typeJ{}{C'}{\star{\tau}}$ .

Sections

WebAssembly

Stack machines

Example

Dynamic semantics

Addressable memory

Example

Dynamic semantics

Static semantics

Progress and preservation