PDF version of this page.

Recall from last week that we had a big issue with our lambda calculus. We were able to construct programs with undefined behavior, i.e. ones that would evaluate to a stuck state, by having free variables in our expressions, like

Moreover, we had no way to easily enforce higher-level constraints about our functions. For example, let’s say we had a function that would apply an argument twice to a function.

We could accidentally give this a function that only takes one argument1, e.g.

Ideally, we could somehow restrict the allowable values for to the set of functions with two arguments (e.g. ).


The desired properties above are all examples of invariants, or program properties that should always hold true. Invariants are things like:

  • In the function , should be a number and .
  • In my ATM program, customers should not withdraw more money than they have in their account.
  • In my TCP implementation, neither party should exchange data until the initial handshake is complete.
  • In the driver for my mouse, the output coordinates for the mouse should always be within the bounds of my screen.

There are three main considerations in the design of invariants:

  1. Structure. What is the “language” of invariants? How can we write down a particular invariant?
  2. Inference. Which invariants can be inferred from the program, and which need to be provided by the programmer?
  3. Time of check. When, in the course of a program’s execution, is an invariant checked? Before the program is run?

For example, consider the humble assert statement. This is usually a built-in function that takes as input a boolean expression from the host language, and raises an error if the expression evaluates to false. In Python:

def div(m, n):
    assert(type(m) == int and type(n) == int)
    assert(n != 0)
    return m / n

For these kinds of asserts, the language of invariants is the same as the host language, i.e. a Python expression. This is quite powerful! You can do arbitrary computations while checking your invariants. These invariants are never inferred—you have to write the assert statements yourself2. And lastly, assert statements are checked at runtime, when the interpreter reaches the assert. Nothing guarantees that an assert is checked before code relying on its invariant is executed (e.g. accidentally dividing and then asserting in the case above).

By contrast, now consider the traditional notion of a “type system” as you know it from today’s popular programming languages. For most type systems, the language of invariants is quite restricted—types specify that a variable is a “kind” of thing, e.g. n is an int, but cannot specify further that n != 0. Most stone-age programming languages require the programmer to explicitly provide type annotations (e.g. int n = 0), but modern languages increasingly use type inference to deduce types automatically (e.g. let n = 0). Lastly, types can be checked either ahead of time (“statically”, e.g. C, Java) or during program execution (“dynamically”, e.g. Python, Javascript)3.

Key idea: type systems and runtime assertions derive from the same conceptual framework of enforcing invariants, just with different decisions on when and how to do the checks.

In this course, our focus is going to be on static analysis: what invariants can we describe, infer, and enforce before ever executing the program? And by enforcement, I mean iron law. We don’t want our type systems to waffle around with “well, you know, I bet this n is going to be an integer, but I’m only like, 75% sure.” We expect Robocop type systems that tell us: I HAVE PROVED TO 100% MATHEMATICAL CERTAINTY THAT IN THE INFINITE METAVERSE OF BOUNDLESS POSSIBILITIES, THIS “n” IS ALWAYS AN INTEGER.

This is the core impetus behind most modern research in PL theory. Advances in refinement types, dependent types, generalized algebraic data types, module systems, effect systems, traits, concurrency models, and theorem provers have pushed the boundaries of static program analysis. Today, we can prove more complex invariants than ever before. While cutting-edge PL research is mostly beyond the scope of this course, you will be equipped with the necessary fundamentals to continue exploring this space.

Typed lambda calculus

To understand the formal concept of a type system, we’re going to extend our lambda calculus from last week (henceforth the “untyped” lambda calculus) with a notion of types (the “simply typed” lambda calculus). Here’s the essentials of the language:

First, we introduce a language of types, indicated by the variable tau (). A type is either an integer, or a function from an input type to an output type . Then we extend our untyped lambda calculus with the same arithmetic language from the first lecture (numbers and binary operators)4. Usage of the language looks similar to before:

Indeed, our operational semantics are just the lambda calculus plus arithmetic. Zero change from before.

An interpreter for free

A brief aside: the main reason we’re using OCaml in this course (as opposed to, say, Haskell or Scala) is that feels quite similar to the typed lambda calculus. In fact, if we change a few keywords, we can use OCaml to execute exactly the language described above. (See the OCaml setup guide to follow along). If we wanted to transcribe the two examples above:

$ ocaml
# (fun (x : int) -> x + 1) 2 ;;
- : int = 3
# (fun (f : int -> int) -> fun (x : int) -> f (x + 1)) (fun y -> y * 2) 5 ;;
- : int = 12

Of course, OCaml can do much more than this—it has strings, exceptions, if statements, modules, and so on. We’ll get there, all in due time. I point this out to show you that by learning the lambda calculus, you are actually learning the principles of real programming languages, not just highfalutin theory. When you go to assignment 2 and start on your first OCaml program, the language will feel more familiar than you may expect!

Type system goals

Before we dive into the type system, it’s worth asking the motivational question: what invariants of our language do we want to statically check? One way to answer this is by thinking of edge cases we want to avoid.

  • Adding a number and a function:
  • Calling a function with the wrong type:
  • Incorrectly using a function argument:

This is an important exercise, since it gives us an intuition for where errors might arise. However, even if we had a method for completely eliminating the edge cases we thought of, how can we know we caught all the cases? What if we just didn’t think of a possible error?

Remember that all of these issues fundamentally boil down to stuck states, or undefined behavior. We specified our operational semantics over “well-defined” programs, but that doesn’t prevent us from writing invalid programs. As before, the goal is to take a program and step it to a value. This leads us to a safety goal: if a program is well-defined, it should never enter a stuck state after each step. If we can formally prove that this safety goal holds for our language, then that means there are no missing edge cases!

The goal of a type system, then, is to provide a definition of “well-defined” such that we can prove whether a given program is well-defined without executing it. Formally, we need a new judgment (binary relation) “ has type ”, written as . In the language above, it should be the case that and . To say an expression has a type is to say it is “well-defined” (or “well-typed”).

Lastly, we want to prove the “type safety” of our language with two theorems:

  1. Progress: if then either or there exists such that .
  2. Preservation: if and , then .

Intuitively, progress says: if an expression is well-typed and is not a value, then we should be able to step the expression (it is not in a stuck state). However, this isn’t enough to prove our safety goal, since we also need preservation: if an expression is well-typed, when it steps, its type is preserved. For example, if we have an expression of integer type, it shouldn’t turn into a function after being stepped.

Static semantics

In this conceptual framework of type systems, the first thing we need to do is define how we determine the type of an expression. In our grammar, we defined a type language, but now we need a type semantics (or “static semantics”). First, we’ll define the judgments for numbers:

As you can see, these are defined quite similarly to how we defined our operational semantics (or “dynamic semantics”). Each rule defines a different way to determine whether a particular expression has a particular type. Just like , the rule of is axiomatic—a numeric constant has type under all conditions. says: if the two subexpressions are both integers, then the binary operation on those subexpressions is also an integer. From these two rules, we can construct a proof that :

Ok, but what is this “” business? Or “”? It’s the same core idea as what you did for dynamic scoping on Assignment 1. To typecheck expressions with variables, we need to introduce a “typing context” that maps variables to their types. Intuitively, when typechecking , we want to remember that any usage of in should assume . Formally, we write this as:

represents our type context. You could also think about it as a “proof context”, since our type-checker is basically a theorem prover that’s formally verifying the type of an expression.) It is a mapping from variables to types, and it works exactly like the context used in part 3 of Assignment 1. As a reminder, we can add mappings to our context5, indicated by . The notation means that “given the proof context , it is provable that .”

Let’s read through the rules. says that if our context says then has type . is the most complex: it says that to type-check a function, we want to type-check the body of the function assuming that , where is the type provided in the program syntax. Then, assuming our body typechecks to another type , this becomes the return type of the function, so its entire type is .

Note a subtlety here: is given to us from the program, while we have to compute . Our typed lambda calculus mixes types that are explicitly annotated and implicitly inferred.

Lastly, the rule says: when calling a function, the function expression should be a function , and the argument expression should be of the appropriate argument type . Then, the result of applying the function is the result of the function, type .

As an example of these rules, here is the proof that .


At this point, you should understand the mechanics of our type system: how we define our typing rules, and how they can be used to construct proofs about the types of expressions. But it’s not sufficient just to have a type system, we need a good type system! Remember, we want to demonstrate that if a program is well-typed, then it will never enter a stuck state. To do that, we have to prove the progress and preservation theorems, i.e. verify that our language is actually “type safe.” This is an example of a metatheoretical property of our programming language. We use the type system to prove that expressions has certain types, and on the next level up we prove that our type system (or proof system) has certain properties.

Structural induction

Before we actually do the proof, we need to talk about how to do proofs about programming languages. From your discrete mathematics course, you’re probably familiar with “mathematical” induction (see CS 103 Mathematical Induction for a refresher) where induction always occurs on the natural numbers. If you want to prove a proposition for all , then an inductive proof will show and .

For proofs with programming languages, we generalize the idea of mathematical induction to structural induction. Until this point, we’ve done proofs about individual programs, e.g. above showing that a particular concrete expression has a particular type. As you’ve seen, these have an inductive flavor—to prove a statement, you recursively prove statements about its sub-components until you reach a base case.

While the proofs we’ve done have been about concrete expressions, the next step is to generalize our proofs to work on arbitrary expressions. For example, if we want to prove a proposition on all well-typed expressions, then we have to prove the proposition holds for all the ways a type can be constructed for an expression. This is fairly abstract, so let’s dive into the progress/preservation proofs to get an example of what this looks like.

Proving type safety

Recall the preservation theorem: if and then . To prove this for our simply typed lambda calculus, we are going to proceed by structural induction over the different ways a type can be constructed, i.e. each typing rule in our static semantics (also called “rule induction”). The typing rule will tell us a more specific version of the proposition to prove, and also provide us with certain facts from our inductive hypothesis.

Note: the progress and preservation theorems are defined with respect to “closed terms”, i.e. expressions which don’t need a type context at the top level to prove their type. Or put another way, expressions with no free variables.

Proof. By rule induction on the static semantics.

  1. T-Var: if and then .

    This is vacuously true, since a variable cannot have a type without a typing context .

  2. T-Int: if and then .

    This is vacuously true, since there is no rule to step a number .

  3. T-Binop: if and then .

    First, by the premises of the rule, we know and .

    Second, by the inductive hypothesis (IH), we get to assume preservation holds true for and . For example, if then we know (and likewise for ).

    Third, we case on the three ways in which a binary operation can step:

    1. D-Binop-1: assume , so . By the IH, , so by we have that .

    2. D-Binop-2: assume and , so . By the IH, , so by we have that .

    3. D-Binop-3: assume and and , so . By we have that .

    Hence, in every case, we have shown that for all possible , and the preservation theorem holds for .

  4. T-Lam: if and then .

    This is vacuously true, since there is no rule to step a function value.

  5. T-App: if and then .

    First, by the premises of , we know and .

    Second, by the IH, we know that preservation holds for and .

    Third, we case on the two ways an application can step:

    1. D-App-1: assume , so . By the IH, , so by , we know .

    2. D-App-2: assume , so .

      By the inversion6 of ($\text{T-Lam}$), we know and .

      By the substitution typing lemma7, .

    Hence, preservation holds in either case.

Since preservation holds for all typing rules, then it holds for the entire language.

Lastly, let’s prove progress.

Theorem. if then either or there exists such that .

Proof. By rule induction on the static semantics.

  1. T-Var: if then either or there exists such that .

    This is vacuously true, since a variable cannot have a type without a typing context .

  2. T-Int: if then either or there exists such that .

    By D-Int, .

  3. T-Binop: if then either or there exists such that .

    First, by the premises of the rule, we know and .

    Second, by the inductive hypothesis (IH), we get to assume progress holds true for and . For example, if either or .

    Third, we case on the different possible states of and derived from the IH:

    1. : then by D-Binop-1, .

    2. : then by D-Binop-2, .

    3. : because and , then by inversion on D-Int we know and . Therefore by D-Binop-3, for .

    In each case, the expression steps, so progress holds.

  4. T-Lam: if then either or there exists such that .

    By D-Lam, .

  5. T-App: if then either or there exists such that .

    First, by the premises of , we know and .

    Second, by the IH, we know that progress holds for and .

    Third, we case on the different possible states of derived from the IH:

    1. : then by D-App-1,

    2. : then by inversion on D-Lam, . By D-App-2, .

    In each case, the expression steps, so progress holds.

Since progress holds for all typing rules, then it holds for the entire language.

  1. In the lambda calculus, all functions technically take one argument, so when I say “a function that takes one argument”, I mean as opposed to a function that returns another function. 

  2. Except where built into the language, of course. In the div example, both of the asserted invariants (int types and nonzero) will be checked by the division operator in the language runtime. 

  3. The options provided do not strictly form a dichotomy. “Gradual” or “hybrid” invariant enforcement that mixes static/dynamic checks is an active area of research, e.g. gradual typing

  4. Why is the arithmetic necessary? Can’t we just keep our functions-only approach? Unfortunately, no. Imagine the function and I asked you: what is the type of this function? At some point, you have to have a “base type”, since a type language of just is infinitely recursive, and you cannot construct an actual type. 

  5. You can think about as a “purely functional” dictionary. Adding a new mapping like will overwrite a previous mapping for

  6. Inversion is a useful proof technique/lemma to cite on occasion. Generally speaking, it falls from the rule: if , and , then , i.e. . Above, since there’s only one way to construct a type for a lambda, if we have a lambda and know it has a type, then we can deduce that its body must be well typed. 

  7. You can assume if and then