$$ \newcommand{\hadd}{~\hat{+}~} \newcommand{\hmin}{~\hat{-}~} \newcommand{\hmul}{~\hat{*}~} \newcommand{\hdiv}{~\hat{/}~} \newcommand{\hoplus}{~\hat{\oplus}~} $$

Due: Wednesday, October 23, 2019 at 11:59pm
Submission cutoff: Saturday, October 26, 2019 at 11:59pm

If there’s one thing that a functional programming language is really good for, it’s writing more functional programming languages. It turns out that inductive data structures (syntax) and logic rules (semantics) have a fairly natural translation into a functional style. For example, think back to our arithmetic language from lecture 1.2:

We can translate the grammar into a series of recursive sum types:

type number = int

type binop = Plus | Minus | Times | Divide

type expr = Number of number | Binop of expr * binop * expr

(* "1 + 2 * 3" as an expression *)
let e : expr = Binop(Number 1, Plus, Binop(Number 2, Times, Number 3))

Values of the type expr correspond to the abstract syntax tree of a program. The operational semantics then become a function that takes an expression, and output whether it steps or is a value.

type outcome = Step of expr | Val

let rec trystep (e : expr) : outcome =
  match e with
  | Number _ -> Val
  | Binop (el, binop, er) ->
    (match trystep el with
     | Step el' -> Step (Binop (el', binop, er))
     | Val ->
       (match trystep er with
        | Step er' -> Step (Binop (el, binop, er'))
        | Val ->
          (* This non-exhaustive match is provably safe by inversion on D-Num *)
          let (Number nl, Number nr) = (el, er) in
          let op = match binop with
            | Plus -> ( + )
            | Minus -> ( - )
            | Times -> ( * )
            | Divide -> ( / )
          Step (Number (op nl nr))))

assert (trystep (Number 3) = Val);
assert (trystep (Binop (Number 1, Plus, Binop (Number 2, Times, Number 3)))
  = Step (Binop (Number 1, Plus, Number 6)))

This example hopefully clarifies the distinction between and : the “plus” and “minus” symbols in the abstract syntax tree are distinct from the standard plus and minus operations in a theory of arithemtic, e.g. as embedded in the underlying OCaml semantics. Hence we need an explicit translation that maps Plus -> (+).

The sum of the syntax tree and the operational semantics defines an interpreter for the arithmetic language, or a program that can compute arithmetic programs. In building an interpreter, the learning goal is to deepen/check your understanding of PL theory by mechanizing it into OCaml. That is, we will turn operations like substitution and type-checking which we previously wrote by hand/Latex into executable algorithms.

In this assignment, you will implement an interpreter for the typed lambda calculus including every feature discussed in class so far, such as algebraic data types, recursive types, polymorphic types, and existential types.

Interpreter overview

In assign4/program, we have provided you a skeleton of an interpreter. At a high level, the interpeter follows this pseudocode:

let filepath = "test.lam" in (* for example *)
let input : string = read_input filepath in
let e : Ast.Expr.t = Parser.parse input in
let tau : Ast.Type.t = Typecheck.typecheck e in
let e' : Ast.Expr.t = Interpreter.eval e in
Printf.printf "%s : %s |-> %s"
  (Eval.to_string e) (Type.to_string tau) (Eval.to_string e')

Following the steps, in order:

  • A lexer and parser, defined in lexer.mll and grammar.mly, transform an input string into a syntax tree, i.e. an expression of type Expr.t. The syntax tree for expressions and types is defined in ast.mli.
  • A typechecker, defined in typecheck.ml, computes the type of the expression e.
  • An interpreter, defined in interpret.ml, evaluates the expression e to a value.

We have already provided you a complete lexer and parser. At a high-level, your goal is to implement the remaining parts of the interpreter: substitution, alpha-equivalence, type-checking, and evaluation. Each component is explained in detail below, but I want to emphasize a methodology: do not build each interpreter component for the entire language in isolation.

Instead, you should build the interpreter in a test-driven, language-feature-major fashion. For example, let’s say you want to start with the arithmetic subset of the typed lambda calculus.

  1. First, you should write a few test cases of programs that only use arithmetic, like 1, 1 + 2, (1 - 3) * 4, and so on.
  2. Implement substitution, alpha-equivalence, and type-checking for numbers and binary operators. Make sure that each of your examples type-checks if it should (or vice versa if it should not).
  3. Implement stepping for numbers and binary operators. Make sure that your examples step to the correct value and don’t enter a stuck state.

Once you’re satisfied that arithmetic works properly, then rinse and repeat for the remaining features. We recommend proceeding in this order:

  1. Arithmetic: Numbers, binary operators (already completed for you, just review the starter)
  2. Conditionals: Booleans, relational operators, AND/OR, if expressions
  3. Functions: lambdas, application, variables
  4. Product types: unit, pair, project
  5. Sum types: inject, case
  6. Fixpoints: fix
  7. Polymorphism: type functions and type application
  8. Recursive types: folding and unfolding
  9. Existential types: imports and exports

If you do not follow this methodology and then come to office hours asking generic questions like “why doesn’t my code work?”, we will likely tell you to simplify your development process.


In order to write test cases, you will need to know the concrete syntax of the typed lambda calculus language. For example, if you have a file test.lam:

let n : num = 1 + 2 in
let t : num * num = (n, n + 1) in
t.L + t.R

You can build your interpreter by typing make then running ./main.native test.lam -v to get:

Expr: ((fun (n : num) -> ((fun (t : num * num) -> t.L + t.R) (n, n + 1))) 1 + 2)
Type: num
Success: 7

We have also provided you a reference interpreter reference.sh that you can run the same way as main.native. If you are ever unsure about the semantics of a program, it should exactly match the implementation of the reference. If you think there’s a bug in the reference, please let us know!

Concrete syntax examples

Here are a few brief examples of the lambda calculus syntax. (Coincidentally, this entire program is a good holistic test case!)

(* Arithmetic, booleans, lets *)
let n : num = 1 + 2 in
let n : num = if (n > 1) || (n = 1) then n - 1 else 0 in

(* Functions and pairs *)
let f : num -> num = fun (n : num) -> n + 1 in
let t : num * bool = (1, true) in
let n : num = (f (t.L)) in

(* Sums *)
let s : num + bool = inj false = R as num + bool in
let n : num = case s { L(n) -> n + 1 | R(b) -> if b then 1 else 0 } in

(* Fixpoint *)
letrec fact : num -> num = fun (n : num) ->
  if n = 0 then 1 else n * (fact (n - 1))
let n : num = (fact 5) in

(* Polymorphic identity *)
let id : forall a . a -> a = tyfun a -> fun (x : a) -> x in
let n : num = ((id [num]) 1) in


See the assign4/program/examples directory for more complicated programs, particularly involving recursive and existential types. I’ve transcribed several of the examples from lecture into their corresponding text form.

A few things to note on writing lambda calculus programs:

  • Our parser is not battle-tested, so it’s possible that the associativity or precedence may be counter-intuitive. Always check the program parse for correctness. If something is off, you should be able to add parentheses until it works.
  • Because you are not implementing type inference and our language does not support type aliases, you will have a lot of type annotations everywhere. This is sadly unavoidable.
  • The let and letrec constructs are not in the AST because they are de-sugared into function application and fixpoint as defined in lecture 3.1.

Test harness

Once you’ve written an example program, if you place it in the tests directory, then you can run the script ./run_tests.py to run your interpreter on each test and compare the result to a provided reference interpreter. Make sure your tests use the extension .lam. By default, the test harness will output to your console, but if you wish to have it output to a file, you can use the --outfile option. There are other options too: for example, if you wish to run on only a subset of the tests, you can make a new directory containing only the tests you wish to run and change the --test_dir option. You can see a full listing of options through ./run_tests.py --help.

Note that the test harness requires to_debruijn to be implemented correctly under ast_util.ml to function correctly once you start supporting variables. See the “Alpha-equivalence” section for details.


To debug to your interpreter, the binary has a -v and -vv flag for verbose and extra verbose, respectively. The first flag will print out the parse and type of the input program, and the second flag will also print out a trace of the expression after each step.

If you want to add your own debugging information, you can use the verbose : unit -> bool and extra_verbose : unit -> bool functions to check if the flags are set.

Abstract syntax

The language’s abstract syntax is defined in ast.mli. Briefly, a .mli file (as opposed to .ml) defines the interface for a module. So a reference in another file to Ast.something can only reference something if it exists in the .mli interface.

For example, the Ast.Expr module defines a type:

type binop = Add | Sub | Mul | Div

type t =
  | Num of int
  | Binop of {binop : binop; left : t; right : t}

For clarity, we use records instead of tuples for syntactic forms with multiple fields. For example, we can create and use expressions like:

let e : Expr.t = Binop {binop = Add; left = Num 1; right = Num 2} in
(* Mirror binary operators *)
match e with
| Num n -> Num n
| Binop {binop; left; right} ->
  Binop {binop; left = right; right = left}

The algebraic data type mirrors the grammar notation we have used throughout the class. Below is our language’s complete grammar:


The first primary feature of the interpreter is the type-checker, which implements our typed lambda calculus’ static semantics. In typecheck.ml, you will implement the function:

val typecheck_expr : Type.t String.Map.t -> Expr.t -> (Type.t, string) Result.t

typecheck_expr ctx e takes a typing context and expression as input, returning a Ok(tau) if the expression is is well-typed, and returning Err(error_string) if it is not. The context ctx corresponds to the term in the type rules, mapping variables (strings) to types.

Across the assignment, you will not be graded on the usefulness of your error messages, but we recommend good error messages as a debugging aid. We have provided a Type.to_string and Expr.to_string that convert types and expressions to strings.


We have already provided you with the typechecking rules for the number and binary operator cases. Reproduced below:

match e with
| Expr.Num _ -> Ok Type.Num
| Expr.Binop {left; right; _} ->
  typecheck_expr ctx left >>= fun tau_left ->
  typecheck_expr ctx right >>= fun tau_right ->
  (match (tau_left, tau_right) with
   | (Type.Num, Type.Num) -> Ok Type.Num
   | _ -> Error ("informative error message"))

These rules correspond to the static semantic rules:

Carefully study the correspondence between the rules. Inductive conditions, for example, correspond to a recursive call to the typechecker. Note the use of the infix monadic bind operator, written as >>=. Without it, the implementation looks like:

| Expr.Binop {left; right; _} ->
  (match typecheck_expr ctx left with
   | Err s -> Err s
   | Ok tau_left ->
     (match typecheck_expr ctx right with
      | Err s -> Err s
      | Ok tau_right ->
        (match (tau_left, tau_right) with
         | (Type.Num, Type.Num) -> Ok Type.Num
         | _ -> Error ("informative error message"))))

Notice the core structure of the logic. First, we typecheck the left side, and if the typecheck fails, then the overall check fails. Same thing for the right side, then finally check that the types are both numbers. This is a common pattern in code with sequences of operations, each of which can possibly fail.

The bind operator res >>= f says: “if the result res on left is Ok(tau), then return (f tau), else return the error without calling f”. This allows us to linearize the code and avoid extra matches. We encourage you to adopt this pattern to avoid unnecessary boilerplate in your code.


A few more notes on implementing the typechecker:

  • You should not check for type equality using the = operator. You should use Ast_util.Type.aequiv as detailed in the section on alpha-equivalence below.
  • We have written a few simple tests at the bottom in inline_tests. You can uncomment let () = inline_tests () to run them whenever you run the interpreter binary. You will find a similar pattern in each module. Feel free to write tests here, or use the external run_tests.py harness.
  • The reference typecheck_expr implementation is about 200 lines long (with good error messages!).


Below we reproduce the complete static semantics:


If an expression is well-typed, then it will be evaluated to a value using the eval function in interpreter.ml. eval works by repeatedly computing trystep on the current expression:

type outcome =
  | Step of Expr.t
  | Val

val trystep : Expr.t -> outcome


We have provided you all the Val rules and the Binop step rule for trystep. For example:

match e with
| Expr.Binop {binop; left; right} ->
  (left, fun left' -> Expr.Binop {left = left'; binop; right;}) |-> fun () ->
  (right, fun right' -> Expr.Binop {right = right'; binop; left}) |-> fun () ->
  let (Expr.Num n1, Expr.Num n2) = (left, right) in
  let f = match binop with
    | Expr.Add -> (+)
    | Expr.Sub -> (-)
    | Expr.Mul -> ( * )
    | Expr.Div -> (/)
  Step (Expr.Num (f n1 n2))

This expression says: first, step left to a value. Then step right to a value. Once they are both values, we can extract the inner numbers n1 and n2, then combine then with the appropriate binary operation. Note that we can perform the non-exhaustive match on Expr.Num because e is well-typed, so we know that and by inversion, if then it must be a number.

The starter code uses a funky custom operator |-> which is defined at the bottom of trystep:

let rec trystep (e : Expr.t) : outcome =
and (|->) ((e, hole) : Expr.t * (Expr.t -> Expr.t)) (next : unit -> outcome)
  : outcome =
  match (trystep e) with Step e' -> Step (hole e') | Val -> next ()

This operator captures the idea of reducing subexpressions to a value before proceeding. Specifically, if an expression e steps to e', then it inserts e' into a hole hole e', e.g. above:

fun left' -> Expr.Binop {left = left'; binop; right}

Then if e is a value, then operator calls next. This contrasts with the longer and clunkier version:

| Expr.Binop {binop; left; right} ->
  (match trystep left with
   | Step left' -> Step (Expr.Binop {left = left'; binop; right})
   | Val ->
     (match trystep right with
      | Step right' -> Step (Expr.Binop {right = right'; binop; left})
      | Val ->
        let (Expr.Num n1, Expr.Num n2) = (left, right) in
        let f = match binop with
          | Expr.Add -> (+)
          | Expr.Sub -> (-)
          | Expr.Mul -> ( * )
          | Expr.Div -> (/)
        Step (Expr.Num (f n1 n2) )

You do not have to use the |->, it’s solely for your convenience.


A few notes on trystep:

  • It’s totally fine if you have non-exhaustive matches in your interpreter. Again, you can assume your expression is well-typed, meaning you can assume what kind of expression you’ll have at different points once it’s a value.

  • The reference solution of trystep is 90 lines of code. (Much less than the typechecker!)


Below is the complete dynamic semantics for our language. It’s a lot of rules… but many of them are trivial values, or simple reduction orders. Hence the actual implementation is much shorter than for the typechecker.


Once you are implementing typechecking and evaluation for functions/variables, you will need to have a substitution operation . In ast_util.ml, you must implement substitution for both types and expressions, of the form:

val substitute : string -> t -> t -> t

Here, t is either Expr.t or Type.t, and substitute x e' e maps to the form (in the case of expressions).

Recall that the main idea of substitution is replacing all free variables x with the expression e' in e. The challenge in implementing substitution is that it must respect two properties:

  1. Variable shadowing: in lexical scoping, a variable is bound to its innermost identifier. For example, because the inner is bound to the lambda.
  2. Alpha-equivalence: variables should not be substituted in a way that would accidentally bind an otherwise free variable. For example, where the inner argument is renamed to avoid a conflict.

Capturing both of these properties is actually quite non-trivial. One possibility is to use an abstract binding tree data structure that encodes lexically scoped variable semantics into the tree itself. Another is to use the semantics of the base language (OCaml) to encode the properties, i.e. higher-order abstract syntax.

We recommend this simple, slightly inelegant strategy:

  1. When substituting into a expression (or type), maintain a mapping from variables to expressions (or types). When performing , the mapping begins with .
  2. As you recurse through the expression, you may encounter a variable binding e.g. . Generate a “fresh” variable (one not used in the program) and add to the mapping. (We have provided you a such a fresh function.)
  3. When you reach a variable , if is in the mapping, then replace with the corresponding value.

This strategy will eagerly rename every variable after each substitution, but it will ensure the two properties above. To get you started, we have provided a helper function substitute_map that takes such a map as input.


When type-checking, you will inevitably need to compute the equivalence of types. For example, checking function application will require comparing the argument’s type against the function’s argument type. However, consider the case where the types contain type variables, e.g. and . We would like to say that these are the same types, because values of these types can be used interchangeably—the choice of type variable name here has no semantic impact.

This idea is called alpha-equivalence, or being equal up to a renaming of bound variables. For example:

You need to implement alpha-equivalence for expressions and types in ast_util.ml. The simplest way to implement alpha-equivalence is to convert a given expression or type into a nameless form, i.e. using de Bruijn indices. The basic idea is to replace each variable name with a number representing the “depth” of its binding. For example, in the program:

let x : num = 1 in
let y : num + bool = inj 5 = L in num + bool in
case y {
  L(z) -> z + x
| R(z) -> 9

Would translate into the nameless form:

let _ : num = 1 in
let _ : num + bool = inj 5 = L in num + bool in
case <0> {
  L(_) -> <0> + <2>
| R(_) -> 9

When a variable is used, you count the number of bindings for any variable between the usage and the corresponding binding. For example, the distance from case y to let y is 0, while the distance from x to let x is 2 because of the z and y bindings.

We have already implement alpha equivalence for you in the aequiv function. Your task is to implement the translation for expressions and types into de Bruijn indices in the to_debruijn function:

val to_debruijn : t -> t

Some notes:

  • To replace a variable with its de Bruijn index, you can convert the index to a string, e.g. Var(Int.to_string depth).
  • If an expression has free variables, you can leave them unchanged (i.e. do not replace with a number).
  • All variable names should be replaced with the underscore "_". In Expression.to_debruijn, you should replace all types with Ast.Type.Var "_".
  • The local test harness relies on your to_debruijn function to output expressions that it can compare with the reference solution in an alpha-equivalent way. That means if to_debruijn function is incorrect, the test harness will not work as expected.

Submission and grading

For this assignment, our grading policy is that when you upload your assignment to Gradescope, you will receive your actual final grade for the submission. There are no private tests, but the Gradescope tests are still hidden, meaning you will only see coarse feedback (e.g. 8/10 correct for sum types). You should write tests both to check your code and also to understand the pragmatics of the typed lambda calculus in practice.

The full Gradescope test suite is under construction, but it will be completed by Friday evening.