Assignment 2: Lambda Practice
Due: Wednesday, October 10 at 4:00pm
Submission cutoff: Saturday, October 13 at 4:00pm
In this assignment, you will implement an interpreter for a variant of the simply typed lambda calculus or as presented in class. An interpreter is a program that implements the semantics of a programming language: it parses strings into syntax trees (or “a program”), typechecks the program, and then evaluates the program to a value. Through this assignment, you will become more familiar with the OCaml programming language as well as the process of writing interpreters for small functional languages.
Before you begin this assignment, we strongly recommend that you complete the OCaml lab to familiarize yourself with the logistics of programming in OCaml.
Setup
In the f18-assignments
repository, run git pull
to get the assignment materials, located in the assign2
directory.
Code overview
Take a moment to familiarize yourself with all of the starter code provided. The flow of the interpreter is:
- Raw source code enters as a string at the top-level
main.ml
. - The source string is tokenized (lexed) into tokens in
lexer.mll
. - The tokens are parsed in
parser.mly
into an abstract syntax tree (AST) defined inast.mli
. - The AST is typechecked in
typecheck.ml
. - The AST is interpreted to produce a final result in
interpreter.ml
.
Each set of .ml
and .mli
files represent a module, with the .mli
as the interface and .ml
as the implementation. For example, typecheck.mli
provides the interface to the typechecker, whereas main.ml
has no corresponding interface as no other functions call the main.
The basic structure of this assignment is that you will implement a complete interpreter for a “core” typed lambda calculus equivalent to the language discussed in the type systems lecture, i.e. integers and functions. Afterwards, you will then go back and add additional semantics for algebraic data types.
Parsing
We have implemented the parser for you. It maps string literals to syntax trees. In the grammar description below, we have provided both the concrete and abstract syntax for each part of the language.
For example, the string (concrete syntax):
(fn (x : int) . x * 7) 20
will map to the abstract syntax:
App(Lam("x", Int, Binop(Mul, Var("x"), Int(7))), Int(20))
Part 1: Substitution
For your first task, you will implement variable substitution on syntax trees. Specifically, you will implement the following function in ast.ml
:
val substitute : string -> Expr.t -> Expr.t -> Expr.t
This is approximate translation of substitute x e' e
. The substitute
function replaces every usage of the free variable x
in e
with e'
. As a reminder, here are a few examples of substitutions:
Part 2: Typechecker
The next step is to validate that a given expression is well-typed according to the statics semantics provided below.
$$ \ir{T-Var}{\hasType{x}{\tau} \in \Gamma }{\typeJ{\ctx}{x}{\tau}} \s \ir{T-Int}{ \ }{\hasType{n}{\tint}} \s \ir{T-Binop} {\typeJC{e_1}{\msf{int}} \s \typeJC{e_2}{\msf{int}}} {\typeJC{e_1 \oplus e_2}{\msf{int}}} \nl \ir{T-Lam} {\typeJ{\ctx,\hasType{x}{\tau_1}}{e}{\tau_2}} {\typeJC{(\funt{x}{\tau_1}{e})}{\tau_1 \rightarrow \tau_2}} \s \ir{T-App} { \typeJC{e_1}{\tau_1 \rightarrow \tau_2} \s \typeJC{e_2}{\tau_1} } {\typeJC{(\app{e_1}{e_2})}{\tau_2}} \nl $$Your goal is to implement the typecheck
function in typechecker.ml
. It adheres to the following signature:
val typecheck : Expr.t -> (Type.t, string) Result.t
The typecheck
function takes in an Expr.t
and returns a Result.t
, a sum type that is either Ok(the_type)
if the expression typechecks or Error(the_error)
if it does not.
Again, your implementation of typecheck
should mirror exactly the semantics described by the typing rules in the language specification. You will need to carry around a typing context, a mapping from variables to types. This can be concretely implemented using a String.Map.t
, documentation here.
Here’s a few examples:
typecheck(Expr.Int(3)) = Ok(Type.Int) (* 3 : int *)
typecheck(Expr.App(Expr.Int(3), Expr.Int(3))) = Error("something informative")
typecheck(Expr.Lam("x", Type.Int, Expr.Int(3))) = Ok(Type.Fn(Type.Int, Type.Int)) (* (fn (x : int) . 3) : int -> int *)
Part 3: Interpreter
Once you have verified that a expression is well-typed, the last step is to write an interpreter in interpreter.ml
that attempts to reduce the expression to a value. Your interpreter should follow the small-step operational semantics as given below.
Specifically, you will implement trystep
:
type outcome =
| Step of Expr.t
| Val
val trystep : Expr.t -> outcome
The trystep
function attempts to take a single step on a well-typed expression t
strictly following the dynamics above. If e
is value then the outcome is Val
. If e
successfully stepped, then the outcome is Step e'
where .
Here’s a few examples:
trystep(Expr.Int(3)) = Val (* 3 val *)
let lam = Expr.Lam("x", Type.Int, Expr.Int(3)) in
trystep(Expr.App(lam, Expr.Int(2))) = Step(Expr.Int(3)) (* (fn (x : int) . 3) 2 |-> 3 *)
trystep(Expr.Binop(Expr.Add, Expr.Binop(Expr.Add, Expr.Int(1), Expr.Int(3)), Expr.Int(7)))
= Step(Expr.Binop(Expr.Add, Expr.Int(4), Expr.Int(7)))
(* (1 + 3) + 7 |-> 4 + 7 *)
After this part, you should be able to pass all basic tests (see Testing for more on this).
Note: in these semantics, you can assume that a number divided by 0 is 0.
Part 4: Algebraic data types
Now, we’ll extend our specification to include sum and product types (see the lecture notes for a refresher). As above, implement the given syntax, statics, and dynamics in ast.ml
, typecheck.ml
, and interpreter.ml
.
Additional syntax
Additional statics
$$ \ir{T-Pair} {\typeJC{e_1}{\tau_1} \s \typeJC{e_2}{\tau_2}} {\typeJC{\pair{e_1}{e_2}}{\tprod{\tau_1}{\tau_2}}} \s \ir{T-Project-L} {\typeJC{e}{\tprod{\tau_1}{\tau_2}}} {\typeJC{\proj{e}{L}}{\tau_1}} \s \ir{T-Project-R} {\typeJC{e}{\tprod{\tau_1}{\tau_2}}} {\typeJC{\proj{e}{R}}{\tau_2}} \nl \ir{T-Inject-L} {\typeJC{e}{\tau_1}} {\typeJC{\inj{e}{L}{\tsum{\tau_1}{\tau_2}}}{\tau_1 + \tau_2}} \s \ir{T-Inject-L} {\typeJC{e}{\tau_2}} {\typeJC{\inj{e}{R}{\tsum{\tau_1}{\tau_2}}}{\tau_1 + \tau_2}} \nl \ir{T-Case} {\typeJC{e}{\tau_1 + \tau_2} \s \typeJ{\ctx,\hasType{x_1}{\tau_1}}{e_1}{\tau} \s \typeJ{\ctx,\hasType{x_2}{\tau_2}}{e_2}{\tau}} {\typeJC{\case{e}{x_1}{e_1}{x_2}{e_2}}{\tau}} $$Additional dynamics
$$ \ir{D-Pair$_1$}{\steps{e_1}{e_1'}}{\steps{\pair{e_1}{e_2}}{\pair{e_1'}{e_2}}} \s \ir{D-Pair$_2$}{\val{e_1} \s \steps{e_2}{e_2'}}{\steps{\pair{e_1}{e_2}}{\pair{e_1}{e_2'}}} \s \ir{D-Pair$_3$}{\val{e_1} \s \val{e_2}}{\val{\pair{e_1}{e_2}}} \nl \ir{D-Project$_1$}{\steps{e}{e'}}{\steps{\proj{e}{d}}{\proj{e'}{d}}} \s \ir{D-Project$_2$}{\val{\pair{e_1}{e_2}}}{\steps{\pair{e_1}{e_2}.L}{e_1}} \s \ir{D-Project$_3$}{\val{\pair{e_1}{e_2}}}{\steps{\pair{e_1}{e_2}.R}{e_2}} \nl \ir{D-Inject$_1$} {\steps{e}{e'}} {\steps{\inj{e}{d}{\tau}}{\inj{e'}{d}{\tau}}} \s \ir{D-Inject$_2$}{\val{e}}{\val{\inj{e}{d}{\tau}}} \nl \ir{D-Case$_1$} {\steps{e}{e'}} {\steps {\case{e}{x_1}{e_1}{x_2}{e_2}} {\case{e'}{x_1}{e_1}{x_2}{e_2}}} \nl \ir{D-Case$_2$} {\val{e}} {\steps {\case{(\inj{e}{L}{\tau})}{x_1}{e_1}{x_2}{e_2}} {\subst{x_1}{e}{e_1}}} \nl \ir{D-Case$_3$} {\val{e}} {\steps {\case{(\inj{e}{R}{\tau})}{x_1}{e_1}{x_2}{e_2}} {\subst{x_2}{e}{e_2}}} $$After this part, all tests (including advanced tests) should pass.
Part 5: Language extensions
This is the written portion of the assignment. Always eager to Move Fast and Break Things™, Will has submitted two new potential extensions to the language above. For each extension, he wrote down the proposed statics and dynamics. Unfortunately, only one of these extensions is type safe—the other violates at least one of progress and preservation. Your task is to identify which extension violates these theorems and provide a counterexample, then for the other extension provide a proof of both progress and preservation for the given rules.
First, recall from lecture the definitions of progress and preservation.
Progress: if , then either or there exists an such that .
Preservation: if and then .
The first extension adds let
statements to the language, just like in OCaml:
For example, this would allow us to write:
The second extension adds a recursor (rec
) to the language. A recursor is like a for loop (or more accurately a “fold”) that runs an expression from 0 to :
Here’s a few examples of using a recursor:
If you know LaTeX, we strongly recommend that you use LaTeX to write your proofs. We have provided you a file tex/solution.tex
in which you may write your solutions. You may, however, use a word processor or handwrite your solutions—we simply require that your submission be a PDF and legible.
Testing
To build the interpreter, run make
. This will create an executable, main.native
.
We’ve supplied a set of tests in the tests
directory. To test an individual file, you can use main.native
to invoke the interpreter, e.g.
./main.native tests/function.lam1
Binaries for Linux (Ubuntu) and Mac of our reference solution are provided at sol_linux
and sol_mac
and can be used similarly (i.e. sol_mac tests/function.lam1
).
For each of the functions you have to implement, we have provided a few additional unit tests inside
of an inline_tests
function in each file. Uncomment the line that says:
let () = inline_tests ()
And then execute ./main.native
(without arguments) to run those tests. If it does not fail with an assertion error, you have passed those tests.
Running run_tests.py
after building will run your main.native
executable over all files in tests
and compare them to our solution. To run just basic tests (up through Part 3), run
python3 run_tests.py
To run both basic and advanced tests (up through Part 4), run
python3 run_tests.py adt
Note that the tests are just a representative sample of the tests we will run your code on. We highly recommend writing your own tests as well.
Submission
To submit your work, we have two Gradescope assignments for the written and programming sections. For the written section, upload your assign2.pdf file. For the programming section, run ./make_submission.sh
and upload the newly created assign2.zip
file.