Assignment 4: Type Extensions
Due: Wednesday, October 25, 2017 at 4:20pm
Submission deadline: Saturday, October 28, 2017 at 4:20pm
Now that you’ve got some interpreter experience under your belt, it’s time to take your language to the next level. In this assignment, you will implement an interpreter for the exciting simply typed lambda calculus, which goes by a name of my own invention. Specifically, you will extend our previous language with algebraic data types, pattern matching, and polymorphic/existential types.
Setup
Setup is the same as always: copy the assignment folder out of our folder on Rice.
On the Rice machines:
cp -r /afs/ir/class/cs242/assignments/assign4 assign4
On your local machine:
scp -r <SUNetID>@rice.stanford.edu:/afs/ir/class/cs242/assignments/assign4 assign4
Requirements
You must implement the translation, typechecking, and interpretation stages of the interpreter. You will turn in these files:
translator.ml
typechecker.ml
interpreter.ml
There are no written portions to this assignment.
Submitting
Upload the files in your copy of assign4 to Rice. To submit, navigate to the assignment root and execute:
python /afs/ir/class/cs242/bin/submit.py 4
Note: Tests that do not result in an error are graded by exact text matching, and we will not fix submissions that have extra print statements or the like. It is your responsibility to test before submitting.
Language specification
Adding algebraic data types, pattern matching, and first-order type logic to our language introduces a substantial amount of complexity in both the language and its implementation. A common technique for mitigating complexity in language implementation is to use intermediate representations (IRs), or mini-languages internal to the interpreter. For our language, we have the top-level language, defined by what the programmer writes, as well as an IR for our typechecker and interpreter.
First, we will examine the grammar for our new language. Specifically, we have added new types (products, sums, universals, and existentials) and the corresponding terms necessary to implement those types. These are precisely the same formulations as we discussed in class. Additionally, this language introduces pattern matching, a restricted version of the match clause you’ve used in OCaml.
As a refresher from lecture, $\ttt{inj} \ t = d \ \ttt{as} \ \tau$ “injects” the term $t$ as the left or right side of the sum type $\tau$, depending on what $d$ is. $\ttt{tfn}$ and type application $t \ [\tau]$ are similar to $\ttt{fn}$ and ordinary application except that they operate on types, and not terms. That means $\ttt{tfn} \ X \ . \ t$ maps a variable name $X$ to a type and not a term, and $t \ [\tau]$ expects a type as an argument for some $\ttt{tfn}$.
$\exists X.\tau$, $\ttt{pack}$, and $\ttt{unpack}$ are intimately related. $\ttt{pack}$ is always used with the type $\exists X.\tau$, which states that there exists some type $X$ for which the underlying value has the type $\tau$. This essentially allows us to define abstract interfaces. $\ttt{pack}$, written $\{\tau_1, t\} \ \ttt{as} \ \tau_2$, “packs” a concrete implementation of some type $\tau_1$ into some existential type $\tau_2$. $\ttt{unpack}$ allows us to retrieve the concrete implementation.
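For example (an illustrative term of our own, not one from the assignment’s tests), we can pack the concrete type int together with an increment function behind an existential type:

$$ \{\ttt{int}, \ \ttt{fn} \ (x : \ttt{int}) \ . \ x + 1\} \ \ttt{as} \ \exists X . X \rightarrow X $$

A client that unpacks this value can apply the function at the abstract type $X \rightarrow X$, but it cannot observe that $X$ is really int.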
As mentioned, all of the rules for algebraic data types and first-order type logic mirror the semantics discussed in class, so refer back to those notes for more details. The pattern matching mechanism is novel, however, as this is the first time we will give it a formal treatment. Here are a few examples using patterns in this language:
If we wanted to, we could stop here and define a full statics and semantics for the language above. However, we need not, since we can translate some parts of the language into others. Specifically, we will eliminate pattern matching in a translation pass. For example, we could define the following translation of a tuple match (where $\leadsto$ means “translates to”):
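As a rough illustration (a sketch of one possible translation, with the source-level match written informally, and not necessarily the exact rule used by the starter code), a match against a tuple pattern can be replaced by projections and substitution:

$$ \ttt{match} \ t \ \{ \ (x_1, x_2) \hookrightarrow t' \ \} \ \leadsto \ \subst{x_1}{t.L}{\subst{x_2}{t.R}{t'}} $$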
To formalize this translation, we will first define a grammar for a new IR:
$$ \begin{alignat*}{4} \tsf{Type}~\tau \Coloneqq \qamp \ttt{Var}(X) \qqamp X \qqamp \text{variable} \\ \qamp \ttt{Int} \qqamp \ttt{int} \qqamp \text{integer}\\ \qamp \ttt{Fn}(\tau_1, \tau_2) \qqamp \tau_1 \rightarrow \tau_2 \qqamp \text{function} \\ \qamp \ttt{Product}(\tau_1, \tau_2) \qqamp \tau_1 * \tau_2 \qqamp \text{product} \\ \qamp \ttt{Sum}(\tau_1, \tau_2) \qqamp \tau_1 + \tau_2 \qqamp \text{sum} \\ \qamp \ttt{ForAll}(X, \tau) \qqamp \forall X.\tau \qqamp \text{forall} \\ \qamp \ttt{Exists}(X, \tau) \qqamp \exists X.\tau \qqamp \text{exists} \\ \\ \tsf{Term}~t \Coloneqq \qamp \ttt{Var}(x) \qqamp x \qqamp \text{variable} \\ \qamp \ttt{Int}(n) \qqamp n \qqamp \text{integer} \\ \qamp \ttt{Binop}(\oplus, t_1, t_2) \qqamp t_1 \oplus t_2 \qqamp \text{binary operation} \\ \qamp \ttt{Lam}(x, \tau, t) \qqamp \ttt{fn} \ (x : \tau) \ . \ t \qqamp \text{function} \\ \qamp \ttt{App}(t_1, t_2) \qqamp t_1 \ t_2 \qqamp \text{application} \\ \qamp \ttt{Pair}(t_1, t_2) \qqamp (t_1, t_2) \qqamp \text{tuple} \\ \qamp \ttt{Project}(t, d) \qqamp t.d \qqamp \text{projection} \\ \qamp \ttt{Inject}(t, d, \tau) \qqamp \ttt{inj} \ t = d \ \ttt{as} \ \tau \qqamp \text{injection} \\ \qamp \ttt{Case}(t, (x_1, t_1), (x_2, t_2)) \qqamp \ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \} \qqamp \text{case} \\ \qamp \ttt{TLam}(X, t) \qqamp \ttt{tfn} \ X \ . \ t \qqamp \text{type function} \\ \qamp \ttt{TApp}(t, \tau) \qqamp t \ [\tau] \qqamp \text{type application} \\ \qamp \ttt{TPack}(\tau_1, t, \tau_2) \qqamp \{\tau_1, t\} \ \ttt{as} \ \tau_2 \qqamp \text{type abstraction} \\ \qamp \ttt{TUnpack}(X, x, t_1, t_2) \qqamp \ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2 \qqamp \text{type open} \\ \\ \tsf{Binop}~\oplus ::= \qamp \ttt{Add} \qqamp + \qqamp \text{addition} \\ \qamp \ttt{Sub} \qqamp - \qqamp \text{subtraction} \\ \qamp \ttt{Mul} \qqamp * \qqamp \text{multiplication} \\ \qamp \ttt{Div} \qqamp / \qqamp \text{division} \\ \\ \tsf{Direction}~d ::= \qamp \ttt{Left} \qqamp L \\ \qamp \ttt{Right} \qqamp R \end{alignat*} $$This IR has three differences from the top-level:
- Patterns are removed entirely.
- TUnpack has moved from Pattern to Term.
- Match has been replaced by Case.
In terms of the code, the first grammar corresponds to the Lang module in ast.ml, whereas the second corresponds to the IR module.
Part 1: Translator
Your first job is to implement the translation function in translator.ml:
val translate : Lang.Term.t -> IR.Term.t
This function takes a term of the top-level language Lang.Term.t and converts it into an IR.Term.t. It cannot fail, so we do not return a Result.t.
The translation is trivial for most of the language; the only cases that require real work are Lang.Term.Let and Lang.Term.Match. We have implemented all of the trivial cases, as well as the Match case, for you.
We can formalize the Let and Match translations by defining a series of translation rules:
Look at Match for an example of implementing one of these translation rules. After implementing these rules, you will have a complete pipeline to move from source code to your IR.
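For orientation, here is a rough sketch of what a case of translate looks like in OCaml. The constructor shapes below are assumptions based on the grammars above, not the exact definitions in ast.ml.

```ocaml
(* Sketch only: constructor shapes are assumed from the grammars above
   and may not match ast.ml exactly. *)
let rec translate (t : Lang.Term.t) : IR.Term.t =
  match t with
  (* A homomorphic case: translate the subterms, rebuild the same node. *)
  | Lang.Term.Pair (t1, t2) -> IR.Term.Pair (translate t1, translate t2)
  (* Let (and Match, which is provided) must instead eliminate a construct
     that the IR does not have. *)
  | _ -> failwith "remaining cases omitted in this sketch"
```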
Note: if we wanted to formally verify our translation rules, we could define a dynamics for both the top-level Lang and the IR, and then prove that for every term t, if t translates to t' and t evaluates to a value v in Lang, then t' evaluates to v in the IR.
Part 2: Typechecker
Your next task is to implement a typechecker for the IR. Because we decided to translate before typechecking, the typechecker is simpler than it would otherwise be (although this obscures type errors; it is a tricky business to map a type error in generated code back to the original source).
The format of the typechecker is mostly the same as in the previous assignment, but now the type system is more complex. The statics are:
$$ \ir{T-var}{ \ }{\typeJ{\ctx,\hasType{x}{\tau}}{x}{\tau}} \s \ir{T-n}{ \ }{\hasType{\num{n}}{\ttt{int}}} \s \ir{T-fn} {\typeJ{\ctx,\hasType{x}{\tau_1}}{t}{\tau_2}} {\typeJC{(\ttt{fn} \ (x : \tau_1) \ . \ t)}{\tau_1 \rightarrow \tau_2}} \nl \ir{T-app} {\typeJC{t_1}{\tau_1 \rightarrow \tau_2} \s \typeJC{t_2}{\tau_1}} {\typeJC{(t_1 \ t_2)}{\tau_2}} \s \ir{T-tuple} {\typeJC{t_1}{\tau_1} \s \typeJC{t_2}{\tau_2}} {\typeJC{(t_1, t_2)}{\tau_1 * \tau_2}} \nl \ir{T-project-L} {\typeJC{t}{\tau_1 \times \tau_2}} {\typeJC{t.L}{\tau_1}} \s \ir{T-project-R} {\typeJC{t}{\tau_1 \times \tau_2}} {\typeJC{t.R}{\tau_2}} \nl \ir{T-inject-L} {\typeJC{t}{\tau_1}} {\typeJC{\ttt{inj} \ t = L \ \ttt{as} \ \tau_1 + \tau_2}{\tau_1 + \tau_2}} \s \ir{T-inject-R} {\typeJC{t}{\tau_2}} {\typeJC{\ttt{inj} \ t = R \ \ttt{as} \ \tau_1 + \tau_2}{\tau_1 + \tau_2}} \nl \ir{T-case} {\typeJC{t}{\tau_1 + \tau_2} \s \typeJ{\ctx,\hasType{x_1}{\tau_1}}{t_1}{\tau} \s \typeJ{\ctx,\hasType{x_2}{\tau_2}}{t_2}{\tau}} {\typeJC{\ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}}{\tau}} \nl \ir{T-tfn} {\typeJ{\ctx,X}{t}{\tau}} {\typeJC{\ttt{tfn} \ X \ . \ t}{\forall X.\tau}} \s \ir{T-tapp} {\typeJC{t}{\forall X.\tau_1}} {\typeJC{t \ [\tau_2]}{\subst{X}{\tau_2}{\tau_1}}} \s \ir{T-pack} {\typeJC{t}{\subst{X}{\tau_1}{\tau_2}}} {\typeJC{\{\tau_1, t\} \ \ttt{as} \ \exists X.\tau_2}{\exists X.\tau_2}} \nl \ir{T-unpack} {\typeJC{t_1}{\exists X.\tau_1} \s \typeJ{\ctx,X,\hasType{x}{\tau_1}}{t_2}{\tau_2}} {\typeJC{\ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2}{\tau_2}} $$There are two major additions: sum/product types and polymorphic/existential types. Typechecking the former follows straightforwardly from rules like those in the previous assignment, but the latter introduces a new concept: type variables. Now, your typing context keeps track of two things:
- Mappings from term variables to types, notated by $x : \tau$. As before, these are used when term variables are introduced by lambdas, and here also by case expressions.
- Existence of type variables, notated by just $X$. Type variables do not map to anything; the context simply tracks when a type variable is in scope (to catch type expressions that use an undefined type variable).
Concretely, you will implement the following function:
val typecheck_term : String.Set.t -> Type.t String.Map.t -> Term.t -> Type.t
The call typecheck_term tenv env t returns the type of t if it exists. Note that instead of using a Result.t, this time you can use a simpler approach: whenever a type error occurs, raise a TypeError exception with an appropriate error message, e.g. raise (TypeError "Does not typecheck"). Otherwise, assume that when you call typecheck_term recursively you get back a valid type.
Here, env is the same mapping from variables to types as in the previous assignment, but we have added tenv, which is a set of type variables. Type variables do not map to anything (they do not themselves have types); instead, the set just records which type variables are in scope while typechecking an expression.
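To make the use of the two contexts concrete, here is a rough sketch of two representative cases. The constructor names are taken from the IR grammar above, and the exact shapes and Core map/set functions are assumptions; adapt them to the definitions in your checkout.

```ocaml
(* Sketch only: IR constructor shapes and the exact Core map/set calls are
   assumptions; adapt them to the definitions in ast.ml. *)
let rec typecheck_term (tenv : String.Set.t) (env : Type.t String.Map.t)
    (t : Term.t) : Type.t =
  match t with
  | Term.Var x ->
    (* Term variables are looked up in env. *)
    (match String.Map.find env x with
     | Some tau -> tau
     | None -> raise (TypeError ("unbound variable " ^ x)))
  | Term.TLam (a, body) ->
    (* Type variables are only recorded as being in scope, in tenv. *)
    Type.ForAll (a, typecheck_term (String.Set.add tenv a) env body)
  | _ -> failwith "remaining cases omitted in this sketch"
```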
We have provided a function typecheck_type : String.Set.t -> Type.t -> Type.t. Whenever you’re typechecking a term that contains a type specified by the user, for example the type annotation on a function or an injection, you will need to check that the provided type does not use unbound type variables. For example, the following expression should not typecheck because the type variable it mentions is unbound:
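For instance (an illustrative term of our own, not necessarily the one from the original handout), the annotation below mentions a type variable $X$ that no tfn has brought into scope, so typecheck_type should reject it:

$$ \ttt{fn} \ (x : X) \ . \ x $$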
Lastly, during typechecking, there will be points where you will want to compare two types for equality. Do not use the equals operator (e.g. tau1 = tau2), as that would be incorrect! Use the Type.aequiv function, which compares two types for equality up to renaming of bound variables, i.e. checks them for alpha-equivalence. For example, two types that differ only in the names of their bound variables, such as $\forall X . X \rightarrow X$ and $\forall Y . Y \rightarrow Y$, should be considered equal, but the = operator will return false because the variable names are different. Type.aequiv has the signature Type.t -> Type.t -> bool.
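For instance, a sketch of the application case might use Type.aequiv like this (again assuming the constructor shapes from the grammar above):

```ocaml
(* Sketch only: compares the argument's type against the function's domain
   with Type.aequiv rather than (=). *)
| Term.App (t1, t2) ->
  (match typecheck_term tenv env t1 with
   | Type.Fn (tau_arg, tau_ret) ->
     if Type.aequiv tau_arg (typecheck_term tenv env t2) then tau_ret
     else raise (TypeError "argument type does not match function domain")
   | _ -> raise (TypeError "applied a non-function"))
```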
Note: Technically, it’s possible for match statements to introduce variables that are already free variables in the expression, which is not legal. However, for the sake of this assignment, assume that match statements will always use fresh variables. That is, variables introduced in cases will not be found elsewhere in the expression.
Part 3: Interpreter
Lastly, you must implement the interpreter. At this point, all of the terms relating to type abstraction (polymorphic/existential types) have trivial dynamics, since we’ve already verified our required type properties in the typechecker. Most of the legwork is in supporting algebraic data types. The dynamics are as follows:
$$ \ir{V-n}{ \ }{\val{\num{n}}} \s \ir{V-fn}{ \ }{\val{(\ttt{fn} \ (x : \tau_1) \ . \ e)}} \s \ir{D-app$_1$}{\steps{t_1}{t_1'}}{\steps{(t_1 \ t_2)}{(t_1' \ t_2)}} \s \ir{D-app$_2$}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{(t_1 \ t_2)}{(t_1 \ t_2')}} \nl \ir{D-app$_3$}{\val{t_2}}{\steps{((\ttt{fn} \ (x : \tau) \ . \ t_1) \ t_2)}{\subst{x}{t_2}{t_1}}} \s \ir{D-binop$_1$}{\steps{t_1}{t_1'}}{\steps{t_1 \oplus t_2}{t_1' \oplus t_2}} \s \ir{D-binop$_2$}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{t_1 \oplus t_2}{t_1 \oplus t_2'}} \nl \ir{D-div}{\ }{\error{t_1 \ / \ \ttt{Int}(0)}} \s \ir{D-binop$_3$}{\ }{\steps{\ttt{Int}(n_1) \oplus \ttt{Int}(n_2)}{\ttt{Int}(n_1 \oplus n_2)}} \nl \ir{D-tuple$_1$}{\steps{t_1}{t_1'}}{\steps{(t_1, t_2)}{(t_1', t_2)}} \s \ir{D-tuple$_2$}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{(t_1, t_2)}{(t_1, t_2')}} \s \ir{D-tuple$_3$}{\val{t_1} \s \val{t_2}}{\val{(t_1, t_2)}} \nl \ir{D-project$_1$}{\steps{t}{t'}}{\steps{t.d}{t'.d}} \s \ir{D-project$_2$}{\val{(t_1, t_2)}}{\steps{(t_1, t_2).L}{t_1}} \s \ir{D-project$_3$}{\val{(t_1, t_2)}}{\steps{(t_1, t_2).R}{t_2}} \nl \ir{D-inject$_1$} {\steps{t}{t'}} {\steps{\ttt{inj} \ t = d \ \ttt{as} \ \tau}{\ttt{inj} \ t' = d \ \ttt{as} \ \tau}} \s \ir{D-inject$_2$}{\val{t}}{\val{\ttt{inj} \ t = d \ \ttt{as} \ \tau}} \nl \ir{D-case$_1$} {\steps{t}{t'}} {\steps {\ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\ttt{case} \ t' \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}}} \nl \ir{D-case$_2$} {\val{t}} {\steps {\ttt{case} \ \ttt{inj} \ t = L \ \ttt{as} \ \tau \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\subst{x_1}{t}{t_1}}} \nl \ir{D-case$_3$} {\val{t}} {\steps {\ttt{case} \ \ttt{inj} \ t = R \ \ttt{as} \ \tau \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\subst{x_2}{t}{t_2}}} \nl \ir{D-tfn}{\ }{\steps{\ttt{tfn} \ X \ . \ t}{t}} \s \ir{D-tapp}{\ }{\steps{t \ [\tau]}{t}} \s \ir{D-pack}{\ }{\steps{\{\tau_1, t\} \ \ttt{as} \ \tau_2}{t}} \nl \ir{D-unpack}{\ }{\steps{\ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2}{\subst{x}{t_1}{t_2}}} $$As in the previous assignment, you will implement these dynamics in the trystep
function in interpreter.ml:
val trystep : Term.t -> outcome
The dynamics for Int and Var, as well as the dynamics for the polymorphic/existential cases, are done for you.
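As a shape reference, here is a rough sketch of the projection rules (D-project). The outcome constructors (Step, Val, Err) and the direction constructors are assumed names modeled on the previous assignment's starter code; use whatever your interpreter.ml and ast.ml actually define.

```ocaml
(* Sketch only: Step/Val/Err and Direction.Left/Right are assumed names. *)
| Term.Project (t, dir) ->
  (match trystep t with
   | Step t' -> Step (Term.Project (t', dir))           (* D-project-1 *)
   | Err e -> Err e
   | Val ->
     (match (t, dir) with
      | (Term.Pair (t1, _), Direction.Left) -> Step t1   (* D-project-2 *)
      | (Term.Pair (_, t2), Direction.Right) -> Step t2  (* D-project-3 *)
      | _ -> Err "projection from a non-tuple"))
```

The case rules (D-case) follow the same shape, except that the final step substitutes the injected value into the chosen branch.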
Testing
To build the interpreter, run make. This will create an executable, main.byte, which is OCaml bytecode that needs to be passed to ocamlrun to execute. We’ve provided two scripts, user.sh and solution.sh, for main.byte and solution.byte, respectively.
Running run_tests.py will run the test suite over all the files in the tests directory and compare them to our solution.
python3 run_tests.py
Note that the tests are just a representative sample of the tests we will run your code on.
To test an individual file, you can use user.sh to invoke the interpreter manually, e.g.
./user.sh tests/exists1.lam2
For each of the functions you have to implement, we have provided a few additional unit tests inside an inline_tests function in each file. Uncomment the line that says:
let () = inline_tests ()
Then execute ./user.sh to run those tests. If it does not fail with an assertion error, you have passed those tests.