This lab will introduce you to the logistics of programming in OCaml. To get the starter code for this lab, go to the assignments repository and run git pull, then go into the lab1 directory.

Installation

First, you need to get OCaml installed on your computer.

1. Install the binaries

You will install OPAM, the package manager for OCaml.

  • OS X

    brew install opam
    
  • Ubuntu 16.04

    sudo apt-get install opam -y
    
  • Windows 10

    Get Bash on Ubuntu on Windows using these instructions then follow the Ubuntu guide.

  • Rice

    OCaml and OPAM are already installed with the correct versions on the Rice machines if you followed the Rice setup guide.

2. Install OCaml packages

We use OPAM to manage OCaml packages and install new OCaml versions.

opam init -y
opam switch 4.06.1
eval `opam config env`
opam install core menhir utop merlin ocp-indent user-setup -y

NOTE: If opam switch 4.06.1 doesn’t work and you get a message asking you to run opam switch create 4.06.1 instead, do that.

This step will take a while!

3. Configure your editor

The most crucial component of the OCaml development toolchain is merlin, a tool that drives many IDE features like getting the types of variables, jumping to symbols, showing syntax errors, and so on. It will make your life much easier, so you really should get it installed.

4. Setup persistent OCaml env

Add to your .bashrc (or equivalent)

eval `opam config env`
  • Vim and Emacs

    Configuring Merlin for Vim and Emacs is simple:

    opam user-setup install
    
  • Anything else

    Any other editor (Atom, VSCode, Sublime) requires more work. See the Merlin wiki page for installation instructions.

Running code

There are two main ways of running OCaml. It has a basic REPL (ocaml on the command line) and a more fully-featured REPL (utop on the command line). For example, try this:

$ utop
──────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────────
                          │ Welcome to utop version 2.2.0 (using OCaml version 4.06.1)! │
                          └─────────────────────────────────────────────────────────────┘

...

─( 22:37:04 )─< command 0 >──────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # let x = 1 + 1 ;;
val x : int = 2

Note that at the REPL, you have to add the double semicolon ;; after each expression for OCaml to evaluate it.

OCaml also has a compiler. If you create a file test.ml that contains:

let () = Printf.printf "test: %d\n" 0

And then run ocamlbuild test.native, this will create an executable test.native that runs the program. Generally we will provide you a Makefile that wraps ocamlbuild to ensure you have the appropriate dependencies.

Expressions

OCaml is basically an extended version of the typed lambda calculus. I pointed out in class that a lambda calculus expression like:

Is almost exactly the same as the equivalent OCaml expression:

(fun (x : int) -> x + 1) 2

OCaml has functions (just fun instead of ) and integers. It also has a few useful built-in types like string.

let s1 : string = "Hello " in       (* string literals *)
let s2 : string = s1 ^ " world!" in (* string concatenation *)
Printf.printf "%s" s2               (* string formatting *)

A few differences here from the lambda calculus. OCaml has “let” expressions (exactly like you saw on Assignment 1). Note that in a traditional programming language, variable bindings don’t “have a value.” For example, in C, if I write:

int x = 1;
int y = 2;
return x + y;

Then these would be imperative “statements” that have side effects, i.e. they affect some state not explicit in the program. By contrast, lets in OCaml are expressions:

let x = 1 in
let y = 2 in
x + y

The above is an expression that evaluates to the number 3. Note that OCaml features type inference, i.e. we don’t have to explicitly annotate every type (e.g. let x = 1 instead of let x : int = 1).

While OCaml is still (mostly) immutable, i.e. you can’t change the value of a variable like in imperative programming, you can emulate mutable variables by repeated bindings:

let x = 1 in
let x = x + 1 in
x

Sometimes it’s still useful to have side effects, most notably printing strings to STDOUT. Functions with side effects, like Printf.printf, return the unit type (). We can use the semicolon operator to interleave prints in our code:

let x = 1 in
Printf.printf "%d" x;
let y = 2 in
x + y

Note: the semicolon is an infix operator that takes two expressions, executes the left one, discards its result, and then evalutes to the right one.

To create functions, we have a few options. Here are several ways to implement the same add function:

(* Most explicit, anonymous function syntax *)
let add : int -> int -> int = fun (m : int) -> fun (n : int) -> m + n in

(* Example usage. Note that like in the lambda calculus, we can partially
 * apply our function to get a new one that remembers the original argument. *)
let add1 = add 1 in
Printf.printf "%d" (add1 2);

(* Elide the argument types *)
let add = fun m -> fun n -> m + n in

(* Functions with multiple arguments are automatically converted to multiple
 * functions. *)
let add = fun m n -> m + n in

(* Most explicit, named function syntax. Note that we specify a return type *)
let add (m : int) (n : int) : int = m + n in

(* Elide the argument and return type *)
let add m n = m + n in

()

Lastly, note that to create a recursive function, you have to use the special let rec syntax:

let rec sum n =
  if n = 0 then 0
  else n + (sum (n - 1))

Exercises, part 1

Now it’s time to apply the knowledge in the above section. In the lab1 directory, we have provided you a file exercise1.ml that contains starter code for the following exercises. To run it, type make and then ./exercise1.native.

First, implement a function gcd m n that returns the greatest common denominator between two numbers. Translate it from the corresponding C implementation:

int gcd(int a, int b) {
  while (b != 0) {
    int t = b;
    b = a % b;
    a = t;
  }
  return a;
}

A key part of the functional programming paradigm is understanding how to map imperative code (like for loops) to functional concepts (like recursion). Don’t try to fit a square peg into a round hole.

Second, implement a function fizz_buzz n that iterates from 0 to n, and for each number, prints “fizz” if it’s divisible by 3, “buzz” if by 5, and “fizzbuzz” if by both 3 and 5. Again, the imperative C code:

void fizz_buzz(int n) {
  for (int i = 0; i < n; ++i) {
    if (i % 3 == 0 && i % 5 == 0) {
      printf("fizzbuzz\n");
    } else if (i % 3 == 0) {
      printf("fizz\n");
    } else if (i % 5 == 0) {
      printf("buzz\n");
    }
  }
}

Third, implement a function read_password password that continually reads lines from stdin until the provided password is read.

Hint: you will want to use the provided function read_line : unit -> string.

Challenge: implement a function substring_match pattern source that returns the first index in the string source where the substring pattern occurs (or None otherwise). You will need the following functions:

val String.length : string -> int
val String.slice : string -> int -> int -> string

Data structures

Now that we’ve seen how to use the basic data types (ints, strings, functions), we’re going to move on to composite data types (algebraic data types, as discussed in class). As mentioned in the last lecture, ADTs can come in both named and anonymous forms. For example, OCaml has both tuples and records (product types):

type point = { x : int; y : int; z : int }
let p : int * int * int = (1, 2, 3) in (* tuple *)
let () =
  (* can only access tuple fields through pattern matching *)
  let (x, y, z) = p in
  Printf.printf "%d %d %d" x y z
in
let p : point = { x = 1; y = 2; z = 3; } in (* record *)
Printf.printf "%d %d %d" p.x p.y p.z

A critical part of dealing with composite data structures is pattern matching. The match statement in OCaml is amazing and wonderful and powerful, much more so than the statement you saw in lecture.

let point_pair = ({x=0; y=0; z=0}, {x=1; y=2; z=3}) in
match point_pair with
| ({x = 0; y; _}, p2) -> y + p2.x           (* pattern match nested structures, ignore fields *)
| (p1, p2) when p1.x < p2.x -> p2.x - p1.x  (* use guards for complex conditions *)
| _ -> assert false                         (* wildcard for catch-all *)

Unlike our typed lambda calculus, sum types are never anonymous, only named. They’re usually constructed explicitly through a type declaration:

type sum = A | B of int

let a : sum = A in
let b : sum = B 3 in
match (a, b) with
| (A, B n) -> n
| (B n1, B n2) -> n1 + n2
| _ -> assert false

Sum types (or variants in OCaml lingo) can also be recursive (we will cover this formally in greater depth next week). For example, a binary tree:

type tree = Leaf | Node of tree * int * tree

(*   2
 *  / \
 * 3   1 *)
let t = Node(Node(Leaf, 3, Leaf), 2, Node(Leaf, 1, Leaf)) in
match t with
| Node (l, n, r) -> Printf.printf "%d" n
| Leaf -> Printf.printf "Leaf"

When implementing data structures, we often want to wrap up their functionality into a module. Modules are versatile things in OCaml that perform the function of namespaces (creating hierarchy when naming things) as well as interfaces (separating specification from implementation). For example, a binary tree module will look like:

(* Signatures (or interfaces) declare the types of things in a module.
 * The "val" keyword indicates that this is a declaration of a thing that will
 * exist, but doesn't currently. *)
module type IntTree = sig
  type tree = Leaf | Node of tree * int * tree
  val size : tree -> int
end

(* Modules package functionality together, and optionally implement an interface.
 * Here, the IntTreeImpl : IntTree syntax means "this module implements the
 * IntTree interface." *)
module IntTreeImpl : IntTree = struct
  type tree = Leaf | Node of tree * int * tree
  let rec size t =
    match t with
    | Leaf -> 0
    | Node (l, _, r) -> 1 + (size l) + (size r)
end

(* IntTreeTests is a functor, or a module parameterized by another module. It
 * asks for a module that implements the IntTree signature, but doesn't need to
 * know how it's implemented. *)
module IntTreeTests(T : IntTree) = struct
  let t : T.tree = T.Node(T.Leaf, 2, T.Leaf) in
  assert (T.size t = 1)
end

(* This instantiates the IntTreeTests functor for our specific implementation of
 * IntTree. *)
module Tests = IntTreeTests(IntTreeImpl)

Above, we defined our own modules, but you will often use modules other people have written. For example, we use the Jane Street Core standard library replacement. It comes with a lot of functionality, and today we’ll look at one example you’ll use on the homework.

Note: in general, the Core documentation (like many OCaml libraries) is a little confusing, so we’ll do our best to provide the necessary pointers to understand how to use Core functions.

The Map module implements a data structure that maps keys to values. The String.Map module is a version of the map specialized to string keys. Using a map looks like:

open Core (* Imports all Core functions into our current module *)

(* Create an empty map *)
let map : int String.Map.t = String.Map.empty in

(* Add a value to the map. Like most data structures in OCaml, this map is
 * purely functional, i.e. adding a new key returns a new map. *)
let map = String.Map.set ~key:"x" ~data:0 map in

(* Looking up keys in the map returns an option. *)
let print_value (k : string) : unit =
  match String.Map.find map k with
  | None -> Printf.printf "String %s not in map\n" k
  | Some v -> Printf.printf "%s -> %d\n" k v
in

print_value "x";  (* prints "x -> 0" *)
print_value "y"   (* prints "String y not in map" *)

Exercises, part 2

Your next few tasks are to fill out the remaining parts of an implementation for the IntTreeImpl module in exercise2.ml. Specifically, you implement the following functions:

val map : (int -> int) -> tree -> tree

map f t takes a function and maps it over all the values in the tree. For example, map (fun x -> x + 1) t should return a tree with each node value incremented by one.

val to_string : tree -> string

to_string t takes a tree and convert it into a string. You should use the convention that a leaf is "()" and a node is “(left, value, right)” (see the test cases for an example). You will find the function Printf.sprintf useful.

val binary_search : tree -> int -> int option

binary_search t n takes as input a tree that is assumed to be a valid binary search tree, i.e. at each node Node(l, x, r), you can assume that all node values in l are less than or equal to x, and all node values in r are greater than or equal to x. The binary search function should return the greatest value less than or equal to n in the tree, or None if no such value exists.

Programming patterns

I want to briefly highlight a few useful programming patterns/idioms that show up in OCaml (and functional programming more broadly).

The first is partial application. Like the lambda calculus, all functions in OCaml have one input and one output. A function can choose to have multiple inputs using a tuple, but it isn’t required:

let div1 m n = m / n in
let div2 (m, n) = m / n in
(div1 3 2) + (div2 (6, 2))

When functions take tuple as inputs, they require all the arguments to be present at the same time in the function call. By contrast, functions that use the non-tuple style don’t have this requirement. You can give a function some of its arguments, but not all, and get back a function that remembers the inputs while waiting for more.

let add m n = m + n in
let add2 = add 2 in
assert(add2 3 = 5)

This style of programming with functions is called partial application, or “currying” (after the mathematician Haskell Curry). It’s particularly useful with higher-order functions, or functions that take other functions as input (e.g. map). You’ll see more of this on the third assignment.

The second pattern is monads. A common issue when dealing with sum types is nested matches. For example, let’s say we repeatedly apply our safe division function:

let div (m : int) (n : int) : int option =
  if n = 0 then None
  else Some (m / n)
in

match div 100 3 with
| Some n ->
  (match div n 5 with
   | Some n -> div n 2
   | None -> None)
| None -> None

For long chains of error-prone operations, this can get incredibly verbose. Instead, we can use a particular kind of programming idiom to simplify this logic.

let bind (opt : int option) (f : int -> int option) : int option =
  match opt with
  | None -> None
  | Some x -> f x
in
bind
  (div 100 3)
  (fun n -> bind
    (div n 5)
    (fun n -> (div n 2)))

This “bind” function essentially says: take as input an option, and then a function that we run on the option if it is a Some, returning a new option. (For theoretical reasons beyond the scope of this tutorial, this is part of a “monadic” interface, but for now just roll with it as a programming pattern.) This is still a little verbose, but if we change bind to an infix operator >>=:

let bind (opt : int option) (f : int -> int option) : int option =
  match opt with
  | None -> None
  | Some x -> f x
in

(* Parentheses indicate a custom infix operator *)
let (>>=) = bind in

div 100 3
>>= fun n ->
div n 5
>>= fun n ->
div n 2

When used as an infix operator, the bind function allowus to linearize our nested matches which produces more readable code.

Exercises, part 3

To apply all the skills you have learned thus far, you will get a primer for the homework by implementing a type checker and interpreter for an extended arithmetic language of numbers and booleans. The language specification is:

In exercise3.ml, you need to implement the typecheck and trystep functions. We have filled in a few branches for you—you do the rest!