Scheme (introduction)

Where have we been?

Impcore:

no new programming-language ideas
- programming with assignments, etc.
- function calls
lots of new math
- operational semantics (judgements and inference rules)
- operational semantics (theory and metatheory)

Where are we going?

Scheme:

programming with recursive data structures
programming with first-class functions

For a new language, five powerful questions

As a lens for understanding, you can ask these questions about any language:

What is the abstract syntax? What are the syntactic categories and what are the terms in each category?
What are the values? What do expressions/terms evaluate to?

Small aside: Why consider values separately from abstract syntax? Abstract syntax corresponds to things that we write down in source code, while values correspond to things that are manipulated during the evaluation of a program. Sometimes these overlap; for example, a number may be both abstract syntax (a numeric literal written in source code) and a value (a machine integer manipulated during evaluation). But there are important cases where the two concepts do not overlap. Consider objects in Java: we write new C() in source code, but at run time we manipulate an actual object (a combination of class name and instance variables), and we cannot write down an object value directly. Also consider pointers in C: we can write malloc(4) in source code, but at run time we manipulate an actual pointer. In proper C, we cannot write down a pointer value directly; pointer values exist only during the evaluation of a program.
What are the environments? What can names stand for?
How are terms evaluated? What are the judgments and inference rules?
What is in the initial basis? Primitives and otherwise, what is built in?

Introduction to Scheme

Question #2: What are the values?

Two new kinds of data:

cons cell: pointer to automatically managed (i.e., garbage collected) pair of values
function closure: first-class functions; a powerful new feature, not just a datum

Values of Scheme

Values are S-expressions (symbolic expressions).

Simplification for now:

An S-expression is an integer literal, a boolean literal, a symbol, or a list of S-expressions.
A list of S-expressions is either '() (the empty list) or an S-expression followed by a list of S-expressions.

Like any other abstract data type

creators create new values of the type:
- 1, #t, 'a, '()
producers make new values from existing values
- (+ i j), (not b), (cons x xs)
observers examine values of the type
- number?, boolean?, symbol?, null?, pair?, car, cdr
mutators change values of the type
- none in uScheme

Lists

Lists are a subset of S-expressions.

Definition of lists of numbers:

Two ways of defining lists of numbers (as S-expressions):

\(\mathrm{IntList}\) is the smallest set satisfying

\[\mathrm{IntList} = \{ \mathtt{'()} \} \cup \{ \mathtt{(}\mathtt{cons}~a~as\mathtt{)} ~|~ a \in \mathrm{Int}, as \in \mathrm{IntList} \}\]

where \(\mathrm{Int}\) is the set of integer literal values.
\(z \in \mathrm{IntList}\) is a judgement defined by the inference rules

\[\frac{ }{ \mathtt{'()} \in \mathrm{IntList} }~(\mathit{Empty}) \quad\quad\quad \frac{ a \in \mathrm{Int} \quad as \in \mathrm{IntList} }{ \mathtt{(}\mathtt{cons}~a~as\mathtt{)} \in \mathrm{IntList} }~(\mathit{Cons})\]

Definition of lists:

More generally, two ways of defining lists of \(A\)s, where \(A\) is some other well-defined set of values:

\(\mathrm{List}(A)\) is the smallest set satisfying

\[\mathrm{List}(A) = \{ \mathtt{'()} \} \cup \{ \mathtt{(}\mathtt{cons}~a~as\mathtt{)} ~|~ a \in A, as \in \mathrm{List}(A) \}\]
\(z \in \mathrm{List}(A)\) is a judgement defined by the inference rules

\[\frac{ }{ \mathtt{'()} \in \mathrm{List}(A) }~(\mathit{Empty}) \quad\quad\quad \frac{ a \in A \quad as \in \mathrm{List}(A) }{ \mathtt{(}\mathtt{cons}~a~as\mathtt{)} \in \mathrm{List}(A) }~(\mathit{Cons})\]
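Read operationally, the two rules give a recursive membership test: a value is in \(\mathrm{List}(A)\) exactly when one of the rules applies to it. The following is a minimal sketch in Python (not uScheme), under assumptions that are not part of these notes: '() is modeled as None, a cons cell is modeled as a 2-tuple, and cons, in_list, and is_int are hypothetical helper names.

```python
NIL = None                      # assumption: models '()

def cons(a, d):
    """Assumption: a cons cell is modeled as a 2-tuple."""
    return (a, d)

def in_list(z, in_A):
    """Decide z ∈ List(A); in_A decides membership in A."""
    if z is NIL:                                  # rule Empty
        return True
    if isinstance(z, tuple) and len(z) == 2:      # rule Cons
        a, rest = z
        return in_A(a) and in_list(rest, in_A)
    return False                                  # no rule applies

def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)

print(in_list(cons(1, cons(2, NIL)), is_int))   # True
print(in_list(cons(1, 2), is_int))              # False: 2 is not a list
```

Passing is_int as the test for \(A\) specializes the check to \(\mathrm{IntList}\).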

Lists as an abstract datatype

creators/producers: '(), (cons x xs)
observers: null?, pair?, car, cdr (also known as "first"/"rest" and "head"/"tail", and many other names)
algebraic laws
- (null? '()) == #t
- (null? (cons v vs)) == #f
- (pair? '()) == #f
- (pair? (cons v vs)) == #t
- (car (cons v vs)) == v
- (cdr (cons v vs)) == vs
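Algebraic laws like these can double as executable checks. The sketch below is in Python rather than uScheme and rests on assumptions not found in the notes: '() is modeled as None, a cons cell as a 2-tuple, and the function names are hypothetical stand-ins for the uScheme primitives. A check of this kind tests the laws on particular values; it does not prove them.

```python
NIL = None                      # assumption: models '()

def cons(a, d):
    return (a, d)               # assumption: a cons cell is a 2-tuple

def is_null(v):
    return v is NIL

def is_pair(v):
    return isinstance(v, tuple) and len(v) == 2

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def list_laws_hold(v, vs):
    """Check the six list laws for one choice of v and vs."""
    cell = cons(v, vs)
    return all([
        is_null(NIL) is True,
        is_null(cell) is False,
        is_pair(NIL) is False,
        is_pair(cell) is True,
        car(cell) == v,
        cdr(cell) == vs,
    ])

print(list_laws_hold(1, NIL))
print(list_laws_hold("a", cons(2, cons(3, NIL))))
```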

Why are lists useful?

Sequences are a frequently used abstraction
Can easily approximate a set
Can implement finite maps with association lists (aka dictionaries)
You don’t have to manage memory

These "cheap and cheerful" representations are less efficient than balanced search trees, but they are very easy to implement and work with; the book has many examples.

The only thing new here is automatic memory management. Everything else you could do in C. (You can have automatic memory management in C as well.)

Recursive functions on lists

Lists are inductively defined; lists are recursively processed.

Any list is constructed with either '() or cons.

What observers allow you to tell the difference?

`length`

(define length (xs)
   (if (null? xs) 0
       (+ 1 (length (cdr xs)))))

-> (length '(1 2 3 4))
4
-> (length '(1 (2 3 (4 5 6) 7 8) 9))
3

`total-length`

Note that the length defined above only counts the number of elements in the outermost list structure. If one element of the outermost list is itself a list, then it only counts as 1 towards the length.

What if we wanted the total length of a list, counting not just the number of elements in the outermost list structure, but also the number of elements of lists that are themselves elements of other lists?

(define total-length (xs)
   (if (null? xs) 0
       (if (list? (car xs))
           (+ (total-length (car xs)) (total-length (cdr xs)))
           (+ 1 (total-length (cdr xs))))))
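To see the difference between the two functions on the nested example used above, here is a small sketch in Python (not uScheme). It assumes a model that is not part of the notes: '() is None, a cons cell is a 2-tuple, and from_py is a hypothetical helper that builds a cons list from a nested Python list.

```python
NIL = None                      # assumption: models '()

def cons(a, d):
    return (a, d)               # assumption: a cons cell is a 2-tuple

def from_py(items):
    """Build a cons list from a (possibly nested) Python list."""
    result = NIL
    for item in reversed(items):
        if isinstance(item, list):
            item = from_py(item)
        result = cons(item, result)
    return result

def length(xs):
    # counts only the elements of the outermost list
    return 0 if xs is NIL else 1 + length(xs[1])

def total_length(xs):
    if xs is NIL:
        return 0
    head, tail = xs
    if head is NIL or isinstance(head, tuple):    # head is itself a list
        return total_length(head) + total_length(tail)
    return 1 + total_length(tail)

nested = from_py([1, [2, 3, [4, 5, 6], 7, 8], 9])
print(length(nested))        # 3
print(total_length(nested))  # 9
```

Note that an empty list appearing as an element contributes 0 to the total, because it is treated as a list rather than as a single element.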

`append`

Consider the algebraic laws that we want append to satisfy. We use informal "math" notation with .. for "followed by" and e for the empty sequence:

xs .. e == xs
e .. ys == ys
(x .. xs) .. ys == x .. (xs .. ys)
xs .. (y .. ys) == (xs .. y) .. ys

Some of these .. correspond to append (of two lists, xs .. ys), some correspond to cons (of an element and a list, x .. xs), and some correspond to snoc (of a list and an element, xs .. y).

But we have no snoc, so strike the last law.
The first law is extraneous, since the second and third laws form a complete case analysis on the first argument.

Use the second and third laws to guide the Scheme implementation:

(define append (xs ys)
   (if (null? xs) ys
       (cons (car xs) (append (cdr xs) ys))))

The dominant cost is cons (i.e., allocation of a new list element).

How many cons cells are allocated by (append xs ys), in terms of the lengths of xs and ys?
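One way to check an answer to this question experimentally is to count allocations. The sketch below is in Python rather than uScheme and assumes a model that is not part of the notes: '() is None, a cons cell is a 2-tuple, and allocated, from_py, and cells_allocated are hypothetical names introduced for the experiment.

```python
NIL = None                      # assumption: models '()
allocated = 0                   # cons cells created so far

def cons(a, d):
    """Allocate a cons cell (a 2-tuple) and count the allocation."""
    global allocated
    allocated += 1
    return (a, d)

def from_py(items):
    """Build a cons list from a flat Python list."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def append(xs, ys):
    # same two cases as the uScheme definition
    if xs is NIL:
        return ys
    return cons(xs[0], append(xs[1], ys))

def cells_allocated(f, *args):
    """Number of cons cells allocated while evaluating f(*args)."""
    before = allocated
    f(*args)
    return allocated - before

# Print (length of xs, length of ys, cells allocated by append).
for m, n in [(0, 3), (3, 0), (3, 5), (5, 3)]:
    xs, ys = from_py(list(range(m))), from_py(list(range(n)))
    print(m, n, cells_allocated(append, xs, ys))
```

Comparing the rows with the same lengths swapped shows which argument the cost depends on.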

naive `reverse`

Consider the algebraic laws that we want reverse to satisfy:

reverse e == e
reverse (x .. xs) == (reverse xs) .. x

Some correspond to cons (of an element and a list, x .. xs), and some correspond to snoc (of a list and an element, (reverse xs) .. x).

We can define snoc in terms of append: (define snoc (xs x) (append xs (list1 x)))

Use the two laws to guide the Scheme implementation:

(define reverse (xs)
   (if (null? xs) '()
       (append (reverse (cdr xs)) (list1 (car xs)))))

How many cons cells are allocated by (reverse xs), in terms of the length of xs?

accumulating `reverse`

Consider a different set of algebraic laws that we want reverse to satisfy:

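(reverse e) .. zs == zs
(reverse (x .. xs)) .. zs == (reverse xs) .. (x .. zs)

Some of these .. correspond to append (of two lists) and some correspond to cons (of an element and a list). Reading "(reverse xs) .. zs" as a single two-argument function revapp turns the two laws directly into code: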

(define revapp (xs zs)
   (if (null? xs) zs
       (revapp (cdr xs) (cons (car xs) zs))))
(define reverse (xs)
   (revapp xs '()))

Parameter zs is the accumulating parameter. (A powerful, general technique.)

How many cons cells are allocated by (reverse xs), in terms of the length of xs?
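The costs of the naive and accumulating versions can be compared by counting allocations. The following is a sketch in Python (not uScheme) under assumptions that are not part of the notes: '() is None, a cons cell is a 2-tuple, and allocated, from_py, and cells_allocated are hypothetical names. The names reverse_naive and reverse_accum distinguish the two definitions of reverse.

```python
NIL = None                      # assumption: models '()
allocated = 0                   # cons cells created so far

def cons(a, d):
    """Allocate a cons cell (a 2-tuple) and count the allocation."""
    global allocated
    allocated += 1
    return (a, d)

def from_py(items):
    """Build a cons list from a flat Python list."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def append(xs, ys):
    if xs is NIL:
        return ys
    return cons(xs[0], append(xs[1], ys))

def reverse_naive(xs):
    if xs is NIL:
        return NIL
    # (list1 x) is modeled as one cons onto the empty list
    return append(reverse_naive(xs[1]), cons(xs[0], NIL))

def revapp(xs, zs):
    if xs is NIL:
        return zs
    return revapp(xs[1], cons(xs[0], zs))

def reverse_accum(xs):
    return revapp(xs, NIL)

def cells_allocated(f, *args):
    """Number of cons cells allocated while evaluating f(*args)."""
    before = allocated
    f(*args)
    return allocated - before

# Print (length, cells for naive reverse, cells for accumulating reverse).
for n in [0, 1, 2, 4, 8, 16]:
    xs = from_py(list(range(n)))
    print(n, cells_allocated(reverse_naive, xs), cells_allocated(reverse_accum, xs))
```

Watching how each column grows as the length doubles suggests the shape of the two cost formulas.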

Algebraic Laws, Equational Reasoning, and Calculational Proofs

One might question whether the accumulating reverse is really the same function as the naive reverse. We can use equational reasoning to prove that the two functions really are equivalent. Equational reasoning is a simple but powerful proof technique that only requires expanding (or contracting) the definitions of functions and substituting equals for equals. When applied to a recursive structure like lists, the proofs are by structural induction. Structural induction simply requires proving the algebraic law for the empty list (the base case) and, assuming that the algebraic law holds for the list zs (the induction hypothesis), proving that the law holds for (cons z zs) (the step case).

The key to proving that the accumulating reverse is equivalent to the naive reverse is proving that revapp is equivalent to the append of the (naive) reverse of the first argument and the second argument. Thus, we prove

Theorem: (revapp xs zs) == (append (reverse_naive xs) zs)

Proof: by structural induction on the list xs and equational reasoning

Case xs == '()

(revapp xs zs)
= { xs == '() }
(revapp '() zs)
= { defn of revapp }
(if (null? '()) zs (revapp (cdr '()) (cons (car '()) zs)))
= { null?-empty law }
(if #t zs (revapp (cdr '()) (cons (car '()) zs)))
= { if-#t law }
zs
= { if-#t law }
(if #t zs (cons (car '()) (append (cdr '()) zs)))
= { null?-empty law }
(if (null? '()) zs (cons (car '()) (append (cdr '()) zs)))
= { defn of append }
(append '() zs)
= { if-#t law }
(append (if #t '() (append (reverse_naive (cdr '())) (cons (car '()) '()))) zs)
= { null?-empty law }
(append (if (null? '()) '() (append (reverse_naive (cdr '())) (cons (car '()) '()))) zs)
= { defn of reverse_naive }
(append (reverse_naive '()) zs)
= { xs == '() }
(append (reverse_naive xs) zs)

Case xs == (cons a as) with IH (revapp as bs) == (append (reverse_naive as) bs) (for any bs)

(revapp xs zs)
= { xs == (cons a as) }
(revapp (cons a as) zs)
= { defn of revapp }
(if (null? (cons a as)) zs (revapp (cdr (cons a as)) (cons (car (cons a as)) zs)))
= { null?-cons law }
(if #f zs (revapp (cdr (cons a as)) (cons (car (cons a as)) zs)))
= { if-#f law }
(revapp (cdr (cons a as)) (cons (car (cons a as)) zs))
= { cdr-cons law }
(revapp as (cons (car (cons a as)) zs))
= { car-cons law }
(revapp as (cons a zs))
= { IH }
(append (reverse_naive as) (cons a zs))
= { append-sing-left-law (proved in PL:BPC, p. 115) }
(append (reverse_naive as) (append (cons a '()) zs))
= { append-associative law (assumed below) }
(append (append (reverse_naive as) (cons a '())) zs)
= { car-cons law }
(append (append (reverse_naive as) (cons (car (cons a as)) '())) zs)
= { cdr-cons law }
(append (append (reverse_naive (cdr (cons a as))) (cons (car (cons a as)) '())) zs)
= { if-#f law }
(append (if #f '() (append (reverse_naive (cdr (cons a as))) (cons (car (cons a as)) '()))) zs)
= { null?-cons law }
(append (if (null? (cons a as)) '() (append (reverse_naive (cdr (cons a as))) (cons (car (cons a as)) '()))) zs)
= { defn of reverse_naive }
(append (reverse_naive (cons a as)) zs)
= { xs == (cons a as) }
(append (reverse_naive xs) zs)

Note that this proof assumes that append is associative, which we leave as an exercise for the reader.

Theorem: (append xs (append ys zs)) == (append (append xs ys) zs)

Proof: by structural induction on the list xs

Our final proof relies on the law that appending the empty list on the right is the identity, which we prove first:

Theorem: (append xs '()) == xs

Proof: by structural induction on the list xs and equational reasoning

Case xs == '()

(append xs '())
= { xs == '() }
(append '() '())
= { defn of append }
(if (null? '()) '() (cons (car '()) (append (cdr '()) '())))
= { null?-empty law }
(if #t '() (cons (car '()) (append (cdr '()) '())))
= { if-#t law }
'()
= { xs == '() }
xs

Case xs == (cons a as) with IH (append as '()) == as

(append xs '())
= { xs == (cons a as) }
(append (cons a as) '())
= { defn of append }
(if (null? (cons a as)) '() (cons (car (cons a as)) (append (cdr (cons a as)) '())))
= { null?-cons law }
(if #f '() (cons (car (cons a as)) (append (cdr (cons a as)) '())))
= { if-#f law }
(cons (car (cons a as)) (append (cdr (cons a as)) '()))
= { car-cons law }
(cons a (append (cdr (cons a as)) '()))
= { cdr-cons law }
(cons a (append as '()))
= { IH }
(cons a as)
= { xs == (cons a as) }
xs

Finally, we can complete our argument that reverse_accum and reverse_naive are equivalent:

Theorem: (reverse_accum xs) == (reverse_naive xs)

Proof: by equational reasoning

(reverse_accum xs)
= { defn of reverse_accum }
(revapp xs '())
= { revapp-specification law (proved above) }
(append (reverse_naive xs) '())
= { append-empty-right law (proved above) }
(reverse_naive xs)
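A proof covers every list; a quick experiment can still catch a misstated law before the effort of proving it. The sketch below spot-checks the laws used in this section on random lists. It is written in Python (not uScheme) and assumes a model that is not part of the notes: '() is None, a cons cell is a 2-tuple, and from_py, random_list, and laws_hold are hypothetical helpers.

```python
import random

NIL = None                      # assumption: models '()

def cons(a, d):
    return (a, d)               # assumption: a cons cell is a 2-tuple

def from_py(items):
    """Build a cons list from a flat Python list."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def append(xs, ys):
    return ys if xs is NIL else cons(xs[0], append(xs[1], ys))

def reverse_naive(xs):
    if xs is NIL:
        return NIL
    return append(reverse_naive(xs[1]), cons(xs[0], NIL))

def revapp(xs, zs):
    return zs if xs is NIL else revapp(xs[1], cons(xs[0], zs))

def reverse_accum(xs):
    return revapp(xs, NIL)

def random_list(rng, max_len=6):
    return from_py([rng.randint(0, 9) for _ in range(rng.randint(0, max_len))])

def laws_hold(trials=200, seed=0):
    """Test the four laws on randomly generated lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs, ys, zs = (random_list(rng) for _ in range(3))
        if revapp(xs, zs) != append(reverse_naive(xs), zs):
            return False        # revapp-specification law
        if append(xs, append(ys, zs)) != append(append(xs, ys), zs):
            return False        # append-associative law
        if append(xs, NIL) != xs:
            return False        # append-empty-right law
        if reverse_accum(xs) != reverse_naive(xs):
            return False        # equivalence of the two reverses
    return True

print(laws_hold())
```

Python compares tuples structurally, so == here plays the role of equality on lists.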

More Truth about S-expressions

Correcting our simplification of S-expressions.

An S-expression is an integer literal, a boolean literal, a symbol, the empty list, or a pair of two S-expressions.

A cons can pair any two values, not just an element and a list.

a "list" might have elements of different types: (cons 1 (cons #t (cons 'a '())))
a "cons" need not have a list as its second element: (cons 1 2)

A proper list is either the empty list or a pair whose second element is a proper list.

Definition of S-expressions:

Two ways of defining S-expressions:

\(\mathrm{Atom}\) and \(\mathrm{SExp}\) are the smallest sets satisfying

\[\begin{array}{l} \mathrm{Atom} = \mathrm{Num} \cup \mathrm{Bool} \cup \mathrm{Sym} \cup \{ \mathtt{'()} \} \\ \mathrm{SExp} = \mathrm{Atom} \cup \{ \mathtt{(}\mathtt{cons}~v_1~v_2\mathtt{)} ~|~ v_1 \in \mathrm{SExp}, v_2 \in \mathrm{SExp} \} \end{array}\]
\(z \in \mathrm{Atom}\) and \(z \in \mathrm{SExp}\) are judgements defined by the inference rules

\[\begin{array}{c} \frac{ z \in \mathrm{Num} }{ z \in \mathrm{Atom} } \quad\quad\quad \frac{ z \in \mathrm{Bool} }{ z \in \mathrm{Atom} } \quad\quad\quad \frac{ z \in \mathrm{Sym} }{ z \in \mathrm{Atom} } \quad\quad\quad \frac{ }{ \mathtt{'()} \in \mathrm{Atom} } \\ \frac{ z \in \mathrm{Atom} }{ z \in \mathrm{SExp} } \quad\quad\quad \frac{ v_1 \in \mathrm{SExp} \quad v_2 \in \mathrm{SExp} }{ \mathtt{(}\mathtt{cons}~v_1~v_2\mathtt{)} \in \mathrm{SExp} } \end{array}\]
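The rules for \(\mathrm{Atom}\) and \(\mathrm{SExp}\), together with the definition of a proper list, can be read as recursive predicates. A minimal sketch in Python (not uScheme), under assumptions that are not part of the notes: '() is modeled as None, a cons cell as a 2-tuple, symbols as Python strings, and the predicate names are hypothetical.

```python
NIL = None                      # assumption: models '()

def cons(a, d):
    return (a, d)               # assumption: a cons cell is a 2-tuple

def is_pair(z):
    return isinstance(z, tuple) and len(z) == 2

def is_atom(z):
    # numbers, booleans, symbols (modeled as str), and the empty list
    return z is NIL or isinstance(z, (bool, int, str))

def is_sexp(z):
    if is_atom(z):
        return True
    return is_pair(z) and is_sexp(z[0]) and is_sexp(z[1])

def is_proper_list(z):
    # follow the second elements until something other than a pair is reached
    while is_pair(z):
        z = z[1]
    return z is NIL

mixed = cons(1, cons(True, cons("a", NIL)))
print(is_sexp(mixed), is_proper_list(mixed))            # True True
print(is_sexp(cons(1, 2)), is_proper_list(cons(1, 2)))  # True False
```

The second example is a legal S-expression that is not a proper list, which is exactly the case the simplified definition left out.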

Structural Equality of S-expressions:

uScheme provides a primitive = that works on numbers, booleans, symbols, and the empty list, but never cons cells. It is only useful for comparing atoms.

Define equal?, which will identify isomorphic S-expressions, including lists as a special case.

(define atom? (x) (or (number? x) (or (symbol? x) (or (boolean? x) (null? x)))))
(define equal? (s1 s2)
   (if (or (atom? s1) (atom? s2))
       (= s1 s2)
       (if (and (pair? s1) (pair? s2))
           (and (equal? (car s1) (car s2)) (equal? (cdr s1) (cdr s2)))
           #f)))
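The same case analysis can be sketched in Python (not uScheme) to try equal? on a few values. The model is an assumption, not part of the notes: '() is None, a cons cell is a 2-tuple, symbols are strings, and prim_eq is a hypothetical stand-in for the primitive = that compares atoms of the same kind and is never true when a cons cell is involved.

```python
NIL = None                      # assumption: models '()

def cons(a, d):
    return (a, d)               # assumption: a cons cell is a 2-tuple

def is_pair(v):
    return isinstance(v, tuple) and len(v) == 2

def is_atom(v):
    return v is NIL or isinstance(v, (bool, int, str))

def prim_eq(a, b):
    """Stand-in for primitive =: equal atoms of the same kind only."""
    return is_atom(a) and is_atom(b) and type(a) is type(b) and a == b

def equal(s1, s2):
    if is_atom(s1) or is_atom(s2):
        return prim_eq(s1, s2)              # at least one side is an atom
    if is_pair(s1) and is_pair(s2):
        return equal(s1[0], s2[0]) and equal(s1[1], s2[1])
    return False

print(equal(cons(1, cons(2, NIL)), cons(1, cons(2, NIL))))  # True
print(equal(cons(1, NIL), cons(1, 2)))                      # False
```

The type check in prim_eq is needed only because Python treats 1 and True as equal; it keeps numbers and booleans distinct in this model.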