Object-Orientation and Smalltalk (Part 1)

Our next topic is "Object-Orientation and Smalltalk". While you are familiar with object-oriented languages through your experience with Java, be aware that we will be investigating object-orientation in a much purer form than Java provides.

Preliminaries

Up to now, we have mostly been looking at "programming in the small", but using expressive features.

By "programming in the small", we mean that we have mostly been concerned with individual and independent functions. Consider most of the problems in the Impcore, uScheme, and Standard ML programming assignments; there were few dependencies between one problem and another. The largest body of code that we have considered has been a list "library" comprising length, map, filter, exists?, all?, foldr, and foldl (and, excepting the fact that most of the functions can be written using either foldr or foldl, these functions are independent of one another).

For expressive features, we have looked at:

first-class functions and closures
algebraic datatypes and pattern matching
polymorphic type systems

What about "programming in the large"? By "programming in the large", we mean large-scale software development, leading to programs like word processors, web browsers, web servers, and operating systems. Such programs are composed of many interacting components and are developed by a team of programmers, rather than an individual. In order to support programming in the large, we want (and need) encapsulation --- the ability to package some significant body of code as a unit that can be used and reused. (We will also need a bigger basis or standard library.) Encapsulation is good for both the code producer and the code consumer. Unfortunately, while there is great agreement about the need for encapsulation, there is a great divide about how to provide encapsulation.

In one direction, we have modules. We won’t discuss modules much in this course (although they are an important contribution of Standard ML and similar languages). Suffice it to say that when code is packaged as a module, the code consumer is able to know the specification of the code but has no ability to change the behavior of the code. This is both good and bad. The good is that it enables modular reasoning; the code consumer does not need to know anything about the implementation (other than its advertised behavioral specification). One can verify that one module (that uses another module) is correct without needing to see the implementation of the other module. The bad is that if a particular module’s behavior isn’t quite what is required, then there is no recourse other than to implement a new module with the desired behavior. In essence, by not knowing anything about the module’s implementation, it is possible to use the module (and know with certainty that the module won’t ever behave differently), but difficult to reuse the module to obtain a related, but slightly different, implementation.

In the other direction, we have objects. When code is packaged as an object, the code consumer both knows the interface/specification of the code and has the ability to change the behavior of the code. This reuse is facilitated by two (distinct!) features:

Inheritance, which enables the reuse of implementation
Subtyping, which enables the reuse of interfaces

Note that these two features are often conflated in object-oriented languages, but can be considered distinct. As you know from your experience with Java, when code is packaged as an object, it is possible to change the behavior of the code by subclassing and overriding methods. Again, this is good and bad. The good is that if a particular class’s behavior isn’t quite what is required, then one can subclass and instantiate to obtain an object with the desired behavior; this inheritance reuses the implementation of methods. This requires the code consumer to have some knowledge about the implementation --- in particular, one needs to know which methods are used by other methods, and how, in order to know which to override and how to override them. The bad is that it can be difficult to know for certain how a particular object will behave. Because of subtyping (reuse of interfaces), an object that is an instance of a subclass can be used where an object that is an instance of the superclass is expected; but, the behavior of the subclass’s object may be different from that of the superclass.
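To make the trade-off concrete, here is a minimal sketch in Python (used only as a convenient, runnable stand-in; it is not uSmalltalk). The classes `Greeter` and `LoudGreeter` and the function `welcome` are hypothetical names invented for illustration. The subclass reuses the implementation of `greet` by inheritance, and code written against the superclass accepts the subclass by subtyping, yet behaves differently.

```python
class Greeter:
    """Superclass whose greet method relies on another method, name."""

    def name(self):
        return "world"

    def greet(self):
        # greet sends the name message to self, so a subclass that
        # overrides name changes what greet returns.
        return "hello, " + self.name()


class LoudGreeter(Greeter):
    """Subclass that inherits greet unchanged but overrides name."""

    def name(self):
        return "WORLD"


def welcome(greeter):
    # Written against the Greeter interface; an instance of any subclass
    # is accepted, and its overriding methods are the ones that run.
    return greeter.greet() + "!"
```

Note that overriding `name` usefully requires knowing that `greet` calls `name`; that is exactly the knowledge about the implementation that the code consumer needs.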

All About Objects

What is an object?

With closures, we have seen that mixing code and data can create powerful abstractions.
Objects are another way to mix code and data.
Objects combine:
- some mutable state (referred to as instance variables)
- code that responds to messages (referred to as methods)
  
  There are two important aspects of methods:
  - How are they defined? By inheritance.
  - How is a specific method (i.e., code) chosen to respond to a message? By dynamic dispatch.
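The comparison between closures and objects can be sketched side by side. The following Python fragment is only an analogy (the names `make_counter` and `Counter` are hypothetical): the closure hides one piece of mutable state behind one piece of code, while the object hides its state behind several methods.

```python
def make_counter():
    """Closure version: the captured variable count is the hidden state."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


class Counter:
    """Object version: state lives in an instance variable, code in methods."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count

    def reset(self):
        self._count = 0
        return self
```

A closure answers exactly one "message" (being called); an object can answer many, here `increment` and `reset`.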

What are objects good for?

Not especially useful for building small programs. [Think of the HelloWorld.java program, where we must introduce a class (extending some superclass), just to define a static main method.]
Instead, good for building a big, full-featured abstraction, from which one can build other similar abstractions via inheritance.
Thus, objects are good for adding new kinds of things that behave similarly to existing things. This makes objects good for programs that are evolving.
Objects are especially good at supporting a particular kind of evolution:

the set of operations stays the same, but new kinds of things with those operations are added
- Graphical User Interface (GUI) elements
  
  Operations are draw and onClick: every GUI element must draw itself on the screen and respond to mouse clicks.
  
  Lots of different kinds of GUI elements: buttons, check boxes, sliders, …
- Numbers
  
  Operations are mathematical operations (e.g., +, -, abs, sqrt, …)
  
  Lots of different kinds of numbers: integers, rationals, floating-point, arbitrary precision, complex, …
- Collections
  
  Operations include add, remove, do/foreach (i.e., iterate through elements of collection), …
  
  Lots of different kinds of collections: lists, arrays, sets, bags, …
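As a rough illustration of this kind of evolution, consider the collections case. The sketch below is in Python rather than uSmalltalk, and all of its class names are hypothetical. The operations `size` and `includes` are written once, in terms of `do`; each new kind of collection only has to say how to iterate over its elements.

```python
class Collection:
    """Abstract collection: subclasses supply do; the rest is inherited."""

    def do(self, action):
        raise NotImplementedError("subclass responsibility")

    def size(self):
        count = 0

        def bump(_element):
            nonlocal count
            count += 1

        self.do(bump)
        return count

    def includes(self, target):
        found = False

        def check(element):
            nonlocal found
            if element == target:
                found = True

        self.do(check)
        return found


class ListCollection(Collection):
    """An existing kind of collection, backed by a Python list."""

    def __init__(self, items):
        self._items = list(items)

    def do(self, action):
        for item in self._items:
            action(item)


class Interval(Collection):
    """A kind of collection added later: the integers low..high inclusive."""

    def __init__(self, low, high):
        self._low = low
        self._high = high

    def do(self, action):
        for n in range(self._low, self._high + 1):
            action(n)
```

Adding `Interval` required no change to `Collection` or `ListCollection`; the set of operations stayed the same.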

What’s hard about objects?

If you do anything at all interesting, the control flow becomes smeared out over half a dozen classes, and algorithms become nearly impossible to understand.

The larger a class hierarchy becomes, the harder it becomes to ensure that all methods make sense for all classes.

In greater detail:

An object has two parts:
- Instance Variables: state of the object
- Methods: code that can access and manipulate state
An object provides data encapsulation
- No object can (directly) get or set the instance variables of any other object.
- Instance variables of an object can only be gotten or set by methods of the object.
- This design protects the integrity of the data in the object
Note: This design is an aspect of Smalltalk that is not shared by all object-oriented languages. Some OO languages allow public instance variables that can be accessed by any method (although this can often simply be considered uses of implicitly defined getter and setter methods).
An object interacts with other objects via message passing
- A sender object sends a message (with arguments) to a receiver object
- The sender object blocks until the message returns
- The receiver object responds to the message by executing a method and returning a result to the sender
To design an object, we must decide the
- Data: what are the instance variables for the object
- Code: what are the methods that will manipulate the data
But, designing (and implementing) objects one by one can be cumbersome. And, many objects in a program are similar (if not identical, but for the particular values of their instance variables at a particular point in time).
A class is a special kind of object
- Provides a "template" for object construction:
  
  What are the instance variables (data) and methods (code) for objects
  
  How to initialize instance variables (via constructors)
- Because a class is itself an object, it may have class variables (the instance variables of the class itself) and class methods (the methods of the class itself).
Note: In Smalltalk, a constructor is typically provided as a class method, although it is usually supported by instance methods (for example, setters that initialize the new object).
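Python also treats classes as objects, so it can give a feel for class variables and class methods. This is a sketch by analogy only; `Account`, `created`, and `with_balance` are hypothetical names, and Python's `@classmethod` is not the same mechanism as a Smalltalk class method, merely similar in spirit.

```python
class Account:
    # Class variable: state that belongs to the class object itself.
    created = 0

    def __init__(self, balance):
        # Instance variable: state that belongs to one particular account.
        self._balance = balance

    @classmethod
    def with_balance(cls, balance):
        """Constructor-style class method; the receiver cls is the class object."""
        cls.created += 1
        return cls(balance)

    def balance(self):
        return self._balance
```

Because the class is an ordinary value, it can be stored in a variable and sent messages like any other object, e.g. `factory = Account` followed by `factory.with_balance(1)`.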

Smalltalk

Why study Smalltalk?

Designed by Alan Kay, another Turing Award winner
A small, simple, and pure object-oriented language

By small and simple, we mean that almost the complete language can be implemented in a relatively small interpreter.

By pure, we mean that everything is an object.

Smalltalk lives on:

Squeak: a modern, open-source, Smalltalk programming system
Pharo: a minimal, elegant, pure, reflective object language
Ruby: Smalltalk semantics (but not syntax) at its core
Objective-C: adds Smalltalk-style message passing to C and used/promoted by Apple for macOS and iOS applications
- Swift: a successor to Objective-C, now used/promoted by Apple for macOS and iOS applications, that retains a significant Smalltalk influence.

The Six Questions:

What are the values?

Values are objects.

all values are objects
- even literals, which are often distinct varieties of values in other languages, are all objects in Smalltalk
  - 3 is an object
  - true is an object
  - 'up is an object
- even classes are objects
- there are no function values (i.e., no closure values) --- only methods of objects
  - but, a block object acts very much like a closure

1st Smalltalk slogan: Every value is an object!

What are the environments? What do names stand for?

A name stands for a mutable cell containing a value (which is necessarily an object); so, like uScheme, there are both environments (mapping names to locations) and a store (mapping locations to values (which are necessarily objects)).
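The two-level mapping can be modeled in a few lines. The sketch below is a Python model with hypothetical helper names (`bind`, `lookup`, `assign`); it assumes locations are simply indices into a list and is not the representation used by any actual interpreter.

```python
def bind(env, store, name, value):
    """Allocate a fresh location holding value; return an extended environment."""
    location = len(store)
    store.append(value)
    extended = dict(env)
    extended[name] = location
    return extended


def lookup(env, store, name):
    """Names map to locations (env); locations map to values (store)."""
    return store[env[name]]


def assign(env, store, name, value):
    """Mutation changes the store, never the environment."""
    store[env[name]] = value
```

Two environments that map a name to the same location both observe an assignment, whereas rebinding the name allocates a new location and leaves the old one untouched.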

There are four kinds of variables:

global variables
instance variables
formal parameters (of a method)
local variables (of a method; similar to Programming02: Impcore with Local Variables)

In the operational semantics (which we are not covering, though you are welcome to investigate it on your own), global variables are tracked by one environment and all other kinds of variables are tracked by a separate environment.

How are terms evaluated?

Evaluation of variables, set, and begin is similar to Impcore.

Function calls and other control flow (if, while) are replaced by message send, which uses dynamic dispatch.

Dynamic dispatch should be familiar from Java: When a message is sent to an object (i.e., when a method of an object is invoked), look for the code in the class that the object is an instance of. If the code is not found in the object’s class, look in the object’s superclass; if the code is not found in the object’s superclass, look in the superclass of that superclass; …
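The lookup procedure just described can be written out by hand. This Python sketch models classes as records with a method table and a superclass link; `ClassRecord`, `find_method`, and `send` are hypothetical names, and objects are modeled as dictionaries with a `"class"` entry purely for brevity.

```python
class ClassRecord:
    """Stand-in for a class: a method table plus a link to the superclass."""

    def __init__(self, name, superclass, methods):
        self.name = name
        self.superclass = superclass
        self.methods = methods


def find_method(cls, message):
    # Search the receiver's class first, then walk up the superclass chain.
    while cls is not None:
        if message in cls.methods:
            return cls.methods[message]
        cls = cls.superclass
    raise LookupError("message not understood: " + message)


def send(receiver, message, *args):
    # The method is chosen from the receiver's class at the time of the send.
    method = find_method(receiver["class"], message)
    return method(receiver, *args)


OBJECT = ClassRecord("Object", None, {"describe": lambda self: "an object"})
POINT = ClassRecord("Point", OBJECT, {"x": lambda self: self["x"]})
COLOR_POINT = ClassRecord(
    "ColorPoint", POINT, {"describe": lambda self: "a " + self["c"] + " point"}
)
```

Sending `x` to a color point finds the method one level up, in `POINT`; sending `describe` finds the override in `COLOR_POINT` and never reaches `OBJECT`.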

What is in the initial basis?

The initial basis of uSmalltalk is enormous (as compared to Impcore and uScheme), containing many predefined classes and a few primitive classes.

Why is the initial basis so large? In order to demonstrate the benefits of reuse, we need something interesting enough to reuse.

The Smalltalk-80 "blue book" devotes about 90 pages to the language definition (syntax and semantics), but about 300 pages to the standard library.

What are the types?

None --- uSmalltalk is dynamically typed (like uScheme).

Smalltalk uses the terms protocol and behavioral subtyping.

A protocol is a collection of message names and their behaviors. In Smalltalk, it is an informal/documentation thing, not something with syntax/semantics in the language. It is often the case that a protocol is associated with a particular (often abstract) class and its subclasses. But, two unrelated classes (neither class is a subclass of the other) can implement the same protocol (much as two unrelated classes in Java can implement the same interface).

Behavioral subtyping simply means that when we program with respect to a protocol, our code will be usable with any object that implements the protocol, regardless of the actual class of the object.

Ruby programmers use the term duck typing: If it walks like a duck (i.e., responds to the walk message), swims like a duck (responds to the swim message), and quacks like a duck (responds to the quack message), then it must be a duck (implements the Duck protocol).
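Python is also dynamically typed, so it can illustrate the idea directly. In this sketch (all names hypothetical), `Duck` and `Robot` are unrelated classes that happen to implement the same informal protocol, and `make_noise` is programmed against that protocol rather than against any class.

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:
    # Not a subclass of Duck; it merely responds to the same message.
    def quack(self):
        return "Beep"


def make_noise(thing):
    # Works for any object that responds to quack, whatever its class.
    return thing.quack() + "!"
```

An object that does not implement the protocol fails only when the message is actually sent: in Python, `make_noise(3)` raises `AttributeError`, which plays roughly the role of a "message not understood" error.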

Smalltalk does have one "compile-time" checking feature. A message’s name encodes its arity (expected number of arguments); therefore, the interpreter can check that a message send is performed with the correct number of arguments.
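The arity check is simple enough to sketch. The Python function below assumes the uSmalltalk naming rule: a name that begins with a letter expects as many arguments as it has colons, and any other (symbolic) name expects exactly one argument. The function names `arity` and `check_send` are hypothetical.

```python
def arity(message_name):
    """Number of arguments a message expects, not counting the receiver."""
    if not message_name:
        raise ValueError("empty message name")
    if message_name[0].isalpha():
        # Non-symbolic name: one argument per colon (possibly zero).
        return message_name.count(":")
    # Symbolic name such as + or <: exactly one argument.
    return 1


def check_send(message_name, arguments):
    """Reject a send whose argument count disagrees with the message name."""
    expected = arity(message_name)
    if len(arguments) != expected:
        raise TypeError(
            message_name + " expects " + str(expected)
            + " argument(s), got " + str(len(arguments))
        )
```

Note that this check needs only the message name and the argument list, not the receiver, which is why it can be done before the program runs.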

What is the syntax?

Syntax is best described as a revision of the Impcore syntax:

Impcore Syntax

Exp = LITERAL of value
    | VAR of name
    | SET of name * exp
    | IF of exp * exp * exp
    | WHILE of exp * exp
    | BEGIN of exp list
    | APPLY of name * exp list

uSmalltalk Syntax

Exp = LITERAL of rep
    | VAR of name
    | SET of name * exp
    | IF of exp * exp * exp               (2) removed
    | WHILE of exp * exp                  (2) removed
    | BEGIN of exp list
    | APPLY of name * exp list            (1) removed
    | SEND of name * exp * exp list       (1)
    | BLOCK of name list * exp list       (3)
    | RETURN of exp                       (4)

We continue to have mutable variables (VAR and SET) and sequencing (BEGIN). However, you will rarely see (begin …) in a uSmalltalk program: because almost every method in a uSmalltalk program will need to perform a sequence of operations, the syntax for methods and blocks uses a list of expressions for the body, rather than a single expression.

1	The most important difference is that function calls (`APPLY`) in Impcore are replaced by message send (`SEND`) in uSmalltalk. First, note that `SEND` uses a `name` for the message; a message is always identified by name --- the name of a message is not a value. Second, note that a message is always sent to an object, known as the receiver; that is the single `exp` (which will be evaluated to an object) in the `SEND` abstract syntax. Finally, a message is sent with a list (possibly empty) of arguments; those are the `exp list` (which will be evaluated to a list of objects). As noted above, a message’s name encodes its arity. A symbolic name (one that does not begin with a letter, like `+`) indicates a method of arity 1 (expecting one argument in addition to the receiver); thus, we write `(1 + 2)` to send the `+` message to the object `1` with one argument, the object `2`. A non-symbolic name (one that does begin with a letter, like `at:`) indicates a method with arity equal to the number of colons in the name; thus, we write `(arr at: 0)` to send the `at:` message to the object denoted by `arr` with one argument, the object `0`.
2	The next most important difference is that uSmalltalk eliminates the `IF` and `WHILE` expressions. How then do we encode conditionals and loops? Although some loops can be handled by recursion (much as we never needed to use `while` in uScheme), we still desire support for conditionals and loops. In Smalltalk, this is accomplished by using blocks (special kinds of objects that represent suspended code, very similar to `lambda` abstractions in uScheme); in particular, we send messages that carry block objects as arguments to boolean objects (for conditionals) and to block objects (for loops). 2nd Smalltalk slogan: Control structures are implemented by sending messages.
3	Yet another important difference is that uSmalltalk includes an expression for creating a block object. Note that the abstract syntax for `BLOCK` in uSmalltalk is very similar to that of `LAMBDA` in uScheme: a list of formal parameters and a body (in uScheme, the body is a single expression, but in uSmalltalk, the body is a sequence of expressions). More on blocks later.
4	The last important difference is that uSmalltalk includes an expression for returning early from a method. Arguably, more of a convenience than a necessity, but helpful. More examples of `return` later.
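The second slogan can be made concrete with a small model. The Python sketch below is an analogy under stated assumptions, not uSmalltalk's implementation: `Block`, `TrueObject`, `FalseObject`, `less_than`, `smaller`, and `count_down` are hypothetical names, and the host language's own `while` and conditional expression are used only inside what would be primitives.

```python
class Block:
    """Suspended code: nothing runs until the block receives the value message."""

    def __init__(self, body):
        self._body = body

    def value(self):
        return self._body()

    def while_true(self, body):
        # The receiver is the condition block; it is re-evaluated before each pass.
        while self.value() is TRUE:
            body.value()
        return None


class TrueObject:
    def if_true_if_false(self, true_block, false_block):
        return true_block.value()  # only the first block is ever run


class FalseObject:
    def if_true_if_false(self, true_block, false_block):
        return false_block.value()  # only the second block is ever run


TRUE = TrueObject()
FALSE = FalseObject()


def less_than(a, b):
    """Primitive comparison that answers one of the two boolean objects."""
    return TRUE if a < b else FALSE


def smaller(a, b):
    # A conditional is a message sent to a boolean, with blocks as arguments.
    return less_than(a, b).if_true_if_false(Block(lambda: a), Block(lambda: b))


def count_down(start):
    # A loop is a message sent to a condition block, with a body block as argument.
    state = {"n": start, "seen": []}

    def step():
        state["seen"].append(state["n"])
        state["n"] -= 1

    Block(lambda: less_than(0, state["n"])).while_true(Block(step))
    return state["seen"]
```

The choice between the two branches is made by dynamic dispatch: which `if_true_if_false` method runs depends only on whether the receiver is the true object or the false object.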

Object Creation in uSmalltalk

Here is an example to demonstrate some of the subtleties of object creation in uSmalltalk.

(class Point
  [subclass-of Object]                                  (1)
  [ivars x y]  ;; instance variables
  (class-method new ()                                  (2)
    (self withX:withY: 0 0))
  (class-method withX:withY: (thatX thatY)              (2) (3) (7)
    (locals ans)
    (set ans (super new))                               (4)
    (ans setX: thatX)                                   (5)
    (ans setY: thatY)                                   (5)
    ans)                                                (6)
  (method setX: (thatX) (set x thatX) self)             (7)
  (method setY: (thatY) (set y thatY) self)             (7)
  (method x () x)
  (method y () y)
)
(class ColorPoint
  [subclass-of Point]                                   (1)
  [ivars c]  ;; instance variables
  (class-method new ()                                  (8)
    (self withX:withY:withC: 0 0 'black))
  (class-method withX:withY:withC: (thatX thatY thatC)  (8)
    (locals ans)
    (set ans (self withX:withY: thatX thatY))
    (ans setC: thatC)
    ans)
  (method setC: (thatC) (set c thatC) self)
  (method c () c)
)

(val p1 (Point new))                                    (2)
(val p2 (Point withX:withY: 10 20))                     (2)

(val cp1 (ColorPoint new))                              (8)
(val cp2 (ColorPoint withX:withY:withC: 10 20 'green))  (8)

1	First, note that class `Point` is a subclass of `Object` and class `ColorPoint` is a subclass of `Point`.
2	An instance of `Point` is created by invoking one of the class methods of the (global) `Point` object (which represents the class). In this example, there are two "constructors": `new` which creates a point at the origin (with default values for the `x` and `y` instance variables) and `withX:withY:` which creates a point with initial values for the `x` and `y` instance variables. Note how the `new` class method simply sends the `withX:withY:` message to `self` with default values; because `new` is a class method, in its body, the variable `self` represents the `Point` object (which represents the class).
3	Now, consider the `withX:withY:` class method. Its formal arguments are `thatX` and `thatY`, representing the initial values for the `x` and `y` instance variables of the instance that it will construct. In order to construct the new instance, it needs to create the instance, initialize its instance variables, and then return the constructed object.
4	The first step is to create the instance (which will have uninitialized instance variables). In this case, we want to create a new instance of the `Point` class, so it seems that it might be appropriate to use `(self new)`. However, this would lead to an infinite loop --- the `new` class-method of the `Point` class calls the `withX:withY:` class-method of the `Point` class, which would call the `new` class-method of the `Point` class. We need to "skip over" the `new` method of the `Point` class and use the `new` method of its superclass, so it seems that it might be appropriate to use `(Object new)`. However, this would lead to errors later --- an instance of `Object` will not have `x` or `y` instance variables and won’t understand `setX:` and `setY:` messages. What we need to do is to "skip over" the `new` method of `Point`, but ensure that the created instance is an instance of `Point`. This is the role of the `super` variable/keyword; `(super new)` invokes the `new` method of the `Object` class, which in turn invokes the `new` method of the `Class` class (the class of all class objects), but remembers that the receiver is actually the `Point` class, so that the appropriate instance variables are created.
5	Next, the instance variables of the newly created object must be initialized. Remember, only the object itself can access its instance variables; therefore we send messages to the newly created object to set its instance variables. It would be incorrect to use `(set x thatX) (set y thatY)`, because the instance variables `x` and `y` are not visible within a class method.
6	Finally, the created and initialized object is returned as the result of the `withX:withY:` class method of the `Point` class.
7	Note Prof. Fluet’s naming convention for formal parameters of "constructor"-like methods. In Java, a common idiom when writing a constructor is to name the formal parameters of the constructor the same as the instance variables of the object; within the body of the constructor the ambiguity of whether a name is a formal parameter or an instance variable is resolved by using the `this` keyword; that is, within the body of the constructor for a Java version of the `Point` class, we would write `this.x = x;`. But, there is no `this` in uSmalltalk. In order to avoid shadowing the instance variables, the formal parameters of a method in uSmalltalk should use different names than the instance variables of the object; Prof. Fluet uses formal parameters named `thatX` because it is an alternate solution to the same problem that the `this` keyword solves in Java. Also, note that the methods like `setX:` and `setY:`, which update the state of the object, typically return the object itself as a result. This allows for "chaining" of operations: `((p1 setY: 10) setX: 20)` (although it looks more natural in Java or Ruby syntax, `p1.setY(10).setX(20)`, or in "true" Smalltalk syntax, `(p1 setY: 10) setX: 20`).
8	An instance of `ColorPoint` is created by invoking one of the class methods of the (global) `ColorPoint` object (which represents the class). Again, the `new` class method creates a color point with default values for the instance variables and the `withX:withY:withC:` class method creates a color point with argument values for the instance variables. Note how `withX:withY:withC:` uses its superclass’s `withX:withY:` "constructor" to create the object.
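For readers who want to experiment, the construction pattern above can be approximated in Python. This is a loose analogy, not a translation: Python's `cls()` plays the role of the primitive allocation reached through `(super new)`, `__init__` merely leaves the instance variables unset, and the snake_case method names are hypothetical stand-ins for the uSmalltalk message names.

```python
class Point:
    def __init__(self):
        # Stand-in for raw allocation: instance variables start out unset.
        self._x = None
        self._y = None

    @classmethod
    def new(cls):
        # Delegate to the general constructor with default values.
        return cls.with_x_with_y(0, 0)

    @classmethod
    def with_x_with_y(cls, that_x, that_y):
        ans = cls()  # allocates an instance of the receiver class, whatever it is
        ans.set_x(that_x)
        ans.set_y(that_y)
        return ans

    def set_x(self, that_x):
        self._x = that_x
        return self  # returning self allows chaining

    def set_y(self, that_y):
        self._y = that_y
        return self

    def x(self):
        return self._x

    def y(self):
        return self._y


class ColorPoint(Point):
    def __init__(self):
        super().__init__()
        self._c = None

    @classmethod
    def new(cls):
        return cls.with_x_with_y_with_c(0, 0, "black")

    @classmethod
    def with_x_with_y_with_c(cls, that_x, that_y, that_c):
        # The inherited constructor runs with cls still bound to ColorPoint.
        ans = cls.with_x_with_y(that_x, that_y)
        ans.set_c(that_c)
        return ans

    def set_c(self, that_c):
        self._c = that_c
        return self

    def c(self):
        return self._c
```

The key point carried over from the uSmalltalk version is that the inherited constructor allocates an instance of the class that originally received the message, so `ColorPoint.with_x_with_y_with_c(10, 20, "green")` yields a `ColorPoint`, not a `Point`.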

Exercises:

What would happen if we used (super withX:withY: 0 0) in the new class-method of the Point class?
What would happen if we used (Point withX:withY: 0 0) in the new class-method of the Point class?
What would happen if we used (super withX:withY: thatX thatY) in the withX:withY:withC: class-method of the ColorPoint class?
What would happen if we used (ColorPoint withX:withY: thatX thatY) in the withX:withY:withC: class-method of the ColorPoint class?
What would happen if we evaluated (val cp1 (ColorPoint withX:withY: 10 20))?

Acknowledgments

Portions of these notes are based upon material by Norman Ramsey and Hossein Hojjat.