SML/NJ comes with a collection of utility libraries that are separate from the compiler. These provide modules for the following main areas.
Data structures such as hash tables, dynamically resizing arrays, bit vectors, queues, maps and trees and property lists.
Algorithms such as list sorting, searching, numeric formatting, random numbers.
Utility modules for simple parsing, and miscellaneous I/O operations.
Regular expressions.
Parsing HTML 3.2 markup.
Some Unix and socket utilities.
You can find these in the smlnj-lib source directory. At the moment these modules are underdocumented. There is some documentation in the Doc directory but it is mainly reference material which describes the API. For full details you will need to look through the source files.
In the following sections I will describe the modules briefly and give examples for those that I think will be the most useful.
The major data structure (outside of lists) is the tree. The utility library provides an implementation of red-black trees with functional behaviour. This means that an update of a tree produces a new tree leaving the original unchanged.
You might think that producing a new updated tree would be expensive as it seems to require copying the whole tree. But that copying is mostly by reference. Figure 5-1 shows what you get after updating a functional binary tree. In this diagram the node D has been replaced with a new node D'. This requires that the parent node C be updated with the new reference to D'. The updates propagate up the tree to the root at A', each producing a copy of the updated node. But no more nodes will be copied than the height of the tree. All other nodes will be shared. If the application drops the reference to node A then the garbage collector will reclaim the old nodes A, C and D. In the meantime the application has access to both the old and new versions of the tree which can be useful (e.g. for undoing the update).
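The sharing can be seen in a small sketch. This is not the library's red-black code, just a plain unbalanced binary search tree of integers, and the names tree, insert and member are invented for the illustration. The insert function rebuilds only the nodes on the path from the root to the change and reuses every other subtree as it stands.

```sml
(* A plain, unbalanced binary search tree of ints; illustration only. *)
datatype tree = Leaf | Node of tree * int * tree

(* insert rebuilds only the nodes on the search path.
   The subtree on the other side of each node is reused unchanged. *)
fun insert (Leaf, x) = Node (Leaf, x, Leaf)
  | insert (t as Node (l, v, r), x) =
      case Int.compare (x, v) of
          LESS    => Node (insert (l, x), v, r)   (* r is shared *)
        | GREATER => Node (l, v, insert (r, x))   (* l is shared *)
        | EQUAL   => t                            (* nothing to copy *)

fun member (Leaf, _) = false
  | member (Node (l, v, r), x) =
      case Int.compare (x, v) of
          LESS    => member (l, x)
        | GREATER => member (r, x)
        | EQUAL   => true

val old_tree = foldl (fn (x, t) => insert (t, x)) Leaf [50, 30, 70, 20]
val new_tree = insert (old_tree, 60)

(* Both versions remain usable after the update. *)
val in_old = member (old_tree, 60)   (* false *)
val in_new = member (new_tree, 60)   (* true *)
```

Inserting 60 copies the nodes holding 50 and 70 and adds one new node. The subtree rooted at 30 is shared between old_tree and new_tree.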
As long as the tree stays reasonably balanced then the height of the tree stays minimal and lookups and updates are cheapest. The red-black algorithm adjusts the tree after each update to keep it close to balanced. The algorithm is fairly complex. A generic implementation is provided in the RedBlackMapFn functor in the redblack-map-fn.sml source file.
    functor RedBlackMapFn (K : ORD_KEY) :> ORD_MAP where Key = K =
    ...

    signature ORD_KEY =
    sig
        type ord_key
        val compare : ord_key * ord_key -> order
    end
    ...

    datatype order = LESS | EQUAL | GREATER
This functor can be specialised for a key of any ordered type. The ORD_KEY signature describes the key type. The order type is predefined in the SML Basis. The resulting structure satisfies the ORD_MAP signature defined in the ord-map-sig.sml file. This signature describes the tree operations such as insert, remove etc.
The library then goes on to use red-black trees to implement sets. A set is just a map from a domain to a boolean value (which is always true for members of the set). But for efficiency a separate implementation of red-black sets is provided in the RedBlackSetFn functor.
Next the library provides some specialisations of these red-black sets and maps for various keys. For keys of int and word the code is re-implemented, presumably for efficiency. For other key types such as atoms the functor is specialised.
    structure AtomRedBlackMap =
        RedBlackMapFn (
            struct
                type ord_key = Atom.atom
                val compare = Atom.compare
            end)
An atom is a string paired with a hash value so that equality can be tested very quickly by first comparing the hash values. The library's implementation of atoms also ensures that all equal strings share the same atom by reference. They are useful in the symbol tables of compilers and any similar application.
Other kinds of maps are also implemented. The BinaryMapFn functor in the binary-map-fn.sml source file also keeps the tree reasonably balanced. The implementation says "The main advantage of these trees is that they keep the size of the tree in the node, giving a constant time size operation." Matching implementations for sets and integer and atom keys are provided. You can also try out the SplayMapFn functor which does it with splay trees and the ListMapFn functor which does it with sorted lists. These all conform to the ORD_MAP and ORD_SET signatures so they are interchangeable.
To demonstrate functional maps here is a test program that reads pairs of numbers from a file and makes a map of them. The map is then printed out.
    structure Main =
    struct
        structure Map = IntRedBlackMap

        fun toErr s = TextIO.output(TextIO.stdErr, s)

        fun read_file file : int Map.map =
        let
            val istrm = TextIO.openIn file

            (*  Read a pair of ints on a line and loop.
                Empty lines are ignored. Other junk is fatal.
                In the current Basis library inputLine returns
                a string option, with NONE at end of file.
            *)
            fun get_pairs map_in lnum =
            (
                case TextIO.inputLine istrm of
                  NONE => (TextIO.closeIn istrm; map_in)    (* eof *)

                | SOME line =>
                let
                    val tokens = String.tokens Char.isSpace line
                in
                    case map Int.fromString tokens of
                      [] => get_pairs map_in (lnum+1)

                    | [SOME a, SOME b] =>
                        get_pairs (Map.insert(map_in, a, b)) (lnum+1)

                    | _ => raise Fail (concat["Invalid data on line ",
                                              Int.toString lnum])
                end
            )
            handle x => (TextIO.closeIn istrm; raise x)
        in
            get_pairs Map.empty 1
        end

        and show_pairs pairs =
        let
            fun show (a, b) = print(concat[
                Int.toString a, " => ", Int.toString b, "\n"])
        in
            Map.appi show pairs
        end

        fun main(arg0, argv) =
        (
            case argv of
              [file] =>
            (
                show_pairs(read_file file);
                OS.Process.success
            )

            | _ =>
            (
                print "Usage: intmap file\n";
                OS.Process.failure
            )
        )
        handle x =>
        (
            toErr(exnMessage x); toErr("\n");
            OS.Process.failure
        )

        val _ = SMLofNJ.exportFn("intmap", main)
    end
A very different kind of map is the hash table. The implementation is imperative meaning that the table is updated in place. See the section called Using a Hash Table in Chapter 2 for an example of using a hash table to map from strings to strings.
The hash function that is used is defined in the HashString structure in the hash-string.sml file. It implements the recurrence (h = 33 * h + 720 + c) over the characters of the string, where c is the character code of the next character. The same hash function is used for atoms.
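As a rough sketch of that recurrence, here it is written as a fold over the characters of a string. The names hash_char and hash_string are invented for this example, and starting the fold from zero is an assumption rather than something taken from the text above. See hash-string.sml for the real code.

```sml
(* h = 33 * h + 720 + c, applied to each character from left to right.
   Word arithmetic wraps around on overflow instead of raising Overflow. *)
fun hash_char (c, h) = 0w33 * h + 0w720 + Word.fromInt (Char.ord c)

(* Starting from 0w0 is an assumption of this sketch. *)
fun hash_string s = CharVector.foldl hash_char 0w0 s

val h_ab = hash_string "ab"
```

For the string "ab" the first step gives 720 + 97 = 817 and the second gives 33 * 817 + 720 + 98 = 27779.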
The utility library provides three useful kinds of arrays in addition to the array types defined in the Basis library (see the section called Arrays and Vectors in Chapter 3).
There is a monomorphic dynamic array functor which automatically grows in size to accommodate the data.
There is a BitArray structure that stores bits compactly in bytes. It provides all of the standard array operations. In addition there are bit-operations like and, or and shift over arrays. See the bit-array.sml source file for more details. There is a matching BitVector structure for immutable bit vectors.
There is an Array2 structure for polymorphic two-dimensional arrays. See the array2.sml source file for more details.
The dynamic array grows by at least doubling in size as needed to accommodate all of the elements that have been referenced. This doubling requires copying the original array. The elements are stored internally in a reference to an array so the growth is imperative. Newly created elements are initialised with a default value. The module is a functor which constructs the dynamic array from a monomorphic array.
    functor DynamicArrayFn (A : MONO_ARRAY) : MONO_DYNAMIC_ARRAY

    signature MONO_DYNAMIC_ARRAY =
    sig
        type elem
        type array

        val array:      (int * elem) -> array
        val subArray:   array * int * int -> array
        val fromList:   elem list * elem -> array
        val tabulate:   int * (int -> elem) * elem -> array
        val default:    array -> elem
        val sub:        array * int -> elem
        val update:     array * int * elem -> unit
        val bound:      array -> int
        val truncate:   array * int -> unit
    end
The signature MONO_ARRAY is predefined in the Basis library. It characterises any type that can be made to behave like an imperative array (see the section called Arrays and Vectors in Chapter 3). The MONO_DYNAMIC_ARRAY signature provides a restricted set of operations on dynamic arrays which currently omits the iterator operations. See the source file mono-dynamic-array-sig.sml for more details on the operations.
Here is an example of creating a dynamic array of 1000 real numbers initialised to zero (the default value).
    structure DynRealArray = DynamicArrayFn(Real64Array)

    val reals = DynRealArray.array(1000, 0.0)
There is a MonoArrayFn functor which is a little utility for creating array structures for DynamicArrayFn. For example, since there is no predefined IntArray structure you could write
    structure DynIntArray = DynamicArrayFn(MonoArrayFn(type elem = int))
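The growth strategy described earlier (a reference to an array that is replaced by a larger copy when needed) can be sketched with the Basis Array structure alone. This is not the DynamicArrayFn source. It is fixed to int elements, and the names dyn, new_dyn, dyn_sub and dyn_update are invented for the illustration.

```sml
(* A growable array of ints: a reference to an ordinary array plus a default. *)
type dyn = {data : int array ref, dflt : int}

fun new_dyn (size, dflt) : dyn =
    {data = ref (Array.array (size, dflt)), dflt = dflt}

(* Reading past the current end yields the default without growing. *)
fun dyn_sub ({data, dflt} : dyn, i) =
    if i < Array.length (!data) then Array.sub (!data, i) else dflt

(* Writing past the end builds a bigger array, at least double the old
   size, copies the old elements into it and replaces the reference. *)
fun dyn_update ({data, dflt} : dyn, i, v) =
    let
      val old = !data
      val len = Array.length old
    in
      if i < len then ()
      else data := Array.tabulate (Int.max (2 * len, i + 1),
                     fn j => if j < len then Array.sub (old, j) else dflt);
      Array.update (!data, i, v)
    end

val a = new_dyn (4, 0)
val () = dyn_update (a, 10, 42)
val size_now = Array.length (!(#data a))   (* 11, since 10 + 1 > 2 * 4 *)
```

Because the array lives behind a reference, every holder of the value a sees the larger array after the update. This is what makes the growth imperative.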
The utility library implements first-in-first-out queues in both the functional and imperative styles. The functional ones are called fifos and the imperative ones are called queues (for no special reason).
The fifo implementation is in the Fifo structure in the fifo.sml source file. The queue implementation is in the Queue structure in the queue.sml source file. It's actually just a wrapper for a fifo stored in a reference.
The implementation of fifos is a little tricky for a typical functional language like SML. It requires access to both ends of a list. But SML only provides cheap access to the front of a list. If you naively enqueued a value by appending it to the end of a list that would require copying the entire list which would be ridiculously expensive.
The solution is to split the list into front and rear halves with the rear half in reverse. This moves the end of the fifo to the front of a list, as shown in Figure 5-2.
Elements can be dequeued from the front of the fifo by just removing them from the front of the 'front' list. Elements can be enqueued to the rear of the fifo by adding them to the front of the 'rear' list. When the front list becomes empty then the elements in the rear list are transferred to it. This still requires copying a list but much less often than the naive implementation above. A detailed analysis of the performance of this solution can be found in [Okasaki]. It turns out that the time to enqueue or dequeue an element, averaged over a sequence of operations (the amortised time), is O(1), that is, effectively constant.
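The two-list scheme fits in a few lines. What follows is a minimal sketch rather than the code in fifo.sml. The names fifo, empty, enqueue, dequeue and drain are chosen for this example, and dequeue returns an option where a library might raise an exception instead.

```sml
(* A fifo is a front list plus a rear list held in reverse order. *)
datatype 'a fifo = Q of {front : 'a list, rear : 'a list}

val empty = Q {front = [], rear = []}

(* Enqueue onto the head of the rear list: constant time. *)
fun enqueue (Q {front, rear}, x) = Q {front = front, rear = x :: rear}

(* Dequeue from the head of the front list. When the front list is empty
   the rear list is reversed once to become the new front list. *)
fun dequeue (Q {front = x :: rest, rear}) =
      SOME (x, Q {front = rest, rear = rear})
  | dequeue (Q {front = [], rear = []}) = NONE
  | dequeue (Q {front = [], rear}) =
      dequeue (Q {front = rev rear, rear = []})

(* Remove everything, returning the elements in dequeue order. *)
fun drain q =
    case dequeue q of
        NONE => []
      | SOME (x, q') => x :: drain q'

val order = drain (enqueue (enqueue (enqueue (empty, 1), 2), 3))
```

Here order comes out as [1, 2, 3]. Each element is moved from the rear list to the front list at most once, which is where the constant average cost comes from.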
A property list (plist) is a list of key-value pairs that can be attached to or associated with some value. The Lisp ([Steele]) language has had plists since its beginning in 1959. In Common Lisp they are only used for annotating symbols. The SML design allows you to annotate any base value such as a node in a tree. You can add or remove properties from the list so they are naturally imperative.
The utility library has an implementation of property lists in the PropList structure of the plist.sml source file. A key can appear only once in a property list but it can appear in any number of property lists. In Lisp the keys would typically be Lisp symbols but in SML any type that supports equality will do. The design provides a more abstract interface to a property. Each property is represented by a group of functions that access the property. The actual implementation of the key is an internal detail. Here is the signature of the PropList structure.
    signature PROP_LIST =
    sig
        type holder

        val newHolder : unit -> holder

        val clearHolder : holder -> unit

        val sameHolder : (holder * holder) -> bool
            (* returns true, if two holders are the same *)

        (* newProp (selHolder, init) *)
        val newProp : (('a -> holder) * ('a -> 'b)) -> {
                peekFn : 'a -> 'b option,
                getFn  : 'a -> 'b,
                setFn  : ('a * 'b) -> unit,
                clrFn  : 'a -> unit
              }

        val newFlag : ('a -> holder) -> {
                getFn : 'a -> bool,
                setFn : ('a * bool) -> unit
              }
    end
A holder, unsurprisingly, holds one property list. To associate a property list with a value you must be able to write a function that maps from the value to a holder. This function is called selHolder in the signature. How you write this is up to you. For example if you were attaching property lists to nodes in a tree you would simply include a holder as one of the fields of a node. The selHolder function would then just select the field from a node. The 'a type variable represents the type of the annotated value and 'b is the type of the property's value.
The newHolder function creates a new empty holder. The clearHolder function deletes all of the properties in a holder.
The newProp function defines a new property. The property is represented by the four access functions: peek, get, set and clear. These operate in terms of the annotated value so you have to supply the selHolder function to translate the annotated value to the holder. The property is restricted to appear only in plists that can be selected by the selHolder function. This usually means it is restricted to the plists of one annotated type.
The init function is used to create an initial value for the property should you try to get the property value before it is set. This initial value is allowed to depend on the annotated value for more flexibility.
The newFlag function makes a specialised kind of property that has no value. You only test if it is present or absent in the plist. The get function returns true if it is present.
Here is a simple demonstration of some properties. Where this demonstration differs from other kinds of data structures is that the set of properties is completely dynamic. You can invent new properties on the fly rather than just having the fixed number of fields in a record. Accessing these properties will be a bit slow though. First I define a set of people and some properties for them.
    structure PL = PropList

    (*  Associate a plist holder with each person. *)
    val people: PL.holder STRT.hash_table = STRT.mkTable(101, NotFound)

    (*  Add someone to the table. *)
    fun define name = STRT.insert people (name, PL.newHolder())

    (*  Define some properties.
        Weight is a real measure in kilograms.
        Father is a string.
    *)
    val weight_prop = PL.newProp (STRT.lookup people, fn _ => 0.0)
    val father_prop = PL.newProp (STRT.lookup people, fn _ => "unknown")

    (*  Functions to set and get the properties. *)
    fun set prop (name, value) =
    let
        val {peekFn, getFn, setFn, clrFn} = prop
    in
        setFn(name, value)
    end

    fun get prop name =
    let
        val {peekFn, getFn, setFn, clrFn} = prop
    in
        getFn name
    end
The people are represented by a hash table that maps a name to a plist holder. (See the section called Using a Hash Table in Chapter 2 for details of my STRT structure). The set and get functions are polymorphic enough to work for all properties. Unfortunately the type of a property is not named in the PROP_LIST signature so I end up having to write out all of the fields in the property record each time.
Here is a demonstration of the use of these functions. I define some people, set some properties and then dump them all.
    val names = ["fred", "wilma", "barney", "betty", "wilma",
                 "pebbles", "bambam"]

    fun show_father name =
        print(concat[name, "\thas father ", get father_prop name, "\n"])

    fun show_weight name =
        print(concat[name, "\thas weight ",
                     Real.toString(get weight_prop name), "kg\n"])

    fun run() =
    let
    in
        app define names;

        app (set father_prop) [("pebbles", "fred"),
                               ("bambam", "barney")
                              ];

        app (set weight_prop) [("fred", 111.0),
                               ("wilma", 73.0),
                               ("barney", 82.5),
                               ("betty", 68.5),
                               ("pebbles", 15.1),
                               ("bambam", 18.7)
                              ];

        app show_father names;
        app show_weight names;
        ()
    end
What is especially interesting about SML plists is how they are implemented. A list in SML must always have elements of the same type. But property lists manage to have elements of different types and new types of elements can be added at any time. How do they do it?
If you were to implement something like this in an object-oriented language such as C++ or Java you would define a base class for a plist element. Then you would define a subclass for each type of element you want in the list. You can use run-time type identification ("downcasting") to extract a property's value. A subclass defines a subtype of the base class and you can use any subtype anywhere that its base type is used. This is how you do polymorphism in object-oriented languages.
But SML does not provide subtyping. Normally you have to define a datatype that represents the union of all of the subtypes you are going to use. This doesn't have the flexibility or extensibility of the object-oriented paradigm.
But there is a way around this. There is a dirty little trick you can do with exceptions in SML that provides a great big loophole in the type checking. An exception declaration declares a constructor that coerces some type to the special type exn. For example this declaration
    exception Error of string

declares the constructor function

    val Error: string -> exn
You can think of the exn type as being like a datatype where each exception declaration adds a new branch, no matter where the declaration appears in your program. The exn type is an extensible type. From another point of view, the Error declaration above lets you use a string type anywhere an exception type is allowed so it effectively declares the string type as a subtype of exn.
Exceptions have one more quirk up their sleeve. An exception declaration is, to use the jargon, generative. This means that each declaration creates a new exception each time that it is executed. For example if you have an exception declaration inside a function then each time that function is called a new distinct exception is created, even though the constructor name is the same. This is what lets you define new subtypes dynamically.
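A few lines are enough to see the generative behaviour. The names make_tag, wrap and unwrap below are invented for this sketch. Each call to make_tag executes the exception declaration again, so the two pairs of functions work with two different exceptions even though both are called Tag in the source.

```sml
(* Each call executes the exception declaration, creating a fresh exception.
   The returned pair wraps an int in it and tests for it. *)
fun make_tag () =
    let
      exception Tag of int
      fun wrap n = Tag n
      fun unwrap (Tag n) = SOME n
        | unwrap _ = NONE
    in
      (wrap, unwrap)
    end

val (wrap1, unwrap1) = make_tag ()
val (wrap2, unwrap2) = make_tag ()

val same  = unwrap1 (wrap1 7)   (* SOME 7 *)
val cross = unwrap2 (wrap1 7)   (* NONE: a different exception named Tag *)
```

The value built by wrap1 is recognised only by unwrap1. The pattern in unwrap2 refers to the exception created by the second call so it does not match.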
Here is an example where the properties are defined statically using exceptions.
    type PList = exn list

    exception PFruit of string
    exception PNum of int

    val fruit = [PFruit "apple", PNum 5]

    fun get_fruit []               = NONE
    |   get_fruit (PFruit f::_)    = SOME f
    |   get_fruit (_::rest)        = get_fruit rest
The list fruit contains some properties. The get_fruit function will find and return the value of the first PFruit property in the list. This all works the same way as if I had written
    datatype PList = PFruit of string | PNum of int
This next example creates properties dynamically.
    fun new_prop dflt =
    let
        exception E of 'a

        fun get []           = NONE
        |   get (E v::_)     = SOME v
        |   get (_::rest)    = get rest

        fun set props v = (E v)::props

        fun dummy() = E dflt
    in
        (get, set)
    end

    val (get_colour, set_colour) = new_prop "colour"
    val props2 = set_colour fruit "red"

    val (get_weight, set_weight) = new_prop 0.0
    val props3 = set_weight props2 0.75

    fun report() =
    (
        print(concat[
            "The colour is ", valOf(get_colour props3),
            " and the weight is ",
            Real.toString(valOf(get_weight props3)),
            "\n"])
    )
Every time the new_prop function is called the exception declaration will be executed. This will define a new exception which is known locally by the name E. This new exception is captured by the get and set functions which provide the only access to it. I've had to include an extra dummy argument to let me constrain the type of the value for each property. Without it, the type 'a is completely unconstrained and the type checking fails with the notorious value restriction message:
    dyn.sml:36.5-36.34 Warning: type vars not generalized because of
       value restriction are instantiated to dummy types (X1,X2,...)
    dyn.sml:37.5-37.34 Error: operator and operand don't agree [tycon mismatch]
      operator domain: ?.X1
      operand:         string
      in expression:
        (set2 fruit) "red"
By including the dflt argument and the dummy function I constrain the type of the property value to be the same as that of the dflt argument. So when I write (new_prop "colour") the get_colour and set_colour functions are known to work on strings. In a more complete example the dflt argument would be useful as a default or initial value for the property.
With this exception trick and property lists you can build up a nice little prototype-based object system along the lines of the Common Lisp Object System (CLOS)[Steele].