GREENSTONE DIGITAL LIBRARY DEVELOPER'S GUIDE
Appendix A: The C++ Standard Template Library
The Standard Template Library (STL) is a widely-used C++ library from Silicon Graphics (www.sgi.com). This Appendix gives a brief overview of key parts that are used throughout the Greenstone code. For a fuller description, consult the official STL reference manual, available online at www.sgi.com, or one of the many STL textbooks, for example Josuttis (1999).
As the word “template” suggests, STL is not just a plug-and-use object library. Coupled with the template mechanism in C++, it provides a forum for programmers to concisely develop their own objects that tap into the algorithmic capabilities embedded within STL. This adds an extra layer of complexity, but it's worth it.
To help understand the Greenstone code excerpts given in this manual, we give a few tutorial level examples that use STL.
Figure 48 Programming a list of integers
First we study two programs that implement an integer list. One uses basic C++ types (the “old fashioned” way), the other uses STL. Figure 48 shows the source code implementation that does not use STL. Lines 5—8 define the basic data structure we are going to use: the field val stores the integer value, and next points to the next element in the list—a classic implementation of a linked list.
To demonstrate use of the data structure, the main program (lines 23—32) sets up an integer list with elements [5, 4]. It then calls the function total_int_list (defined over lines 10—21) which takes as its only parameter a pointer to the head of a list and sums the values in it. The returned answer (9 in our example) is printed to the screen.
The main work is accomplished by lines 12—18. First some initialisation: the local variable total is set to zero, and curr to point to the start of the list. Then a while loop adds the current integer element in the list to the running total (total += curr->val;) before moving on to the next element (curr = curr->next;). The while loop terminates when curr becomes equal to nil, signifying that there are no more elements left to process.
Figure 49 Programming a list of integers using STL
Figure 49 shows an equivalent program using STL. It is no longer necessary to define a suitable data structure in the code; all that is necessary is the #include <list> directive on line 2 that includes the template version for a list defined in STL. The object is called a “container class” because when we declare a variable of this type we also specify the type we want it to store. On line 19 an integer list is realised with the statement list<int> vals;. Elements can be added to the object using the member function push_back(), as is done on lines 20—21.
The main work is done by lines 6—12. There are still two initialisations and a while loop, but other than that the new syntax has little in common with the old. Central to this new way of processing is a variable of type iterator(line 7). In STL many classes include iterator types to provide a uniform way of working through a sequence of container objects. The first element is returned with begin() and the element just past the last one with end(). Moving to the next element is accomplished by the increment operation ++, which is overloaded by the iterator class to implement this task, and the value stored there is accessed through dereferencing (*curr on line 10), which is also overloaded.
The STL implementation of this program is slightly smaller (25 lines verses 31) than the conventional code. The gains are more noticeable in larger projects, because the STL list object is more powerful than the example here illustrates. It is, for instance, a doubly linked list that supports several forms of insertion and deletion—something that would require additional programming effort to add to the basic integer list version.
Note that the parameter to total_int_list in Figure 49 was implemented as a pointer, to correspond with the pointer used in total_int_list in Figure 48. In STL it is often more natural (and more desirable) to use references rather than pointers. Then the parameter becomes list<int>& head, and its member functions are called with the syntax head.begin(); and so on.
When implementing a digital library system, it is useful to be able to store elements in an array indexed by text strings rather than by numeric indexes. In Greenstone, for example, this greatly simplifies storing the macro files once they have been read; and the various configuration files. A data type that supports such access is called an associative array, and is often built in to modern high-level languages. It is also known by the name hash array (most notably in Perl), since hashing is the normal technique used to implement the text index.
Figure 50 Using associative arrays in STL
In STL, associative arrays are accomplished using the map object. Figure 50 shows a somewhat contrived example that stores the age of three people (Alice, Peter and Mary) in an associative array under their own names (lines 19—22). The problem is to write a function that calculates the total age of the people present, without knowing how many there are or what their names are. Of course, this could be solved with a standard numeric array of integers. The example is contrived to demonstrate the features of the map object and bring out the similarity of processing with the list object with an iterator.
Like list, map is a container class. However, when declaring a variable of this type we must specify two things: the index type, and the element type. As can be seen on line 19, we obtain an associative array that stores integers using char*(which is how a string is declared in C++) as the index type followed by int as the element type.
There are several ways to store elements in the associative array. In the example on lines 20—22 the overloaded array subscript [ ] is used to initialise the table with the ages of the three people. The similarity of total_int_table—which performs the main calculation in the program—to total_int_list in Figure 48 is striking. In fact, they are nearly identical, and this is no coincidence. STL makes heavy use of inheritance so that different objects still use the same fundamental operations. This is particularly true with iterators. The small differences between the two functions are that the iterator is now derived from map<char*, int>, and access to its elements is with curr->second()—because dereferencing the variable (*curr) is defined to return an object of type pair. This records both the index name (first) and the element value (second), but we only want the latter. Other than that, the code remains the same. The only remaining difference—changing the function's only argument from a pointer to a reference—is superficial.
Two other STL types widely used in the Greenstone code are vector and set. The former facilitates dynamic arrays, and the latter supports mathematical set operations such as union, intersection and difference.
 Technically there are four types, but the last two are optional. Since we are only giving a basic introduction to this STL class, details about these last two types are omitted.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”