Bibliography collection


This collection contains 135 BibTeX entries describing working papers published from 1997 to 2006 at the Department of Computer Science, University of Waikato.

How the collection works

The collection configuration file (the collection's etc/collectionConfig.xml) begins with the specification groupsize 200. This groups 200 documents together into a single archive file. Bibliography collections typically have many small documents, and grouping them together prevents Greenstone's internal file structures from becoming bloated and occupying more disk space than necessary.
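The effect of a group size can be illustrated with a short Python sketch. This is not Greenstone's implementation: the function name group_documents and the record identifiers are hypothetical, and only the value 200 and the count of 135 records come from the description above.

```python
from typing import List, Sequence


def group_documents(doc_ids: Sequence[str], group_size: int = 200) -> List[List[str]]:
    """Split document identifiers into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [list(doc_ids[i:i + group_size])
            for i in range(0, len(doc_ids), group_size)]


# Hypothetical identifiers standing in for the 135 records of this collection.
records = [f"record-{n}" for n in range(1, 136)]
groups = group_documents(records)
print(len(groups))  # 1: all 135 records fit into a single group of up to 200
```

With a group size of 200, the whole collection ends up in one group, which is why the setting keeps the number of archive files small.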

Apart from the standard plugins, this collection uses BibTexPlugin, which processes references in the BibTeX format (well known to computer scientists). Two options have been set for BibTexPlugin: -OIDtype assigned -OIDmetadata Number. This means the metadata element "Number" will be used as the record identifier, instead of Greenstone's default hash identifiers. These options are available for all plugins.
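The idea behind an assigned identifier can be sketched in Python as follows. This is an illustration only, not Greenstone code: choose_identifier is a hypothetical helper, the sample value "97/1" is invented, and the fallback to a content hash when the metadata is missing is an assumption whose hash format does not match Greenstone's own.

```python
import hashlib
from typing import Dict


def choose_identifier(metadata: Dict[str, str], raw_record: str,
                      oid_metadata: str = "Number") -> str:
    """Use the named metadata value as the identifier, else a content hash."""
    value = metadata.get(oid_metadata, "").strip()
    if value:
        return value
    # Assumed fallback: an identifier derived from the record's content.
    digest = hashlib.md5(raw_record.encode("utf-8")).hexdigest()
    return "HASH" + digest[:24]


# Hypothetical record with a Number field, and one without.
with_number = choose_identifier({"Number": "97/1"}, "@techreport{...}")
without_number = choose_identifier({}, "@techreport{...}")
print(with_number)
print(without_number.startswith("HASH"))
```

An identifier taken from the record itself stays stable and readable, whereas a hash changes whenever the record content changes.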

Fielded searching, with a form-based interface, is selected by format SearchTypes "form,plain" in the configuration file. In fact, a plain textual full-text search index is included in this collection as well (since form comes first, it is the default interface; you reach the plain search through the Preferences page).

The buildtype option shows that the default search engine, mgpp, is used. The indexes line specifies indexes for "text" and "metadata". In this case, "text" will be the original BibTeX record. "metadata" is a special keyword signifying that an index should be built for any metadata item found in the collection. Thus when the "field" menus in the collection's search page are pulled down, they show full records followed by an entry for each metadata element. In the collection's resources/collectionConfig.properties file, collection-level metadata collectionmeta can be specified for any index to determine what it is called (except for metadata, which produces many menu items). In this case, the collectionConfig.properties file specifies that the text index (referred to by the collection's configuration file, collectionConfig.xml) should be named "full records" because it contains the original bibliographic record.
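The relationship between the "text" index, the per-element "metadata" indexes, and the search menu can be pictured with a small Python sketch. It does not reflect how mgpp stores its indexes: build_indexes, the whitespace tokenizer, the sample records, and the alphabetical ordering of the metadata entries are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_indexes(records: List[dict]) -> Dict[str, Dict[str, Set[int]]]:
    """Map each index name to a term -> record-number lookup table."""
    indexes: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for number, record in enumerate(records):
        # "text" indexes the original record as a whole.
        for term in record["text"].lower().split():
            indexes["text"][term].add(number)
        # "metadata" yields one index per metadata element that occurs.
        for field, value in record["metadata"].items():
            for term in value.lower().split():
                indexes[field][term].add(number)
    return indexes


# Two invented records; real entries would come from the BibTeX source file.
sample = [
    {"text": "title = Text compression year = 1997",
     "metadata": {"Title": "Text compression", "Year": "1997"}},
    {"text": "title = Digital libraries author = A. Author year = 1998",
     "metadata": {"Title": "Digital libraries", "Author": "A. Author", "Year": "1998"}},
]
indexes = build_indexes(sample)

# The display name for the text index, as given in the properties file.
display_names = {"text": "full records"}
menu = [display_names["text"]] + sorted(name for name in indexes if name != "text")
print(menu)  # ['full records', 'Author', 'Title', 'Year']
```

The menu grows automatically with the metadata found in the records, which is why no display name is given for the "metadata" keyword itself.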

An additional keyword, "allfields", could also be used in the indexes line, specifying that combined searching over all indexes should be available.

The levels line specifies only the document level, as bibliographic records don't have internal structure.

This collection contains Title, Author, and Date browsers. The AZCompactList classifier used for the Author browser is like AZList but generates a bookshelf for duplicate items. The BibTeX plugin records each author as Author metadata; it also puts a list containing all authors into the Creator metadata element. Consequently the AZCompactList classifier is based on Author. However, the standard Greenstone button that reads "authors" is (confusingly) named "Creator", so this button name is specified for the classifier.
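The difference between Author and Creator metadata, and the way duplicate authors lead to bookshelves, can be sketched in Python. This is a simplified illustration rather than the plugin's code: the helper names, the sample names, and the "; " separator used for Creator are assumptions, and the split on "and" ignores brace-protected names that real BibTeX allows.

```python
import re
from collections import defaultdict
from typing import Dict, List


def author_metadata(author_field: str) -> Dict[str, List[str]]:
    """Derive one Author value per person and a single combined Creator value."""
    authors = [name.strip() for name in re.split(r"\s+and\s+", author_field)
               if name.strip()]
    creator = ["; ".join(authors)] if authors else []
    return {"Author": authors, "Creator": creator}


def bookshelves(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Group record titles under each individual Author value."""
    shelves: Dict[str, List[str]] = defaultdict(list)
    for title, author_field in records.items():
        for author in author_metadata(author_field)["Author"]:
            shelves[author].append(title)
    return dict(shelves)


# Invented records: title -> BibTeX author field.
sample = {
    "Paper one": "A. Smith and B. Jones",
    "Paper two": "B. Jones",
}
print(author_metadata(sample["Paper one"]))
print(bookshelves(sample))
```

Classifying on Author rather than Creator means a paper with two authors appears under both names, and an author with several papers gets one bookshelf holding all of them.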

The format statements for the search results list and the title browser are both determined by the VList specification. It gives a document icon that links to the document itself (which in this collection is the full reference); the title in bold; Creator metadata if there is any, otherwise Editor metadata; and Month, Year metadata if there is any. Here is an example.
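The fallback logic of this format statement can be written out as a Python sketch. It only mirrors the decisions described above (Creator, otherwise Editor; Month and Year if present); the function name format_result, the sample metadata, and the exact output markup are assumptions, and the document icon is omitted.

```python
from html import escape
from typing import Dict


def format_result(metadata: Dict[str, str]) -> str:
    """Render one list entry: bold title, then Creator or Editor, then the date."""
    parts = [f"<b>{escape(metadata.get('Title', ''))}</b>"]
    # Creator metadata if there is any, otherwise Editor metadata.
    byline = metadata.get("Creator") or metadata.get("Editor")
    if byline:
        parts.append(escape(byline))
    # Month and Year are shown only when they are defined.
    date = ", ".join(v for v in (metadata.get("Month"), metadata.get("Year")) if v)
    if date:
        parts.append(escape(date))
    return " ".join(parts)


# Invented record that has an Editor but no Creator and no Month.
print(format_result({"Title": "T", "Editor": "E", "Year": "1999"}))
# <b>T</b> E 1999
```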

The format statement for the author browser (CL2VList) is more complex. The AZCompactList classifier generates a tree whose nodes are either leaf nodes, representing documents, or internal nodes. A metadata item called numleafdocs gives the total number of documents below an internal node. This format statement checks whether numleafdocs exists. If so, the node must be an internal node, in which case the node is labeled by its Title. But beware: this classifier is generated on Author metadata, so its title -- the title of the classifier -- is actually the author's name! This means that the bookshelf nodes here are labeled by the author's name. The leaf nodes, however, are labeled the same way as documents (i.e. references) are in the search results list.
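The numleafdocs test amounts to a simple branch, sketched here in Python. The dictionary representation of a node, the function names, and the sample values are hypothetical; only the rule itself (numleafdocs present means bookshelf, labeled by Title) is taken from the description above.

```python
from typing import Dict


def label_leaf(node: Dict[str, str]) -> str:
    """Label a document node: title, then Creator or Editor if available."""
    byline = node.get("Creator") or node.get("Editor")
    return f"{node['Title']} ({byline})" if byline else node["Title"]


def label_node(node: Dict[str, str]) -> str:
    """Label a classifier node depending on whether numleafdocs is defined."""
    if "numleafdocs" in node:
        # Internal (bookshelf) node: its Title is really the author's name.
        return node["Title"]
    return label_leaf(node)


# Invented nodes: one bookshelf and one document beneath it.
shelf = {"Title": "B. Jones", "numleafdocs": "2"}
leaf = {"Title": "Paper two", "Creator": "B. Jones"}
print(label_node(shelf))  # B. Jones
print(label_node(leaf))   # Paper two (B. Jones)
```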

The documents themselves (here is an example) are generated by two format statements, one (a long one) called DocumentHeading, and another called DocumentContent. The DocumentHeading, which is the top two-thirds of the page, contains the document's Title followed by a table that gives all the metadata elements that the BibTeX plugin can generate. The role of all the gsf:switch statements in the collection configuration file, collectionConfig.xml, is to determine which elements are defined.

The DocumentContent has been overridden. When the document is displayed initially, only a hyperlink reading Show/Hide BibTex Record appears -- clicking this invokes JavaScript to toggle the display of the raw BibTeX record (showing the BibTeX version of the reference), which is hidden by default.