OAI demo

This collection demonstrates Greenstone's ImportFrom feature. Using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, version 1.1), it retrieves metadata from rocky.dlib.vt.edu/~jcdlpix, a collection of photographs taken at the inaugural Joint Conference on Digital Libraries. A Greenstone collection is built from the records exported by this OAI data provider. The implementation is flexible enough to cope with the minor syntax differences between OAI 1.1 and OAI 2.0.

How the collection works

The collection configuration file, collectionConfig.xml, includes an acquire line that is interpreted by a special program called importfrom.pl. Like other Greenstone programs, this takes as argument the name of the collection, and provides a summary of other arguments when invoked with argument -help. It reads the collection configuration file, finds the acquire line, and processes it. In this case, it is run with the command:

 importfrom.pl oai-e 
(the collection's name is oai-e).

The acquire line in the configuration file specifies the OAI protocol and gives the base URL of an OAI repository. The importfrom program downloads all the metadata in that repository into the collection's import directory. The getdoc argument instructs it to also download the collection's source documents, whose URLs are given in each document's Dublin Core Identifier field (this is a common convention). The metadata files, which each contain an XML record for one source document, are placed in the import file structure along with the documents themselves, and the document filename is the same as the filename in the URL. The Identifier field is overridden to give the local filename, and its original value is retained in a new field called OrigURL.
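To make the harvesting step more concrete, here is a minimal Python sketch of how a harvester might form an OAI-PMH ListRecords request from a base URL. It is not taken from importfrom.pl; the function name list_records_url and the base URL http://example.org/oai are hypothetical, and oai_dc is assumed as the metadata prefix for unqualified Dublin Core.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper only)."""
    if resumption_token is not None:
        # When continuing a partial list, the token replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical base URL, standing in for the one given on the acquire line.
print(list_records_url("http://example.org/oai"))
```

A real harvester would fetch this URL, parse the XML response, and repeat the request with any resumption token the repository returns until the list is complete.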

This oai-e collection's own etc/oai.txt is an example of a downloaded metadata file.
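The rewriting of the Identifier field described above can be sketched as follows. This is an illustration in Python rather than Greenstone's own code: the record is modelled as a plain dictionary, and the function name localise_identifier and the sample URL are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def localise_identifier(record):
    """Return a copy of record whose Identifier is the local filename.

    The original URL is kept in a new OrigURL field, mirroring the
    behaviour described in the text (a sketch, not Greenstone's code).
    """
    url = record["Identifier"]
    # The local filename is the last path component of the URL.
    filename = posixpath.basename(unquote(urlsplit(url).path))
    updated = dict(record)
    updated["OrigURL"] = url
    updated["Identifier"] = filename
    return updated


# Hypothetical record, for illustration only.
record = {
    "Identifier": "http://example.org/photos/img001.jpg",
    "Description": "Opening session",
}
print(localise_identifier(record))
```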

Once the OAI information has been imported, the collection is processed in the usual way. Besides the four standard plugins (GreenstoneXMLPlugin, MetadataXMLPlugin, ArchivesInfPlugin and DirectoryPlugin), the configuration file specifies the OAI plugin, which processes OAI metadata, and the image plugin, because in this case the collection's source documents are image files. The OAI plugin has been supplied with an input_encoding argument because data in this archive contains extended characters. It also has a default_language argument. Greenstone normally determines the language of documents automatically, but these metadata records are too small for this to be done reliably: hence English is specified explicitly in the default_language argument. The OAI plugin parses the metadata and passes it to the appropriate source document file, which is then processed by an appropriate plugin -- in this case ImagePlugin. This plugin specifies the resolution for the screen versions of the images.

Metadata extracted from OAI records is mapped to the Dublin Core metadata set by default. As a result, the classifiers and indexes in this collection are built from Dublin Core metadata elements.

The collection configuration file, collectionConfig.xml, specifies a single full-text index containing dc.Description metadata and overrides Greenstone's custom gsf format templates DocumentHeading and DocumentContent (XSL). When a document is displayed, the DocumentHeading format statement outputs its dc.Subject. The DocumentContent statement follows this with screenicon, which is produced by ImagePlugin and gives a screen-resolution version of the image. This could be hyperlinked to the dc.OrigURL metadata, that is, the original version of the image on the remote OAI site; since the original is no longer available on the web, it is instead hyperlinked to the full version of the image file. This is followed by the image's dc.Description, also with a hyperlink; the image's size and type, again generated as metadata by ImagePlugin; and then dc.Subject, dc.Publisher, and dc.Rights metadata.

There are two browsing classifiers, one based on dc.Subject metadata and the other on dc.Description metadata (but with a button named "captions"). Recall that the AZCompactList classifier is like AZList but generates a bookshelf for duplicate items. In this collection there are a lot of images but only a few different values for dc.Subject metadata.

It's a little surprising that AZCompactList is used (instead of AZList) for the dc.Description index too, because dc.Description metadata is usually unique for each image. However, in this collection the same description has occasionally been given to several images, and some of the divisions in an AZList would contain a large number of images, slowing down transmission of that page. To avoid this, the compact version of the list is used with some arguments (mincompact, maxcompact, mingroup, minnesting) to control the display -- e.g. groups (represented by bookshelves) are not formed unless they have at least 5 (mingroup) items. To find out the meaning of the other arguments for this classifier, execute the command classinfo.pl AZCompactList. The programs classinfo.pl (for classifiers) and pluginfo.pl (for plugins) are useful tools for learning about the capabilities of Greenstone modules. Note incidentally the backslash in the configuration file, used to indicate a continuation of the previous line.
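The effect of mingroup can be illustrated with a small Python sketch. It is a simplified model of the grouping idea only, not the AZCompactList implementation: it ignores mincompact, maxcompact and minnesting, and the function name compact_list and the sample data are hypothetical.

```python
from collections import defaultdict


def compact_list(items, mingroup):
    """Group (doc_id, value) pairs into bookshelves or single documents.

    A value becomes a bookshelf only if at least `mingroup` documents
    share it; otherwise its documents are listed individually.
    """
    groups = defaultdict(list)
    for doc_id, value in items:
        groups[value].append(doc_id)

    entries = []
    for value in sorted(groups):
        docs = groups[value]
        if len(docs) >= mingroup:
            entries.append(("bookshelf", value, docs))
        else:
            entries.extend(("document", value, doc_id) for doc_id in docs)
    return entries


# Hypothetical metadata values, for illustration only.
items = [("d1", "Banquet"), ("d2", "Banquet"), ("d3", "Keynote")]
print(compact_list(items, mingroup=2))
print(compact_list(items, mingroup=5))
```

With mingroup set to 2 the two "Banquet" images share a bookshelf; with mingroup set to 5, as in this collection, all three would be listed as individual documents.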

The VList format specification shows the image thumbnail, hyperlinked to the associated document, followed by dc.Description metadata. The VLists for the classifiers use numleafdocs to switch between an icon representing several documents (which will appear as a bookshelf) and the thumbnail itself, if there is only one image.
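The numleafdocs switch amounts to a simple conditional, sketched here in Python rather than in Greenstone's format-statement syntax. The dictionary layout, the function name vlist_icon and the use of a thumbicon key for the thumbnail are assumptions made for illustration.

```python
def vlist_icon(node):
    """Choose what to display for one classifier list entry.

    `node` is a hypothetical dict of metadata for the entry: bookshelf
    nodes are assumed to carry numleafdocs, document nodes are not.
    """
    count = node.get("numleafdocs")
    if count:
        # Several documents are grouped under this entry: show a bookshelf.
        return f"bookshelf icon ({count} documents)"
    # A single image: show its thumbnail.
    return f"thumbnail {node['thumbicon']}"


print(vlist_icon({"numleafdocs": 12}))
print(vlist_icon({"thumbicon": "img001_thumb.jpg"}))
```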

The Greenstone OAI server

Greenstone comes with a built-in OAI data provider. This runs as a CGI program called "oaiserver.cgi", and is installed in the Greenstone cgi-bin directory. It can be accessed via the same URL as the Greenstone library (replacing "library.cgi" with "oaiserver.cgi"). If you are using the Windows local library server, you must install a web server (such as Apache) to run the OAI server.
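As a small illustration of the URL substitution, the Python sketch below derives an OAI server address from a library address. The host and path are hypothetical, the function name oai_server_url is invented for this example, and Identify is used only as a sample OAI-PMH verb.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def oai_server_url(library_url, verb=None):
    """Swap library.cgi for oaiserver.cgi in a Greenstone library URL."""
    parts = urlsplit(library_url)
    if not parts.path.endswith("library.cgi"):
        raise ValueError("expected a URL whose path ends in library.cgi")
    path = parts.path[: -len("library.cgi")] + "oaiserver.cgi"
    # Any query string on the library URL is dropped; optionally add a verb.
    query = urlencode({"verb": verb}) if verb else ""
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


# Hypothetical installation URL, for illustration only.
print(oai_server_url("http://localhost/greenstone/cgi-bin/library.cgi", verb="Identify"))
```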

Configuration of the server is done via the oai.cfg file in the Greenstone etc directory. This file specifies general information about the repository, and lists collections to be made accessible to OAI clients. By default, collections are not accessible. To enable a collection, add its name to the oaicollection list.

Greenstone's OAI server currently supports Dublin Core, qualified Dublin Core and rfc1807 metadata sets. The oaimetadata line specifies which sets should be used. For collections that use other metadata sets, metadata mapping rules should be provided to map the existing metadata to the sets in use. See the oai.cfg file for details.