MARC example

This collection, which contains 234 MARC entries, is based on the MARC records of working papers published by the Computer Science Department at the University of Waikato. Here is a sample document in the collection.

How the collection works

The configuration file, collectionConfig.xml, uses MARCPlugin to process the MARC records, in addition to the standard plugins. There are three classifiers, based on dc.Title, dc.Creator and dc.Subject metadata. The Title classifier uses AZList, while the other two use AZCompactList, which groups items that share the same metadata value into a bookshelf. The -removesuffix argument of the Title and Creator classifiers removes suffixes from the metadata string (dc.Title and dc.Creator respectively). The suffix is specified as a Perl regular expression and is used to trim characters (such as trailing punctuation) from the strings before they are displayed.
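To make the effect of -removesuffix and AZCompactList concrete, here is a minimal Python sketch. It is not Greenstone's implementation: the suffix pattern below is a hypothetical stand-in (the actual value in this collection's collectionConfig.xml is not reproduced here), and the grouping shows only the "same value goes into one bookshelf" behaviour, not the alphabetic ranges that AZCompactList also builds. For a simple character class like this one, Python's `re` behaves the same way as a Perl regular expression.

```python
import re
from collections import defaultdict

# Hypothetical suffix pattern: trailing whitespace, commas, semicolons,
# colons and slashes, typical leftovers at the end of MARC title/name fields.
REMOVE_SUFFIX = re.compile(r"[\s,;:/]+$")


def remove_suffix(value: str) -> str:
    """Trim the trailing characters matched by the suffix pattern, for display."""
    return REMOVE_SUFFIX.sub("", value)


def compact_group(items):
    """Group (doc_id, metadata_value) pairs by their cleaned value.

    Values shared by several documents become a 'bookshelf' (a list of ids);
    a value with a single document is kept as a plain leaf (a single id).
    """
    groups = defaultdict(list)
    for doc_id, value in items:
        groups[remove_suffix(value)].append(doc_id)
    return {
        value: ids if len(ids) > 1 else ids[0]
        for value, ids in sorted(groups.items())
    }
```

With hypothetical input `[("d1", "Witten, Ian H.,"), ("d2", "Witten, Ian H."), ("d3", "Cunningham, Sally Jo /")]`, both Witten entries collapse into one bookshelf once the trailing comma is removed, while the Cunningham entry stays a single leaf.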

The VList format statement controls how search results and all classifiers are displayed. For bookshelves, the number of leaf documents is shown on the right-hand side. For documents, dc.Title is shown along with dc.Creator and dc.Publisher. Because dc.Creator has multiple values, it is referenced as [sibling:dc.Creator], which outputs all of its values rather than just the first one.
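The difference between referencing a multi-valued field plainly and with `sibling:` can be modelled in a few lines of Python. This is only an analogy: the function names, the document dictionary and the "; " separator are hypothetical, and the separator Greenstone actually inserts between sibling values is not specified here.

```python
def first_value(metadata: dict, name: str) -> str:
    """Plain reference: only the first value of a metadata field is shown."""
    values = metadata.get(name, [])
    return values[0] if values else ""


def all_values(metadata: dict, name: str, separator: str = "; ") -> str:
    """sibling-style reference: every value is output, joined by a separator
    (the separator here is an assumption, not Greenstone's default)."""
    return separator.join(metadata.get(name, []))


def render_entry(metadata: dict) -> str:
    """Hypothetical one-line layout: title, all creators, publisher."""
    return " | ".join([
        first_value(metadata, "dc.Title"),
        all_values(metadata, "dc.Creator"),
        first_value(metadata, "dc.Publisher"),
    ])
```

For a document with two dc.Creator values, `first_value` would drop the second author, which is exactly what [sibling:dc.Creator] avoids.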

The MARC plugin uses a special file, marc2dc.txt, to map MARC field numbers to Greenstone-style metadata. This file resides in the gs2build/etc directory of the Greenstone 3 installation folder and lists the correspondences between MARC field numbers and Greenstone metadata names. MARC fields that are not listed do not appear as metadata, although they are still present in the Greenstone document. Each line in the file has the format:

<MARC field number> -> GreenstoneMetadataName

Lines in the file that begin with "#" are comments.
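A reader for this format might look like the following Python sketch. MARCPlugin has its own parser, and this is not it; the function name and return shape are hypothetical. It also anticipates the subfield form (`260$c$g -> dc.Date`) that appears later in marc2dc.txt, and it assumes that malformed lines without `->` are simply skipped.

```python
def parse_mapping(text: str) -> list[tuple[str, list[str], str]]:
    """Parse marc2dc.txt-style lines: '<MARC field>[$sub...] -> MetadataName'.

    Returns (field, subfields, metadata_name) triples; subfields is empty when
    the whole field is mapped. Blank lines and comment lines (starting with
    '#' after any leading whitespace) are skipped.
    """
    mappings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        left, sep, right = line.partition("->")
        if not sep:
            continue  # assumption: lines without '->' are ignored
        field, *subfields = left.strip().split("$")
        codes = [s.strip() for s in subfields if s.strip()]
        mappings.append((field.strip(), codes, right.strip()))
    return mappings
```

For example, parsing `"# comment\n100 -> dc.Creator\n260$c$g -> dc.Date"` yields one whole-field mapping for 100 and one subfield mapping for 260 with codes `c` and `g`.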

The standard version of this file is loosely based on the MARC to Dublin Core mapping found at http://www.loc.gov/marc/marc2dc.html (which assumes USMARC/MARC21).

Multiple MARC fields may map to a single Dublin Core field. For example, fields 720 ("Uncontrolled name"), 100 ("Personal name"), 110 ("Corporate name") and 111 ("Meeting name") all map to dc.Creator. In practice a MARC record normally defines only one of these fields; in any case, Greenstone allows multi-valued metadata, so a record that defines several of them simply receives several dc.Creator values.
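The many-to-one case reduces to appending values under the same metadata name. The Python sketch below is illustrative only: it models a record as a hypothetical list of (field number, value) pairs and uses a hand-written subset of the mapping, not Greenstone's internal data structures.

```python
from collections import defaultdict

# Hypothetical subset of marc2dc.txt: several MARC fields -> dc.Creator.
CREATOR_FIELDS = {
    "100": "dc.Creator",  # Personal name
    "110": "dc.Creator",  # Corporate name
    "111": "dc.Creator",  # Meeting name
    "720": "dc.Creator",  # Uncontrolled name
}


def map_fields(record, mapping):
    """Map (field_number, value) pairs to multi-valued metadata.

    Each mapped field adds one more value under its metadata name, so several
    fields mapped to dc.Creator yield a multi-valued dc.Creator. Fields missing
    from the mapping produce no metadata.
    """
    metadata = defaultdict(list)
    for field, value in record:
        name = mapping.get(field)
        if name is not None:
            metadata[name].append(value)
    return dict(metadata)
```

A record with both a 100 and a 720 field would therefore end up with two dc.Creator values, and an unmapped field such as 245 would contribute nothing in this sketch.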

Some mappings depend on subfields. For example, MARC field 260 contains information about publication and distribution. Its subfields "c" (Date of publication) and "g" (Date of manufacture) are mapped to dc.Date by the following mapping line:

260$c$g -> dc.Date
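A subfield-restricted mapping such as `260$c$g` picks out only the listed subfields of the field. The Python sketch below shows that selection step on a hypothetical field 260 represented as (code, value) pairs. Whether MARCPlugin stores the selected subfields as separate values or joins them into one is not stated here; this sketch keeps them separate.

```python
def map_subfields(subfields, wanted):
    """Select the values of the wanted subfield codes, in record order.

    subfields: (code, value) pairs from a single MARC field, e.g. field 260.
    wanted: codes named in the mapping line, e.g. ["c", "g"] for '260$c$g'.
    """
    wanted_codes = set(wanted)
    return [value for code, value in subfields if code in wanted_codes]
```

With a hypothetical field 260 holding subfields a (place), b (publisher), c ("1998.") and g ("1999"), only the two date values would be returned for dc.Date; the place and publisher subfields are ignored by this mapping line.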

Greenstone also provides a file for mapping MARC to qualified Dublin Core: gs2build/etc/marc2qdc.txt in your Greenstone 3 installation folder. To make the MARC plugin use it, set the plugin's -metadata_mapping_file option to "marc2qdc.txt".