Archive for March, 2008

Waikato visit report from John Rose

John Rose. Wednesday, March 26th, 2008.

I have been a volunteer research associate in the Greenstone team for more than two years, and was very pleased to be able to visit the University of Waikato, at the invitation of Prof. Ian Witten, from 5 to 19 March 2008 (this was also my first visit to New Zealand).

I live in France and have been working, mainly through the internet, to promote the use of Greenstone in developing countries. As a corollary activity, I have also been collaborating with Anna Huang to improve and test the Greenstone language interfaces with emphasis on those needed in developing countries. I had met Ian several times in Paris, and also David Bainbridge, but this visit was my first opportunity to meet the other members of the team.

During my visit I was able to experiment with Greenstone functions which were new to me, discuss problems encountered and future improvements, and consider with the team our strategies for more effectively reaching and involving users in developing countries.

Here are some of the highlights of what was learned and discussed:

Possible problems with Windows XP Home edition

I had followed the instructions for setting up an Apache web server (file library.txt in the Greenstone home directory) under Greenstone 2.80, and found that access to existing collections from the same computer was only possible when the collect sub-directory was shared with all network users (which is odd, since client and server were running on the same machine under the same user).

Similarly, I followed the instructions for installation of the GLI Client and could neither create new collections nor access existing collections.

These two problems were consistent and replicable on my computer for several days, but without explanation they both stopped. I personally feel that there is some interference with the file sharing system under Windows XP Home edition, which mysteriously ended with the many manipulations that were done to understand the problems (there seem to be some internal system user names which may have been involved). Kathy Don is experimenting with Greenstone on this version of Windows. Users who are having similar problems are invited to report them on the Greenstone users list.

The reason for the problem that I was having with the GLI applet was found: the directory where Java SDK was installed was not in the PATH environment variable, which prevented the keytool/jarsigner sequence from functioning. When it was added to PATH, the applet worked fine. I added a warning to this effect in the GLI applet installation instructions.
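This condition can be checked before running the applet installation steps. What follows is only a minimal sketch in Python, not part of Greenstone: the helper name missing_tools is made up for illustration, and it simply looks up keytool and jarsigner on PATH.

```python
import shutil


def missing_tools(tools=("keytool", "jarsigner"), path=None):
    """Return the tools that cannot be found on PATH.

    If `path` is given, it is searched instead of the PATH
    environment variable (useful for testing).
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("Add the bin directory of the Java SDK to PATH and try again.")
    else:
        print("keytool and jarsigner were both found on PATH.")
```

If either tool is reported as missing, the keytool/jarsigner sequence described above cannot run.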

OAI-PMH

OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, is a powerful method for open-access sharing of metadata on the web (see tutorial).

I tested the OAI server under Greenstone 2.80 and it works fine. It is documented only very briefly in the OAI Demo documented example collection, but its operation is simple: one needs to have the Web Library (not the Local Library) running and to have previously edited the etc/oai.cfg file according to the instructions found in it. When this option is active, one or more specified collections serve OAI data to OAI harvesters while normal web access to these collections continues.
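To make the harvester side concrete, here is a small sketch in Python of how OAI-PMH request URLs are formed. The verbs Identify and ListRecords and the metadataPrefix value oai_dc come from the OAI-PMH protocol itself; the base URL is a hypothetical placeholder, since the real address depends on how the Web Library is installed.

```python
from urllib.parse import urlencode

# Hypothetical address, for illustration only; substitute the URL
# at which your own Web Library exposes its OAI server.
BASE_URL = "http://localhost/greenstone/cgi-bin/oaiserver.cgi"


def oai_request(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Ask the repository to describe itself.
    print(oai_request(BASE_URL, "Identify"))
    # Ask for all records in unqualified Dublin Core.
    print(oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Opening the first URL in a browser should return an XML description of the repository if the server is running.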

I also tested the OAI downloading function as presented in a tutorial on the wiki. This function, potentially very useful for collecting external documents for local Greenstone collections, makes use of the fact that, although OAI-PMH is formally designed only to share metadata, this metadata normally provides information on the location of the original document in the dc.identifier metadata field. But two major constraints were identified:

  • The provision of a simple URL in this field (as done in the “Rocky” collection at Virginia Tech used in the OAI Demo documented example collection) is not widespread; most OAI repositories provide a handle reference (DSpace) or the URL of a webpage containing a link to the original document (EPrints).
  • In Greenstone version 2.80, the metadata imported under OAI-PMH cannot be edited. This is justifiable in the sense that the metadata was assigned by the original creator, but inconvenient if documents are to be integrated into a new special collection.
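The first constraint can be illustrated with a short Python sketch that reads the identifier fields from an unqualified Dublin Core record and guesses which of the three cases each one falls into. This is not how Greenstone handles the download; the sample record, the extension list and the classification rules are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"
# Illustrative guess at extensions that indicate the document itself.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".ps", ".rtf", ".txt")

# Made-up record for demonstration; real records come from a harvester.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example record</dc:title>
  <dc:identifier>http://example.org/docs/report.pdf</dc:identifier>
  <dc:identifier>http://hdl.handle.net/0000/1</dc:identifier>
  <dc:identifier>http://example.org/view/42/</dc:identifier>
</oai_dc:dc>"""


def identifiers(record_xml):
    """Return the text of every non-empty dc:identifier element."""
    root = ET.fromstring(record_xml)
    return [el.text.strip() for el in root.iter(DC_NS + "identifier")
            if el.text and el.text.strip()]


def classify(identifier):
    """Roughly sort an identifier into one of the cases described above."""
    lowered = identifier.lower()
    if not lowered.startswith(("http://", "https://")):
        return "not a URL"
    if "hdl.handle.net/" in lowered:
        return "handle reference"
    if lowered.endswith(DOCUMENT_EXTENSIONS):
        return "direct document URL"
    return "possible landing page"


if __name__ == "__main__":
    for ident in identifiers(SAMPLE):
        print(classify(ident), "-", ident)
```

Only identifiers in the "direct document URL" case can be fetched as they stand; the other two need an extra step to locate the document.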

While I was at Waikato, David Bainbridge improved the OAI download facilities to recover the original documents in all of the above cases, and to convert the metadata to editable form if desired. These improvements will be included in version 2.81 of Greenstone.

Depositor

This undocumented function enables a remote user of a Greenstone web library to submit documents to a collection, and to assign metadata to them, through the web without installing Greenstone or GLI. One need only enable the depositor (by changing “disabled” to “enabled” in the main.cfg file in the etc directory); the Depositor can then be called from a button on the Greenstone home page.
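For those who prefer to script the change, here is a minimal Python sketch. It assumes, without confirmation from the Greenstone documentation, that the setting appears in main.cfg as a line of the form "depositor disabled"; check the file by hand first, and treat the helper name enable_setting as hypothetical.

```python
import re


def enable_setting(config_text, key="depositor"):
    """Change '<key> disabled' to '<key> enabled', leaving other lines alone.

    Assumes the setting sits on its own line as '<key> <value>'; this
    line format is an assumption, not taken from the Greenstone manual.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]+)disabled\b",
        re.MULTILINE,
    )
    return pattern.sub(r"\g<1>enabled", config_text)


if __name__ == "__main__":
    sample = "collector   disabled\ndepositor   disabled\n"
    print(enable_setting(sample))
```

The function works on text only; reading and writing the actual file, ideally after making a backup copy, is left to the caller.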

This function should be very useful in creating institutional repositories with Greenstone. It will be documented in version 2.81. Careful: to test it now, you have to assign the user to the “colbuilder” group, even though this has now been replaced by “all-collections-editor” or “personal-collections-editor” for authentication in Greenstone.

Formatting Documents within GLI

If Greenstone users want to manage the formatting of documents in a collection, they are presently obliged to do it outside of GLI (either by reformatting the original document or by creating a formatted html document from the original). Anupama Krishnan has developed a prototype function enabling the user to convert the original document (e.g. in Word or pdf format) to html and subsequently edit it within GLI (for example to define section headings and sub-headings or to improve the style of presentation) before building the collection. This function, to be included in version 2.81, will enable users in many cases to reduce the size of their collections and/or improve the quality of presentation by eliminating the need to present both the original document for display and the html version for searching.

Greenstone3

I was able to install Greenstone3 without any difficulties. It currently performs most of the functions of Greenstone2. The main difference for the basic user is that the formatting language for displaying documents is different, and may appear, at least at first, more complicated than the formatting language of Greenstone2. Dave Nichols is preparing to develop a graphical user interface to facilitate the formatting process, but this will have to await the completion of the basic formatting interface. Given the substantial benefits of Greenstone3 for advanced programmers, and the substantial overhead in maintaining two versions, there is a consensus within the Greenstone team that Greenstone3 should be developed and stabilised as soon as possible to replace Greenstone2.

Updates and documentation

I was able to point out some shortcomings in the latest update (version 2.80):

  • Several of the language interfaces (including Malayalam, Tamil and Telugu) are not activated upon installation (the user should add them to the main.cfg file if needed).
  • Example collections not updated on Sourceforge (now fixed).

It was agreed that the checklist for issuing new versions should be tightened and more closely controlled for future distributions.

In addition we discussed ways to:

Collaboration with users

The Greenstone team consists overwhelmingly of faculty members who are doing research in the area of digital libraries. Some technical staff (one full-time and several part-time, including Ph.D. students) are available to support the research effort, including, as appropriate, helping to incorporate new research results into Greenstone, but resources to ensure support for the international Greenstone community are extremely modest. I participated, in some sense on behalf of the users, in discussions of the Greenstone team on how to improve user support and collaboration within the existing constraints.

The following ideas were expressed:

  • Users as well as developers should be encouraged to use the bug reporting system, which can be used to report interface presentation problems as well as technical problems.
  • The regional and linguistic user communities should be encouraged to participate more actively in helping users in their regions and beyond, while in turn the Greenstone team could more closely follow and support organised user efforts, especially in the developing countries (already Kathy Don is providing technical support for the southern African network, Anupama Krishnan for the South Asian network, and Anna Huang for the language interfaces, all with support from myself on the “soft” aspects).
  • The possibility of more closely involving institutions in developing countries in Greenstone research and development activities should be explored. For example, major research thrusts in digitisation of newspapers and in audio-visual collections could perhaps include the development and testing of relevant applications in developing countries.