Archive for the ‘Greenstone3’ Category


Anu’s entry for the week of 7-11 May 2012

ak19. Saturday, May 12th, 2012.

Over the week, have been working on the script (and things that it needs). The details are at

The latest changes made today still need to be retested against GS3 on Windows.

Still need to test the entire process on Linux.

Anu’s entry for the weeks of 23 Apr - 4 May 2012

ak19. Friday, May 4th, 2012.
  • At the start of last week, finished off the task of the GS3 “debuginfo” button that now appears next to the login button.
  • The Greenstone tutorial xml files can now include a MajorVersion element with number attribute to specify if the instructions are for GS3 or GS2 and will get processed by the XSLT to display or hide such elements depending on the active version.
  • Joshua Scarsbrook discovered two bugs compiling GS3 on a Mac and has helped us fix these (but one of the fixes still needs to be tested on his machine). Unfortunately there were some issues with setting the Java preferences on my account on the Mac here. At present, GS3 can’t be compiled there because it requires Java 1.6.
  • After Dr Bainbridge fixed error handling and display of the PDFBox Extension, it became easier to debug a PDFBox Extension bug discovered by a member on the mailing list. She helped us to track it down and it turned out that the PDFBox extension did not try to first look for and use any JRE included in a GS2 binary when running the java -version test.
  • While trying to work out why searching 3 digit numbers crashed the server (when Diego wanted to try the ifl=1 parameter to the GS2 URL), I first found and tracked down a very troublesome bug that I had accidentally introduced into GS2. The documents in browse or search results would not display and their URLs looked strange (with the word handle in their path). It turned out that in January, I’d committed the -DDOCHANDLE option to CXXFLAGS in a win32.mak file that was meant for the experimental work Dr Bainbridge and Diego had been doing with REST URLs. I meant to commit only the RSS support code they had written. Dr Bainbridge then fixed the bug Diego had originally noticed to do with the ifl parameter.
  • Some translation work and looked at a few mailing list questions.
  • Have started work on a script that should perform in Perl the task that GLI currently does: stopping the GS2 or GS3 server, moving the building folder to index, and restarting the server again.
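The MajorVersion handling mentioned in the list above is done by XSLT in Greenstone. Purely as an illustration of the logic, here is a minimal Python sketch that keeps or drops MajorVersion elements depending on the active version. The function name and the sample tutorial markup are made up for the example and are not taken from the Greenstone tutorial files.

```python
import copy
import xml.etree.ElementTree as ET


def filter_by_major_version(root, active_version):
    """Return a copy of the tree without MajorVersion elements for other versions.

    Hypothetical helper: it mimics the show/hide decision described above,
    it is not the XSLT that Greenstone actually uses.
    """
    result = copy.deepcopy(root)
    wanted = str(active_version)
    # Snapshot the elements first so that removing children is safe.
    for parent in list(result.iter()):
        for child in list(parent):
            if child.tag == "MajorVersion" and child.get("number") != wanted:
                parent.remove(child)
    return result


# Made-up sample input, only to show the shape of the data.
SAMPLE = (
    "<Tutorial>"
    "<Step>Open GLI</Step>"
    '<MajorVersion number="2"><Step>Use the GS2 menu</Step></MajorVersion>'
    '<MajorVersion number="3"><Step>Use the GS3 menu</Step></MajorVersion>'
    "</Tutorial>"
)
```

Calling `filter_by_major_version(ET.fromstring(SAMPLE), 3)` leaves the shared step and the GS3-only step, and the original tree is not modified.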

Anu’s entry for Apr 11 - Apr 20

ak19. Friday, April 20th, 2012.
  • <gslib:langfrag> was to replace a whole lot of XSLT statements instantiating a javascript array in header.xsl. It took a bit of messing about, but util.xsl now calls a new Java method to generate the Javascript array declaration and initialisation, and this is called from document.xsl and documentbasket.xsl (instead of header.xsl) using the new <gslib:langfrag> element.
  • The gs2-library servlet was not working. After a lot of debugging, there were several fixes required. One was for the main servlet page, one for a collection page, and one for viewing a browsing classifier. The major bug was in the way stylesheets were merged. This has now been improved.
  • During the above, the lack of clear error messages on where things were going wrong pointed to another problem: insufficient debugging information when XSLT transformations go wrong. XMLTransformer’s TransformErrorListener has now been improved (and TransformingReceptionist updated) to do better error reporting when transformations of in-memory DOM objects fail.
  • Revisited recent unsatisfactory changes to the ant force-stop-tomcat target, since an ant restart did not always properly stop and start tomcat. Now the process has been improved by making the stop-tomcat target do a socket test in a wait loop to ensure that tomcat is properly stopped before proceeding. Subsequent calls to ant start therefore can only happen after this, even if tomcat commands are chained such as happens during “ant restart” or “ant stop start”.
  • Running on Ubuntu wasn’t launching the browser. This was the old problem already seen in GS2 of the script setting up env vars for wvware which then conflicted with the native libraries used by Linux for running graphics applications. The problem was already solved for GS2 by the introduction of the script which would set up the environment for wvware, and all that was required to get things to work for GS3 was to stop setting the env for wvware in (and giving executable permissions to the wvware binary included in the Greenstone3 binary).
  • A new target ant clean-logs has been added, which deletes catalina.out and greenstone.log and can help speed up the debugging cycle.
  • Currently, am looking at a “debuginfo” button that will appear next to the login button and which will display a page with the various options for o= that can be appended to the URL when debugging the XSL transformation process, as well as other URL suffixes.
  • Once that’s done, I will return to testing the Greenstone 3 binary and start with attempting to perform the Greenstone 2 tutorials with GS3.
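The socket test in a wait loop described above for the stop-tomcat target lives in the ant build. As a rough sketch of the same idea outside ant, the following Python polls a port until nothing accepts connections on it any more. The function names, host, timeouts and polling interval are all assumptions chosen for the example.

```python
import socket
import time


def port_is_open(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable: treat as not listening.
        return False


def wait_until_stopped(host, port, max_wait=30.0, interval=0.5):
    """Poll until the port stops accepting connections or max_wait elapses.

    Returns True if the server was seen to be stopped, False otherwise.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if not port_is_open(host, port):
            return True
        time.sleep(interval)
    return not port_is_open(host, port)
```

A start step would only be run once `wait_until_stopped` has returned True, which is the ordering guarantee the entry above describes for chained commands.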

Anu’s entry for 26 March - 5 Apr

ak19. Thursday, April 5th, 2012.

The very start of last week still required more work on the scripts that would handle translations made in Google’s Translator Toolkit. A couple of additional scripts were written.

Thereafter and until yesterday, the work has been mostly centred on GS3’s usersDB:

  • getting the output of txt2usersDB to work as input to usersDB2txt and vice versa, as well as allowing txt2usersDB to run in append mode,
  • creating a new program to modify a user’s details in the DB which then gets called by the new targets config-user and config-admin (for setting the admin password) in build.xml
  • getting the releasekit to update the admin password where the user provides this

There were a few questions in the mailing list that required some investigating, and today I finally got round to looking at getting Java to write out a bit of javascript that was previously done in XSLT where it looked unsightly and verbose. Unfortunately, I couldn’t test it when I tried out the Document Structure Editor. I got a blank section and could not type into this the new gslib:langfrag element that I was meant to test.

Instead, I decided to write some handy instructions into the Wiki’s Greenstone 2 FAQ which will help explain how to do some common tasks. The questions added to the FAQ are on how to manually build collections, how to get better error reporting in GLI, how to run GLI in debug mode, how to launch, use and copy from the DOS prompt, how to launch Windows Explorer and where to find the Windows key.  From experience, these instructions will be particularly helpful when answering Mailing List questions, as Greenstone users can be referred to these new FAQ items, armed with which they will then be better equipped to help us in the debugging process.

Sam’s Greenstone Blog 5/4/2012

sjm84. Thursday, April 5th, 2012.

I’m writing this a day early this week because I will not be working tomorrow due to Easter. I still have a bit to report, however.

This week I decided to re-think one of the previous decisions we made in regards to viewing documents in Greenstone 3. In Greenstone 2 you can normally only view one section at a time and therefore it is difficult to get a grasp of where you are in the document and also how big the document is. For this version of Greenstone 3 we decided that we would take the opposite approach and that, by default, viewing the document would always show the whole document instead of just a section. This approach works fine for small- to medium-sized documents, but for larger documents the page begins to get too long. To resolve this problem I have decided to implement a mixture of the two approaches. Now, by default, when you view a page it will only show the section you asked for, but it will also give you a full table of contents and links on the page that allow you to view other parts of the document without needing to go to a new page. New sections are dynamically loaded when they are needed and there is also an option to expand all the sections if you do want to see the whole document at once.

Sam’s Greenstone Blog 30/3/2012

sjm84. Friday, March 30th, 2012.

Just a short entry today to say that this week has mostly consisted of working on minor features and fixes. All of the major new features are mostly done now and we are nearing the final testing stage before we can release Greenstone 3.

Anu’s blog entry for 5 March - 23 March

ak19. Friday, March 23rd, 2012.

The first two weeks involved:

  • generating some files for translation of the Greenstone interface (Mongolian, Bhutanese) and committing changes translators had submitted (Laotian)
  • fixing up the GS2 CORBA code, including bringing it up to speed with the rest of GS2’s runtime code, so that CORBA works again: it can now compile once more, and the corbaserver and corbarecptldd client program run well against each other when on the same machine. Running the server against the client in a remote situation does not yet work, but it did not work in the demo/hello-1 example of the now-updated MICO package either.
  • there was still a small error in the way the PDFBox extension tests for Java when Java is version 1.7 that made the extension not work with JDK1.7. The test for the presence of Java now has to run java -version rather than just java, since the return value in Java 7 is different from that in Java 6.
  • when testing the Powerpoint plugin, it was found that the OpenOffice extension needed to be corrected to make jodconverter use the same port as that which OpenOffice is run on. It was moreover discovered that users can’t already have the graphical user interface of OO running in the background, nor can they start this, during Greenstone’s processing of documents using the OO extension.
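The Java presence test described in the list above belongs to the PDFBox extension itself. The following is only a sketch of the general approach in Python: run the command with -version and decide from whether it could be launched and exited cleanly. The function name and the timeout are assumptions for the example.

```python
import subprocess


def java_available(java_cmd="java"):
    """Return True if `java_cmd -version` can be run and exits with status 0.

    Running plain `java` with no arguments is avoided here because, as noted
    above, its return value differs between Java versions.
    """
    try:
        result = subprocess.run(
            [java_cmd, "-version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,  # java prints its version on stderr
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found, not executable, or it hung.
        return False
    return result.returncode == 0
```

Passing a full path as `java_cmd` would let a caller try a bundled JRE before falling back to whatever `java` is on the PATH.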

This week:

  • there was some issue with Greenstone 3’s tomcat server crashing on 64 bit Linux owing to a Java segmentation fault created by an error in the JNI code. Dr Bainbridge found out that the number of bytes to store pointers to data structures shared between Java and C++ code needed to be long rather than int, so MG’s and MGPP’s JNI code was updated. The error has not returned since, but debugging code has been left in for future debugging if required.
  • Dr Te Taka hoped to update the Maori translations for Greenstone’s interface using Google’s Translator Toolkit (GTT), and suggested that Greenstone’s translation process be expanded to allow this so that other translators too could benefit from the toolkit for translation if they wanted. He found out that the toolkit accepted an open-XML format called TMX, Translation Memory eXchange, and thus would need the strings that required translation to be converted into the TMX XML format (rather than into the usual spreadsheet versions of the .excel.xml format which we currently generate). Two new XSLT files have been written which Te Taka may kindly be testing for us: the first generates the TMX translation files that translators can load into Google’s Translator Toolkit. The second XSLT takes translated TMX files and converts them into an intermediary format that can be processed in the usual manner when submitting new and updated translations back into Greenstone.
  • currently looking at usersDB in GS3 having the correct values on startup.
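To see why the JNI code mentioned in the list above needed long rather than int for pointers on 64-bit Linux, here is a small Python simulation of what happens to an address when it is squeezed into 32 bits. The address is a made-up example value and the function names are illustrative. The real fix was in MG’s and MGPP’s JNI code, not in anything like this.

```python
import struct


def store_as_java_int(address):
    """Simulate keeping a native pointer in a 32-bit signed int (Java int)."""
    low_four_bytes = struct.pack("<Q", address)[:4]
    return struct.unpack("<i", low_four_bytes)[0]


def store_as_java_long(address):
    """Simulate keeping a native pointer in a 64-bit signed int (Java long)."""
    return struct.unpack("<q", struct.pack("<Q", address))[0]


def as_unsigned(value, bits):
    """Reinterpret a signed value of the given width as an unsigned address."""
    return value & ((1 << bits) - 1)


# A made-up 64-bit address of the kind seen on 64-bit Linux.
EXAMPLE_ADDRESS = 0x00007F3A12345678
```

With the example address, the int version keeps only the low 32 bits (0x12345678), so the value handed back to the C++ side no longer points at the original data structure. The long version round-trips unchanged.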

Update: did not get much further with the GS3 usersDB as there was a lot more to be done with the translation files for GTT and their processing. The process became clearer thanks to Te Taka’s explanations and his testing at each stage. TMX files will only be needed the first time a translator migrates from GS’s usual translation procedure, which makes use of excel spreadsheet files, to Google’s toolkit. The TMX file will start them up with all the up-to-date translated strings that are available so far in GS3 for the selected language. For the strings that need to be translated and updated, the translator will get a text file that contains the unicode spreadsheet data (as comma separated values, but the file will have a .txt extension instead of .csv in order to preserve the unicode). The translator will then copy the English and <Language> columns of the spreadsheet into the GTT. Once their translation work is done, they can send these same columns back by way of the same spreadsheet.
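For readers unfamiliar with TMX, the sketch below builds a tiny TMX 1.4 document from pairs of English and translated strings. In Greenstone this conversion is done by the new XSLT files, so the Python here is only an illustration of the file’s shape. The function name, the header values and the sample strings are assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this attribute out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang, target_lang):
    """Return a TMX 1.4 document (as a string) with one <tu> per string pair."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # placeholder values, not Greenstone's
        "creationtoolversion": "0.1",
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "en",
        "srclang": source_lang,
        "o-tmf": "unknown",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Sample pairs for illustration only.
SAMPLE_PAIRS = [("Search", "Rapu"), ("Help", "Āwhina")]
```

`build_tmx(SAMPLE_PAIRS, "en", "mi")` gives one translation unit per pair, each holding an English and a Maori variant.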

Sam’s Greenstone Blog 23/3/2012

sjm84. Friday, March 23rd, 2012.

This week I have added the ability for users to register themselves for a Greenstone 3 site. To register a user must provide their username, password and email address (we may add more fields or the ability for an administrator to add custom fields) as well as match two words from an image (to make sure they’re not a bot). Users can now also modify their account settings themselves and next week we will probably look into adding the ability for Greenstone 3 to email the users (to confirm registration or to reset their passwords).

The paged-image widget I have discussed in the past also received several upgrades, such as the ability to filter pages based on their titles and it will now also show the page you are currently on within the widget. We are also planning on being able to specify number ranges to filter pages as well (e.g. typing “24-37” will show all the pages from page 24 to page 37).
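The planned number-range filter could work along these lines. This is a sketch in Python rather than the widget’s own code, and the function names and the handling of a single number or a reversed range are assumptions.

```python
def parse_page_range(text):
    """Parse '24-37' into (24, 37). A single number 'N' is treated as (N, N)."""
    start_text, separator, end_text = text.strip().partition("-")
    start = int(start_text)
    end = int(end_text) if separator else start
    if start > end:
        # Be forgiving about ranges typed backwards, e.g. '37-24'.
        start, end = end, start
    return start, end


def filter_pages(page_numbers, range_text):
    """Keep only the page numbers that fall inside the typed range (inclusive)."""
    start, end = parse_page_range(range_text)
    return [page for page in page_numbers if start <= page <= end]
```

For a 50-page document, `filter_pages(range(1, 51), "24-37")` keeps pages 24 to 37 inclusive.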

I have made a list of all the things we need to complete before we can release Greenstone 3. It’s gradually getting smaller, so hopefully it won’t be much longer.

Sam’s Greenstone Blog 16/3/2012

sjm84. Friday, March 16th, 2012.

With the new user-login capability of Greenstone3 I have been creating and improving various features that relate to this capability. For example, the previous administration capabilities (the ability for admin users to add/edit/remove users) were not very secure and have now had an overhaul to properly connect with the servlet security method I described in a previous post. By itself however, trying to use the current administration capabilities to manage a large-scale Greenstone 3 installation with many collections and many users would be a difficult task, as each user would need to be added/modified by an administrator. To help with this, we will create the ability for users to register themselves and to change their own basic settings (password and details). The more powerful options, such as group assignment, will still be the job of an administrator.

We have also made significant progress on the RESTful URL feature. Servlets can have filters that requests are sent to before they reach the servlet itself, and this is what we use to provide this functionality. The filter examines the URL before it reaches the servlet and digs out any parameters that have been written in the RESTful form (for example, it will set c=demo from
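As a minimal sketch of the kind of work such a filter does, the Python below pulls parameters out of a path before the rest is handed on. The real feature is a Java servlet filter. The path layout used here (a `collection` keyword followed by the collection name), the keyword table and the function name are assumptions made for the example, not Greenstone’s actual URL scheme.

```python
from urllib.parse import unquote

# Hypothetical mapping from path keywords to Greenstone URL parameter names.
KEYWORD_TO_PARAM = {"collection": "c"}


def extract_rest_params(path):
    """Split a path into (remaining_path, params) using keyword/value pairs.

    Each known keyword consumes the segment after it as that parameter's
    value. Everything else is kept as part of the path passed on.
    """
    segments = [segment for segment in path.split("/") if segment]
    params = {}
    remaining = []
    index = 0
    while index < len(segments):
        param = KEYWORD_TO_PARAM.get(segments[index])
        if param is not None and index + 1 < len(segments):
            params[param] = unquote(segments[index + 1])
            index += 2
        else:
            remaining.append(segments[index])
            index += 1
    return "/" + "/".join(remaining), params
```

Under these assumptions, `extract_rest_params("/library/collection/demo")` yields the path `/library` together with the parameter `c=demo`.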

Sam’s Greenstone Blog 2/3/2012

sjm84. Friday, March 2nd, 2012.

This week has had a rather exciting development that several people have been wanting for quite a long time. The 64-bit compatible versions of MG, MGPP and GDBM have been added to the main code, meaning that Greenstone 2 and 3 can now compile successfully on 64-bit systems. The reason this has taken a long time to be done is that the 32-bit and 64-bit versions of MG and MGPP produced seemingly different files when run over the same documents, which was concerning for us as people might want to move their 32-bit MG/MGPP collections over to a 64-bit Greenstone installation and we suspected that this might not work given the different files. This week we discovered the cause of the difference and are now reassured that files from 32-bit and 64-bit installations can be interchanged without issue.

This week has seen more upgrades to Greenstone 3 as well. One of the features we have been working on for the Pei Jones collection is the ability to zoom “screen” images by using the mouse like a magnifying glass. We have added this into the default Greenstone 3 capabilities. In order for this to work however there needs to be a “screen” (small) and “source” (usually larger) version of the same image.

In general Greenstone 3 now handles paged-images much better. They are now properly displayed at the top of their specific sections. There is also an option to change between text-only, image-only and the default text and image modes, which is available in both the paged style collections as well as normal hierarchy style collections.

Next week will most likely involve more improvements like this as we continue to prepare Greenstone 3 for release.