Archive for the ‘Greenstone2’ Category

Anu’s entry for the week of Mon 13 Jun 2011

ak19. Friday, June 17th, 2011.

Since Dr Bainbridge was away this week and because I was at an impasse regarding my last ticket for GS3, I had already decided last Friday to consider two requests that seemed feasible and just required a bit of investigation:

1. Diego noticed that use_sections didn’t work with the PDFBox extension. Some changes were made to the PDFBox code to generate the HTML one page at a time, and PDFPlugin was adjusted to handle this. It turned out that the changes to the PDFBox code were unnecessary after all: the latest PDFBox jar file was already inserting page separator elements, so the latest binary was all that was needed, and the updated PDFPlugin merely had to pick up on those separators.

2. Professor Witten had suggested that if someone had Word or Office 2007 installed on their Windows machine, windows_scripting should be able to convert docx to HTML for us without requiring OpenOffice. Last week I had tested whether this was already possible, but word2html, which uses native Windows scripting, truncated the “docx” extension to “doc” and then reported that the file couldn’t be found. It was not possible to get at the VB source code to modify it, so the next idea was to find some WSH scripts for converting docx (WSH tends to be switched on by default on Windows). There was a WSH script on the web for extracting all the *text* from a docx, but it wasn’t quite what was wanted, since the formatting would be lost.

Fortunately for us, Veronica recalled the existence of a VBScript for WSH that promised to do just what we needed: convert docx to HTML. After she located it for us, all that was required was some modification to integrate it into Greenstone and get things working in the default case, where Office/Word 2007 is installed. It worked fine on XP. Then some further error handling was needed for the case where Word is not installed on the machine. Once the VBScript’s error output was sent to STDERR, it also did pretty well on the Vista machine, which had no Word installed.

It still needs to be tested on a Windows machine that has a version of Word predating the docx format.

3. The idea is to extend the VBScript with subroutines that handle xlsx and pptx files as well. Part of the pptx code is already working (opening the document), but there may be some differences between how Word and PowerPoint open or save documents, as the universal Office SaveAs method wasn’t working for me.

4. Made a temporary fix for a bug in GS’s classifiers that was reported on the mailing list, and sent the tentative fix to the person who reported it:

When a user enters non-English characters for a buttonname, Perl does not preserve them, so the button name displays incorrectly in the browser. The fix assumes that the user entered the value in UTF-8, and the Perl code now handles it on that basis. But I need to talk to Dr Bainbridge about whether this assumption is reasonable before committing the code for everyone.
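To illustrate the kind of corruption described in item 4, here is a small sketch in JavaScript (Node.js) rather than the Perl the classifiers actually use. It assumes, as the tentative fix does, that the browser submitted the buttonname as UTF-8 bytes. decodeButtonName is a hypothetical helper for illustration, not Greenstone code.

```javascript
// Hypothetical helper: turn the raw bytes of a submitted buttonname into a
// string. Reading UTF-8 bytes as Latin-1 (one byte per character) produces
// garbled text; decoding them as UTF-8 keeps the characters intact.
function decodeButtonName(bytes, assumeUtf8) {
  return Buffer.from(bytes).toString(assumeUtf8 ? "utf8" : "latin1");
}

// "Título" as the browser would send it: "í" becomes the two bytes C3 AD.
const submitted = Buffer.from("Título", "utf8");

console.log(decodeButtonName(submitted, false)); // prints "TÃ" + an invisible soft hyphen + "tulo"
console.log(decodeButtonName(submitted, true));  // prints "Título"
```

The UTF-8 assumption is exactly what makes the second call work: if a browser sent some other encoding, the same bytes would decode differently, which is why the assumption needs checking before the fix is committed.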

I finally got round to answering some questions on the mailing list. Still to look at are Diego’s request to implement the “allvalues” option in the List classifier, and item 3 above.
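Item 1 relies on PDFPlugin recognising the page separators that the newer PDFBox jar already writes into its HTML. The sketch below shows the general idea in JavaScript (PDFPlugin itself is Perl): split the HTML at each separator and turn every page into its own section. The separator pattern and the function name are hypothetical placeholders, not the exact markup PDFBox emits or Greenstone’s actual code.

```javascript
// HYPOTHETICAL separator: a placeholder for whatever page-separator element
// the PDFBox-generated HTML actually contains.
const PAGE_SEPARATOR = /<hr\s+class="pagebreak"\s*\/?>/i;

// Split PDF-derived HTML into sections, dropping empty chunks (e.g. after a
// trailing separator) and numbering the remaining chunks in order.
function splitIntoPageSections(html, separator = PAGE_SEPARATOR) {
  return html
    .split(separator)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((text, i) => ({ title: "Page " + (i + 1), text }));
}
```

Passing the separator as a parameter keeps the splitting logic independent of the exact markup, which is the part that depends on the PDFBox version in use.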

Anu’s entry for week of 6 June 2011

ak19. Monday, June 13th, 2011.

Mainly small odds and ends: making sure that the GS2 OAI server validated against a new online validator (at which point the resumptionToken functionality was retested), very minor bug fixes such as ensuring that images in PagedImage collections built with XML item files don’t get reprocessed by ImagePlugin, and dealing with some other questions on the mailing list. I also spent time investigating how to implement use_sections in PDFPlugin when using the PDFBox extension (one option is to update the PDFBox code to split its output one page at a time). On Friday I was (still unsuccessfully) trying to work out how to circumvent hard-coded GS2 {If} format statements in metadata so that things still work with GS3, as in ticket

Anu’s entry for week beginning 30 May 2011

ak19. Friday, June 3rd, 2011.

Fixed a server-crashing bug reported on the mailing list (the bug was traced to GSDLQueryLex.cpp). Finally worked out a rudimentary way to get an Exact Phrase option in the GS interface for the Web Administrator of our university library, who was faced with this problem. Got GS2 to pass all the remaining OAI validation tests at last. Also got earliestDatestamp working as it should for GS2’s OAI server (it now works out the earliestDatestamp in the same way that GS3 was changed to do it). This means it no longer returns the Unix epoch time of 1970 for every Greenstone OAI repository, as it used to.
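The entry does not spell out how GS3 computes earliestDatestamp, so the following is only a sketch of one plausible approach, in JavaScript: take the oldest record timestamp in the repository and fall back to the Unix epoch only when there is none. The `lastmodified` field (seconds since the epoch) and the function name are assumptions for illustration.

```javascript
// ASSUMPTION: each record has a hypothetical `lastmodified` field holding
// seconds since the Unix epoch. This is not Greenstone's actual code.
function earliestDatestamp(records) {
  let earliest = Infinity;
  for (const record of records) {
    const t = record.lastmodified;
    if (Number.isFinite(t) && t < earliest) earliest = t;
  }
  // Only fall back to the epoch (1970) when no record has a usable timestamp.
  const seconds = earliest === Infinity ? 0 : earliest;
  // OAI-PMH datestamps are UTC in the form YYYY-MM-DDThh:mm:ssZ, so drop
  // the milliseconds that toISOString() adds.
  return new Date(seconds * 1000).toISOString().replace(/\.\d{3}Z$/, "Z");
}
```

For example, records stamped 1306800000 and 1300000000 would give "2011-03-13T07:06:40Z", while an empty repository still reports 1970.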

Anu’s entry for week beginning 23 May 2011

ak19. Tuesday, May 31st, 2011.

Fixed a server-crashing bug, answered some questions on the mailing list, looked into a GS3 thumbnail issue which Dr Bainbridge ultimately solved, and started on fixing the validation issues with the OAI server for Greenstone 2.

Anu’s entry – week ending Fri 20 May 2011

ak19. Monday, May 23rd, 2011.

Mainly bug fixing.

A couple of outstanding Unicode bugs still needed fixing (the rest were fixed the week before), such as one in MetadataXMLPlugin. A few recent changes to GS3 had stopped it from compiling properly, which also required fixing. Finally, there was the ticket where GS3’s List Users didn’t display all the details for a new user. That’s now been fixed too.

25 April – 6 May

ak19. Monday, May 9th, 2011.

Most of the last two weeks were spent on making the final changes to the work Dr Bainbridge had already done to get Greenstone working again after the Greenstone installation has been moved. If you now relocate your GS2 installation, running the Greenstone Server Interface (GSI) will get the server running successfully from its new location, on both Linux and Windows. Underneath, the operating systems differ (the Apache web server for Linux and Mac has its current location hard-coded into many of its files, which then need to be adjusted upon relocation), but all of them provide the same “reset-gsdlhome” target in gsicontrol.bat and gsicontrol.sh, which the GSI calls whenever its launch script is run. The reset-gsdlhome target can also be called from the command line. A nice side effect of the changes needed for this is that gsicontrol can now be run from any directory.
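As an illustration of the adjustment needed for files with a hard-coded location, here is a minimal JavaScript sketch that rewrites an old installation path to a new one inside a configuration file’s text. It is not how the reset-gsdlhome target is actually implemented; the function name and the sample paths are hypothetical.

```javascript
// Hypothetical helper: replace every literal occurrence of the old
// installation directory with the new one in a config file's contents.
// split/join is used instead of a regex so that characters such as "."
// or "\" in Windows paths need no escaping.
function relocatePaths(configText, oldHome, newHome) {
  if (!oldHome) return configText; // nothing sensible to replace
  return configText.split(oldHome).join(newHome);
}

const before = 'ServerRoot "/opt/gs2/apache-httpd"\nDocumentRoot "/opt/gs2/web"';
console.log(relocatePaths(before, "/opt/gs2", "/home/me/gs2"));
```

A real implementation would also need to avoid rewriting paths that merely share a prefix with the old location, such as /opt/gs2-old.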

At the end of last week, we started looking at the GS284 bugs that Diego found.

Share your documents in Facebook or Twitter

Diego Spano. Thursday, April 7th, 2011.

Greenstone has a new macro that lets you share documents on social networks or by email, using the AddToAny tool. The new macro is called _shareme_ and belongs to the Global package. It accepts two parameters: _1_ is the title of the link, and _2_ is the link to share. For _2_, [srclink] is the default option, but any other metadata can be used too. The only requirement is that the value of that metadata must contain a well-formed URL that begins with a protocol prefix such as “http://”.

If _2_ is left blank, then the link will point to the Greenstone version of the document.

You have to edit your format statement and add something like this: _shareme_([dc.Title],[srclink])

and then you will see the “Share+, Facebook, Twitter, Mail, LinkedIn” icons.

_Share_ Icons

There is also a compact version called _sharemesmall_ that takes the same parameters and only shows the Share+ icon.

_Sharemesmall_ Icons

The macro code is included in version 2.84. If you are using version 2.83 or earlier, you have to edit the macro file and add the following block:

**** Macro code – Begin ****

package Global

# Social network support
# Defined here in document, as the most likely place this will be used in
# within a document view, however its package is 'Global' because you
# might equally want this in a search or browse list

# _1_ = e.g. title
# _2_ = [srclink] or left empty.  If left empty, then it will share the internal GS document

_sharemescript_ {

<script type="text/javascript">
function fullDomainURL(localURL)
\{
return window.location.protocol+'//'+window.location.host+localURL;
\}
</script>

<script type="text/javascript">
var a2a_config = a2a_config || \{ \};
a2a_config.linkname = "_1_";

var srclink = \'_2_\';

//If metadata value is a valid URL that starts with xxx://
// (e.g. any protocol\, http, https\, ftp ...) then that will be the link to share
if (srclink.match(/^[^:]+:\\\/\\\//i)) \{
a2a_config.linkurl = srclink;
\}
else \{
//if metadata value is [srclink] then we have to cut off the 'href' tag label
var href = srclink.match(/href=\"([^\"]*)\"/);
if (href) \{
a2a_config.linkurl = fullDomainURL(href[1]);
\}
else \{
//if no metadata was passed as link\, then the GS version of the document will be used.
a2a_config.linkurl = fullDomainURL("_gwcgi_")+ "?c=_cgiargc_&a=d&d=_cgiargd_";
\}
\}
</script>
}

_shareme_ {
_sharemescript_(_1_,_2_)
<div style=\'padding-left:50px;\' class=\'a2a_kit a2a_default_style\'>
<a class=\'a2a_dd\' href=\'\'>Share</a>
<span class=\"a2a_divider\"></span>
<a class=\'a2a_button_facebook\'></a>
<a class=\'a2a_button_twitter\'></a>
<a class=\'a2a_button_email\'></a>
<a class=\'a2a_button_linkedin\'></a>
</div>
<script type=\"text/javascript\" src=\"\"></script>
}

_sharemesmall_ {
_sharemescript_(_1_,_2_)
<span style=\'padding-left:8px;\' class=\'a2a_kit a2a_default_style\'>
<a class=\'a2a_dd\' href=\'\'>Share</a>
</span>
<script type=\"text/javascript\" src=\"\"></script>
}

**** Macro code – End ****
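For readers who want to check the link-selection rules for _2_ outside a Greenstone page, here is the same decision logic restated as a plain JavaScript function, with the macro-language escapes removed. The function and parameter names are hypothetical: `origin` stands in for window.location.protocol + '//' + window.location.host, and `gwcgi`, `collection` and `docOID` stand in for the _gwcgi_, _cgiargc_ and _cgiargd_ macros. The fallback to the Greenstone version of the document follows the comments in the macro code.

```javascript
// Restatement of the _sharemescript_ link logic; names are hypothetical.
function chooseShareLink(param2, origin, gwcgi, collection, docOID) {
  // 1. A value that looks like a URL (text up to the first colon, then "://")
  //    is shared as-is.
  if (/^[^:]+:\/\//i.test(param2)) return param2;
  // 2. [srclink] expands to an <a href="..."> tag: keep only the href value,
  //    which is a local path, and prefix it with the site's origin.
  const href = param2.match(/href="([^"]*)"/);
  if (href) return origin + href[1];
  // 3. Otherwise share the Greenstone version of the document.
  return origin + gwcgi + "?c=" + collection + "&a=d&d=" + docOID;
}
```

With an empty _2_, an origin of http://localhost:8080 and a hypothetical library.cgi path, this produces a link of the form http://localhost:8080/gsdl/cgi-bin/library.cgi?c=demo&a=d&d=HASH01.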

Greenstone 2.84 released!

ak19. Friday, April 1st, 2011.

After last week’s bug discovery was fixed at the start of this week (interlinked HTML files with non-English filenames had problems on Mac OS), we went back to testing the Greenstone binaries on Windows, Linux and Mac. Finally, after uploading all the files to SourceForge and adjusting the pages there, as well as updating the download page on the Greenstone website, we succeeded in releasing Greenstone 2.84 today!

To grab the Greenstone 2.84 binary for your operating system, visit the download page. This page also has the source distributions available in zip and tar.gz formats. Alternatively, you can always extend your binary installation with source code by grabbing the “source-component” archive files from the same download page.

The Greenstone 2.84 Release Notes contain installation instructions as well as details on how to use the latest Greenstone extensions like the PDFBox extension (for later versions of PDF) and OpenOfficeConverter (which can handle the latest Office docx format).

Anu’s entry for the week ending 25 March 2011

ak19. Monday, March 28th, 2011.

This week we tried to get Greenstone 3 to work on Dr John Brine’s 64-bit Mac OS 10.6.6 machine (with Update 3) so that the Flax developers could make sure that Flax, which works with Greenstone 3, would run as well.

We didn’t have admin access on Professor Witten’s machine of similar specs, and therefore could not install the Java Developer Package, which contains jni.h and the other header files that were relocated as of Mac OS 10.6 Update 3 and whose absence prevented proper compilation on such Macs. However, on Dr John Brine’s machine, having admin privileges meant we could install it and at last get Greenstone 3 to compile on Snow Leopard with Update 3. Running it was a separate issue: there were problems with the server. Sam discovered that the Java Developer Package (which contained JDK 1.6.0) would be 64-bit as well, and found a flag to force Java to run in 32-bit mode: -d32. Once that flag was used to launch the Apache web server, things ran smoothly on the server side. There are still issues with GLI opening and rebuilding collections in Greenstone, which we will return to later.

Some configure scripts and makefiles were updated to allow Greenstone to compile without compiling wvware (since wvware had libiconv problems on Mac OS 10.6.6, and compiling gnome-lib on the machine has issues of its own). The --disable-wvware option will allow us to bypass that temporarily and focus on other problems first.

In between, another staff member had moved his 3-year-old Greenstone 3 installation elsewhere, and his Greenstone web-service-related application had stopped working. We got a new version of Greenstone 3 and made some configuration changes to get it all working again.

Then it was back to testing Greenstone 2.84 to ensure that the important fixes Sam and Dr Bainbridge had made in the last weeks interacted well. I’m onto Mac testing at the moment. By Friday, we discovered that while documents with non-English unicode filenames were mostly working on the Mac, HTML files that interlinked (where the links referred to non-English filenames either directly or in URL-encoded format) did not. We started investigating this.
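The interlinking problem comes down to deciding whether a link refers to a given file when the link may contain the non-English filename either directly or URL-encoded. The JavaScript sketch below shows one way to make that comparison; it is not Greenstone’s actual fix. linkMatchesFile is hypothetical, and the NFC normalization step is an added assumption: the HFS+ file system on Mac OS X stores filenames in a decomposed Unicode form, so a precomposed “é” in a link and a decomposed one on disk would otherwise compare unequal.

```javascript
// Hypothetical helper: does an href from an HTML page point at `filename`?
// The href may hold the name literally or percent-encoded (UTF-8 bytes).
function linkMatchesFile(href, filename) {
  const target = filename.normalize("NFC");
  let decoded;
  try {
    decoded = decodeURIComponent(href);
  } catch (e) {
    // Malformed escapes such as "%E9" (not valid UTF-8) cannot be decoded,
    // so fall back to comparing the raw href.
    decoded = href;
  }
  // Normalize to NFC so precomposed and decomposed accents compare equal.
  return decoded.normalize("NFC") === target;
}
```

This treats “caf%C3%A9.html”, “café.html” and a decomposed “cafe” + combining accent as the same file, while a Latin-1-style escape like “caf%E9.html” does not match.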

Release Candidate 2 of Greenstone 2.84 out

ak19. Friday, February 25th, 2011.

After a lot more testing, discovering bugs, fixing bugs and further testing, we’ve finally generated our 2nd release candidate for Greenstone 2.84. This time, there’s also a binary for the MacOS (tested on Leopard).

Most of the bugs this time round (after Release Candidate 1) had to do with dynamically linked libraries on Mac OS, getting the new document plugins from the Greenstone extensions to work when the Greenstone server is on a remote machine, ImageMagick processing JPEG 2000 images on Linux machines, and fixing some issues around installing Greenstone on Windows in a path containing spaces.

Things seem to look good so far (but we’d appreciate independent confirmation of it), so if you haven’t already, do grab a copy and try it out. And tell us about your discoveries: any difficulties, bugs or other insights, so that we can make the final release of Greenstone 2.84 perfect.

Greenstone 2.84 has cool new features, including the long-awaited ability to process docx and other new Office formats, as well as recent versions of PDF documents. These features are available as extensions to Greenstone, for which the details can be found in the Release Notes.

Download the Greenstone 2.84 RC2 from the Snapshots page.

The Greenstone wiki page 2.84 Release Notes has information on installation, running, where to get the new Greenstone extensions and more.