Archive for November, 2011

Anu’s entry for week ending 26 Nov 2011

ak19. Monday, November 28th, 2011.

For the last two weeks, I was mainly learning the practical side of handling the Greenstone translations: chiefly how to generate the spreadsheets for translators to use, though there was also the opportunity to learn how to handle translated spreadsheets. Besides that, I had a go at answering some questions on the mailing list and uploaded the updates to the ACKU and AREU collections.

In the final 3 days, I got round to making the batch files in GS2 handle not only spaces but also brackets in the Greenstone filepaths. There is still one final problem to resolve before the changes can be committed, but the Greenstone web server is now working again, even when Greenstone is installed in a path with brackets (and spaces). There’s even some allowance made in the makegs2.bat script, which is used to compile GS2, so that Apache compiles even when the filepaths it works with contain spaces or brackets. Fortunately, the change could be made in makegs2.bat itself: it puts the command prompt in which Greenstone is being compiled into short-filenames mode. The Apache compile scripts then inherit this setting too, which makes any spaces or brackets in the long pathname irrelevant.
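Greenstone does this inside the batch script itself. Purely as an illustration of the same idea, here is a minimal Python sketch (not Greenstone code; the function names are hypothetical) that checks whether a path contains the characters that caused trouble, spaces and brackets, and on Windows asks the Win32 `GetShortPathNameW` function for the path's 8.3 short form, which contains neither. Short names can be disabled on some NTFS volumes, in which case Windows simply hands back the long path.

```python
import ctypes
import sys

# Characters that tripped up the GS2 batch files when present in a path.
PROBLEM_CHARS = set(" ()")


def needs_short_path(path):
    """Return True if the path contains a space or a bracket."""
    return any(ch in PROBLEM_CHARS for ch in path)


def to_short_path(path):
    """Return the Windows 8.3 short form of an existing path.

    On non-Windows platforms, or if the lookup fails (for example because
    the path does not exist), the path is returned unchanged.
    """
    if sys.platform != "win32":
        return path
    kernel32 = ctypes.windll.kernel32
    # First call with no buffer reports the required size (including the NUL).
    needed = kernel32.GetShortPathNameW(path, None, 0)
    if needed == 0:
        return path
    buf = ctypes.create_unicode_buffer(needed)
    kernel32.GetShortPathNameW(path, buf, needed)
    return buf.value
```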

Sam’s Greenstone Blog 26/11/2011

admin. Saturday, November 26th, 2011.

This week has mostly been spent improving Greenstone 3’s capability to display paged documents. This has mostly involved upgrading the table of contents functionality to better handle documents that have a lot of pages with names like “Page 1”, “Page 2”, “Page 3” etc., which makes the pages virtually indistinguishable by name. In this case it would be much better to display images of the pages. Fortunately, many of these collections will already have thumbnails available, so these will now be displayed in the table of contents instead of the page names. However, simply replacing the names with images results in two further problems. The first is that a lot of images take up a lot of space on the page; the second is that it greatly increases the amount the user has to download from the server for each page. Even though a single black and white thumbnail is likely to be only around 10KB in size, a thousand of them (which is not unrealistic) adds up to around 10MB, and color thumbnails add up even more quickly.

To solve both of these problems, I decided that a good option would be to create a box in the table of contents that shows only a few pages at a time and can be scrolled horizontally to move through the page images. As well as saving space, this approach has the added benefit that images do not need to be loaded until they become visible within the box (i.e. until they have been scrolled into view), so I have implemented it so that images are loaded dynamically as needed.
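The real box lives in the browser and is driven by Javascript; the Python sketch below only illustrates the bookkeeping behind loading images as they are scrolled into view. It assumes every thumbnail occupies the same fixed width in pixels (spacing included), and the names `visible_pages` and `ThumbnailLoader` are hypothetical.

```python
import math


def visible_pages(scroll_left, box_width, thumb_width, page_count):
    """Return the range of page indices whose thumbnails overlap the visible box.

    scroll_left: horizontal scroll offset of the box, in pixels
    box_width:   visible width of the box, in pixels
    thumb_width: width of one thumbnail including spacing, in pixels
    page_count:  total number of pages in the document
    """
    first = max(0, scroll_left // thumb_width)
    last = min(page_count, math.ceil((scroll_left + box_width) / thumb_width))
    return range(first, last)


class ThumbnailLoader:
    """Remembers which thumbnails were already requested so each is fetched once."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.loaded = set()

    def on_scroll(self, scroll_left, box_width, thumb_width):
        """Return the pages that must be fetched now and mark them as loaded."""
        window = visible_pages(scroll_left, box_width, thumb_width, self.page_count)
        to_fetch = [p for p in window if p not in self.loaded]
        self.loaded.update(to_fetch)
        return to_fetch
```

With 100-pixel thumbnails in a 300-pixel box, the first call fetches pages 0 to 2, and scrolling 50 pixels to the right only adds page 3, so a thousand-page document costs just a handful of downloads until the reader actually scrolls.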

I have also added a new feature to Greenstone 3 that may prove useful in improving some of the interactions between XSLT and Javascript. Something I have needed to do fairly often recently is take parts of one page and add them to another. Our current method for doing this is to get the page we want and “cut” the detail we want out of it. To smooth out this interaction, I have added the ability for XSL templates to be specified in the CGI arguments given to the page. This allows Javascript AJAX calls to single out the exact part of the page they want, or even create new information, all in a single AJAX call.
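The post does not give the name of the new CGI argument, so the sketch below uses a made-up one, `xslTemplate`, along with a placeholder library URL and example page arguments. It only shows the general shape of such a request: the usual page arguments plus the XSL template text, URL-encoded into a single query string, so one AJAX call can ask for exactly the fragment it needs instead of fetching the whole page and cutting it apart on the client.

```python
from urllib.parse import urlencode


def build_fragment_url(base_url, page_args, xsl_template):
    """Build a URL asking the server to render a page through a custom XSL template.

    page_args:    the usual CGI arguments identifying the page (a dict).
    xsl_template: XSL source sent as an extra CGI argument, so the server
                  returns only what the template produces.
    """
    args = dict(page_args)
    args["xslTemplate"] = xsl_template  # hypothetical argument name
    return base_url + "?" + urlencode(args)
```

Because the template travels in the query string, a long template may run into URL length limits, in which case sending the same arguments in a POST body would be the safer choice.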

Sam’s Greenstone Blog 18/11/2011

admin. Friday, November 18th, 2011.

This week has mostly been focused on bug fixing. One bug we discovered a while ago was that the code that highlights search terms in the text would also find occurrences of the terms inside tags (e.g. it would find the word “farming” in the href attribute of <a href="farming.html">farming</a>). The fix was to exclude the characters inside these tags from the highlight search: when the code sees a < character, it ignores all characters until it sees a > character. You may be thinking, “But what if there is a < in the document text?” This isn’t an issue, because any < or > characters in the document text that don’t belong to tags will already have been escaped as &lt; and &gt;.
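The skip-the-tags rule above can be sketched in a few lines of Python. This is an illustration of the technique, not Greenstone’s own highlighting code; the function name and the `termHighlight` CSS class are hypothetical, and matching is case-insensitive by assumption. Like the fix described in the post, it relies on every literal < in the text having been escaped, so any < really does start a tag.

```python
import re


def highlight(html, term, css_class="termHighlight"):
    """Wrap occurrences of term in a span, ignoring anything inside tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)

    def wrap(match):
        return f'<span class="{css_class}">{match.group(0)}</span>'

    out = []
    i, n = 0, len(html)
    while i < n:
        if html[i] == "<":
            # Inside a tag: copy everything up to and including '>' untouched.
            end = html.find(">", i)
            end = n if end == -1 else end + 1
            out.append(html[i:end])
        else:
            # Plain text: highlight matches up to the start of the next tag.
            end = html.find("<", i)
            end = n if end == -1 else end
            out.append(pattern.sub(wrap, html[i:end]))
        i = end
    return "".join(out)
```

One limitation of this simple version: the escaped entities themselves (such as &lt;) are still treated as text, so a search term like “lt” could match inside them; a fuller version would skip entities the same way it skips tags.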

Another bug I fixed was in the Document Structure Editor. When a collection was being built, it always wiped the contents of any images in that collection, leaving empty files, although the XML files were preserved fine. The root cause was that the index directory was not being deleted correctly, because the server still had the collection loaded in the runtime system (so that it could be viewed) while it tried to delete the collection’s index. The fix required briefly deactivating the collection in the runtime system so that the newly built index could replace the old one.
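The deactivate, swap, reactivate sequence can be sketched in Python as follows. This is not Greenstone’s implementation: it assumes the new index was built into a `building` directory next to the live `index` directory, and `runtime` stands for a hypothetical object with `deactivate(name)` and `activate(name)` methods. The `try`/`finally` ensures the collection is reactivated even if the swap fails.

```python
import os
import shutil


def replace_index(collection_dir, runtime, collection_name):
    """Swap a freshly built index into place while the collection is offline.

    Assumes the new index is in <collection_dir>/building and the live one
    in <collection_dir>/index. `runtime` is any object with deactivate(name)
    and activate(name) methods (a hypothetical interface).
    """
    index_dir = os.path.join(collection_dir, "index")
    building_dir = os.path.join(collection_dir, "building")
    # Release the runtime's hold on the old index before touching it.
    runtime.deactivate(collection_name)
    try:
        if os.path.isdir(index_dir):
            shutil.rmtree(index_dir)
        os.replace(building_dir, index_dir)
    finally:
        # Bring the collection back whether or not the swap succeeded.
        runtime.activate(collection_name)
```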

Another problem was with displaying paged-image collections. The system would only ever show the root section and the top-level sections, and nothing below them. I tracked this down to the top-level sections being marked as “leaf” nodes instead of “internal” nodes. Next week I will try to figure out whether this is a bug or was done deliberately.
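Why a wrong node type hides whole levels of the table of contents is easiest to see in a small model. The Python sketch below is hypothetical (it is not Greenstone’s code): a traversal that trusts each section’s stored type stops descending at any section marked “leaf”, even one that has children. The helper that derives the type from whether children exist is just one possible alternative; whether it is appropriate is exactly the open question in the post.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    node_type: str  # "leaf" or "internal", as stored for the section
    children: list = field(default_factory=list)


def toc_titles(section, depth=0):
    """List (depth, title) pairs, descending only into sections marked "internal"."""
    entries = [(depth, section.title)]
    if section.node_type == "internal":
        for child in section.children:
            entries.extend(toc_titles(child, depth + 1))
    return entries


def node_type_from_children(section):
    """One possible alternative: derive the type from whether children exist."""
    return "internal" if section.children else "leaf"
```

A top-level section marked “leaf” that holds two pages produces only the root and that section; once its type is derived from its children, both pages appear one level deeper.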

Also next week I will do some work on enabling a basic form of spatial searching (searching by locations) in any collections that contain documents with latitude and longitude information.
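The post does not say how the spatial search will work. As one possible illustration of a basic form, the Python sketch below finds documents within a given radius of a point using the haversine great-circle distance with a mean Earth radius of 6371 km. The `(doc_id, latitude, longitude)` tuple format and the function names are assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing a just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def documents_near(docs, lat, lon, radius_km):
    """Return ids of documents within radius_km of (lat, lon), nearest first.

    docs: iterable of (doc_id, latitude, longitude) tuples.
    """
    hits = []
    for doc_id, dlat, dlon in docs:
        distance = haversine_km(lat, lon, dlat, dlon)
        if distance <= radius_km:
            hits.append((distance, doc_id))
    hits.sort()
    return [doc_id for _, doc_id in hits]
```

A linear scan like this is fine for small collections; larger ones would want an index over the coordinates so that not every document has to be checked.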

Anu’s blog entry for the week ending 11 Nov 2011

ak19. Friday, November 11th, 2011.

As several people had encountered issues in the recent 2.85 release, a lot of this week was spent looking at them so that we can get 2.86 out as soon as possible.

The bugs and oversights are not fatal and work-arounds are possible:

1) If you don’t already have the PDF-Box extension for Greenstone installed, GLI will suggest where it can be obtained from. However, the URL it provides points to an older version of the PDF-Box extension, which happens not to be functional. If you want the version of PDF-Box that works with 2.85, get it from

http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.tar.gz

or http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.zip

2) The Greenstone demo collection in 2.85 contains HTML files that can’t be converted into XML well enough to work properly with the Flash file generated by the Realistic Book feature. So if you want to test the Realistic Book option of the HTMLPlugin against the HTML files included in the Greenstone demo collection, rather than against your own HTML files, get the improved demo collection from SVN at http://svn.greenstone.org/main/trunk/greenstone2/collect/demo

3) On Windows Vista and Windows 7, if your Greenstone is installed in a path containing brackets, such as “Program Files (x86)” on 64-bit machines, then launching Greenstone is likely to fail. On Windows, spaces in Greenstone’s installation path are okay, but brackets aren’t handled well enough yet. This will be fixed in a future release of Greenstone 2.

4) The fourth bug is more serious, as there is no work-around. It was found by a member of the mailing list who was using the Datelist Classifier and discovered that references to [ex.srclink] or [srclink] in his Format statements were not resolved to the URL of the source file. (The default browsing classifiers, however, had no problem with such Format statements and displayed the correct URL.) This has now been fixed by Dr Bainbridge, and the fix will be in the next release of Greenstone.

5) We also discovered that the open-office extension now seems to have a problem on Ubuntu. This was not the case some two months ago when, after a bugfix, the extension was tested on Ubuntu both here and by another dedicated member of the Greenstone family on his own Ubuntu machine. However, the new problem has now been confirmed, including when the extension is run from the command line, and even older versions of the Greenstone extension that once worked now behave the same way. Perhaps this has something to do with Ubuntu updates, but we’ll be investigating it further.

Sam’s Greenstone Blog 11/11/2011

admin. Friday, November 11th, 2011.

This week I have been working on a different area of Greenstone 3 for a change. We noticed that one area that was lacking in Greenstone 3 was the ability to display paged-image collections. For those of you who are not aware, a paged-image collection is a collection of (usually) scanned documents that consist of both the original images and the OCRed text. A good example collection in Greenstone 2 is the Māori Niupepa Collection. At the moment there seem to be multiple issues preventing collections like this from working correctly in Greenstone 3. As usual, we will also be taking this opportunity to explore upgrades to these features as we implement them for Greenstone 3. One particular area we discussed was how documents can be navigated; we intend to make it easier to scroll through pages. But I’ll go into more detail once I start implementing it and have a better idea of what works well.

I have also fixed more minor bugs in the Document Editor and added the ability to modify the text of documents. The next feature in development is the ability to add, remove and modify metadata. We still need to decide on the best way to approach this, as it has the potential to be quite complicated, but once we have decided, it should not take very long to implement, and I have already done a lot of the client-side work for it.

Official Greenstone 2.85 released!

ak19. Friday, November 4th, 2011.

At last, we did it. After a lot of testing, bug discovery and fixing, we’ve finally released Greenstone 2.85. It should be much improved over 2.84, and it also includes some last-minute changes made since release candidate 2.

Please do grab a binary for your operating system by visiting the download page at http://www.greenstone.org/download and start using it!

The Release Notes can be found at http://wiki.greenstone.org/wiki/index.php/2.85_Release_Notes

Sam’s Greenstone Blog 4/11/2011

admin. Friday, November 4th, 2011.

My work on the Document Structure Editor is on the back-burner at the moment (although still progressing well), as I have been designing a prototype collection that integrates a map view into the various parts of Greenstone to display the spatial information present in the collection. At the moment I am modifying the Tipple Paradise Garden collection, which is a test collection created by the developers of Tipple (Tourist Information Provider Digital Library). It is particularly useful because each document in the collection has a latitude and longitude value associated with it.

So, using the Google Maps API, I have inserted a map into the browsing, searching and document pages. The map contains markers showing the locations of the documents on that page, and clicking a marker takes you to the corresponding document. An information bubble moves from marker to marker displaying the name of each document (this avoids having all the names displayed at once, which could clutter the map). Next to the usual document links is another link that can be used to focus the map on a single document (centring the map on it).
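The map itself uses the Google Maps JavaScript API in the browser; the Python sketch below only illustrates the data side under stated assumptions. It turns each document’s title, link, latitude and longitude into a marker record, works out the bounding box that fits all of a page’s markers, and produces the round-robin order in which the information bubble could visit them. All names are hypothetical, and the bounding box ignores the complication of markers straddling the 180° meridian.

```python
from itertools import cycle


def marker_data(docs):
    """Turn (title, url, lat, lon) records into marker dicts for the map."""
    return [{"title": t, "url": u, "position": (lat, lon)} for t, u, lat, lon in docs]


def bounds(markers):
    """South-west and north-east corners that enclose every marker."""
    lats = [m["position"][0] for m in markers]
    lons = [m["position"][1] for m in markers]
    return (min(lats), min(lons)), (max(lats), max(lons))


def bubble_order(markers):
    """Endless sequence of titles for the rotating information bubble."""
    return cycle(m["title"] for m in markers)
```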

Next week I’ll be back to my Document Structure Editor work, where I will be trying to figure out why the Seamless Web Editor Javascript isn’t behaving as expected. Assuming I get it working, I will be able to add text editing to the interface.