Sam’s Greenstone Blog 27/1/2012

sjm84. Friday, January 27th, 2012

This week I have been touching up a few unfinished features. One of these is the mapping feature, which I can’t remember whether I have written about before. Basically, if your documents have coordinate information (i.e. latitude and longitude), this feature will plot those documents on a map. It can now be enabled really easily. We will write some documentation on this when we get the chance.

The new theme has also been added, which makes Greenstone 3 look a lot nicer. We’re still working on the ability to allow easy theme changing. We need to get the authentication working before we can enable this feature, as only the collection administrator should be able to change the theme.

Next week I will be experimenting with getting more standard URLs into Greenstone 3 (e.g. http://localhost:8383/greenstone3/dev/collection/demo/document/HASHc5bce2d6d3e5b04e470ec8) rather than what we currently use.
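To show why a path-style URL like the example above is convenient, here is a minimal Python sketch that splits it into named parts. The helper name and the labels "webapp" and "library" for the first two path segments are assumptions made for illustration; this is not Greenstone code.

```python
from urllib.parse import urlsplit


def parse_document_url(url):
    """Split a path-style document URL into named parts (hypothetical helper)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Assumed layout: <webapp>/<library>/collection/<name>/document/<id>
    if (len(segments) != 6
            or segments[2] != "collection"
            or segments[4] != "document"):
        raise ValueError("unrecognised URL layout: " + url)
    return {
        "webapp": segments[0],
        "library": segments[1],
        "collection": segments[3],
        "document": segments[5],
    }


example = ("http://localhost:8383/greenstone3/dev/collection/demo/"
           "document/HASHc5bce2d6d3e5b04e470ec8")
print(parse_document_url(example)["collection"])  # demo
```

Because each piece of information sits in a fixed position in the path, no query-string parsing is needed to find the collection or the document identifier.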

P.S. If you’re wondering why there is no update from Anu this week, it is because she is away in India for 5 weeks on holiday.

Sam’s Greenstone Blog 20/1/2012

sjm84. Friday, January 20th, 2012

One of the things we have been doing this week is deciding the best way to handle user authentication in Greenstone 3. We have a very basic system in place at the moment but we would like something more robust. At the moment we are investigating using the authentication system in the web-server we use for Greenstone 3 (Apache Tomcat). We need to make sure it has the flexibility we require so that collection administrators have the power to allow/prevent users access to the collection as well as (possibly) access to individual documents.

I have been continuing to assist the masters student I mentioned last week. We have been working on a way to download and replace parts of a collection via the web interface. We think that this functionality may be useful if you want to add/replace an image or run an image through your own OCR program for example.

Finally, I have been further adding to Greenstone’s CGI metadata capabilities, filling in any holes in the API. As part of this I have started developing a Javascript API which should (theoretically) make using these CGI calls a lot easier.

Sam’s Greenstone Blog 13/1/2012

sjm84. Friday, January 13th, 2012

My time this week has mostly been spent helping out one of the masters students here in our lab. I have been helping her develop the ability to tag photos and text in the Greenstone 3 collection she is working on. This has resulted in us enhancing our Greenstone 3 (and also Greenstone 2) CGI capabilities at the same time to get this working correctly. This upgrade was needed so that we could save metadata to the index, archive and import directories easily from Javascript. Some of the functionality was already there, but some (for example, the ability to remove metadata from the import directory) was missing.

One problem we had to get around was the fact that you cannot reliably specify the position of a piece of metadata that you want to change/delete in a metadata.xml file because of the way import metadata is handled in Greenstone. We decided that a good way to get around this is to require the caller to specify the previous value of the piece of metadata that they want to change/delete. The only problem with this approach arises when there is more than one identical piece of metadata: do we delete just one, or all of them? Most likely we will add an option to specify what to do in this situation.
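The idea of deleting by previous value, with an option for the duplicate case, can be sketched in Python as follows. This is only an illustration under assumptions: the XML is a simplified metadata.xml-like structure, and the function name and the remove_all flag are hypothetical rather than the actual Greenstone CGI interface.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up sample in the style of a metadata.xml file.
SAMPLE = """<DirectoryMetadata>
  <FileSet>
    <FileName>photo.jpg</FileName>
    <Description>
      <Metadata name="tag">beach</Metadata>
      <Metadata name="tag">sunset</Metadata>
      <Metadata name="tag">beach</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>"""


def remove_metadata(root, name, previous_value, remove_all=False):
    """Remove metadata identified by its name and previous value.

    Returns how many elements were removed. With remove_all=False only
    the first matching element is removed.
    """
    removed = 0
    for description in root.iter("Description"):
        # Iterate over a copy because we remove elements while looping.
        for element in list(description.findall("Metadata")):
            if element.get("name") == name and element.text == previous_value:
                description.remove(element)
                removed += 1
                if not remove_all:
                    return removed
    return removed


root = ET.fromstring(SAMPLE)
print(remove_metadata(root, "tag", "beach"))                   # 1
print([m.text for m in root.iter("Metadata")])                 # ['sunset', 'beach']
print(remove_metadata(root, "tag", "beach", remove_all=True))  # 1
```

Matching on the value rather than on a position means the call still works if the order of entries in the file changes between reading and writing.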

Next week I will most likely be working on some authentication functionality for Greenstone3.

Sam’s Greenstone Blog 6/1/2012

sjm84. Friday, January 6th, 2012

Happy new year to all Greenstone users! We’re back at work now after a couple of weeks off over the holiday period and already we’ve got a few new things lined up.

In Greenstone we try very hard to make the modification of the look and feel of collections as easy as possible.  Unfortunately this often requires knowledge of web standards like HTML, CSS and Javascript, and in the case of Greenstone 3 it is also helpful to have knowledge of XML and XSLT. We understand that many Greenstone users will have very little knowledge of these topics, so we are looking at incorporating a very simple way of changing the appearance of a Greenstone collection.

JQuery UI has a system called ThemeRoller that allows you to create your own visual theme via an easy to use web interface. You can then download the required files to use that theme in your own website. We are currently experimenting with making Greenstone 3 compatible with these themes (which are made up of a CSS file and some images). So far it is looking promising and will hopefully prove to be a welcome addition to Greenstone 3.

It has been a short week this week so there’s not a lot to report, but next week I shall be continuing on this development as well as (most likely) starting to write some up-to-date documentation for Greenstone 3, as we have made it our goal this summer to spend a large part of it working on Greenstone’s documentation.

Anu’s entry for the week ending 2 Dec 2011

ak19. Friday, December 2nd, 2011

Continued on the problem that I thought had been almost resolved last week: getting the batch files in GS2 to handle not only spaces but also brackets in the Greenstone filepaths. The batch files were done, but the perl code needed some correcting too. After inspecting many files in order to see whether they needed correcting, the GS2 code seems to work well on Windows even where Greenstone is installed in a path containing brackets.

This week, I was finally able to return to the problem of jodconverter not interacting well with LibreOffice on the Ubuntu 11 machine, whereas the same setup worked perfectly against OpenOffice on the CentOS machine. We decided that perhaps OpenOffice behaved differently in response to the signals sent by jodconverter. Installing OpenOffice turned out harder than expected and I think I botched it. I ended up having to uninstall all OpenOffice and LibreOffice files and then reinstall all of LibreOffice. At this stage, upon trying jodconverter again, it worked fine each time. This seemed to confirm the suspicion that some updates to Ubuntu may have messed up some libraries or something, breaking LibreOffice a little.

However, despite things now working again, Sam wondered, very correctly, whether a user’s experience would be this convoluted or whether it would work straight away for them. He suggested trying out a VM of Ubuntu 11, which is what I did. It was my first VM installation, and after installing an Ubuntu 11.10 VM (which comes with LibreOffice) on Sam’s Windows 7 machine, Greenstone with the open-office extension fortunately worked fine on a sequence of Word documents.

On Friday, I got round to Diego’s long-standing question at last: about the possibility of a single metadata.xml at the import level which defines the metadata for all files in import’s subfolders. Dr Bainbridge had already confirmed earlier that this was indeed possible, but the question was how the metadata.xml ought to specify the path to the files in the subfolders, especially if there were spaces in the path. After a series of incremental tests, it turned out to be possible even then, and the solution was rather straightforward. Hopefully it will work for Diego also.

There was some translation work, and a few further questions on the mailing list to look at, before I finally got round to considering Michael Goodwin’s complex question on the setup.exe generated by an Export To CD-ROM operation failing on 64-bit Windows 7. A preliminary successful test on a Windows 7 machine turned out to be misleading: I had assumed it was a 64-bit machine but it turned out to be 32-bit after all. I will have to get back to trying this out next week. All this fine-tuning is bound to pay off in the upcoming perfected release of Greenstone 2: version 2.86.

Sam’s Greenstone Blog 2/12/2011

sjm84. Friday, December 2nd, 2011

This week I have been tidying up the new paged-image functionality so that it dynamically loads each page (rather than doing a full page reload each time) and also added the functionality that allows the user to choose from “Text view” (which only shows the OCR’d text), “Image view” (which shows the original image) and “Default view” (which shows both the text and the image). These are also switched dynamically which is nice and are remembered if you leave a document page and go to a new one.

I also fixed up an annoying problem with GLI. One of the ways you can customise collections in Greenstone 3 is by writing Javascript in the collectionConfig.xml file, and those familiar with XML will know that you cannot put a raw ‘&’ or ‘<’ into text nodes, and that ‘>’ is usually escaped as well (you replace them with &amp;, &lt; and &gt; respectively). These special characters are relatively common in Javascript, so each time they are used they have to be escaped. The problem we were having with GLI was that it would read in the file and replace the escaped forms with their usual characters (&, < and >), and when it went to save the file it wouldn’t escape these characters again. So the next time this file was read in, GLI would produce an error because the file was no longer valid XML. We eventually tracked this problem down and fixed it.
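The read/save round trip described above can be reproduced with Python's standard library. This is a small sketch of the general problem, not GLI's actual code (GLI is written in Java); the Javascript snippet and the wrapper element are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def is_well_formed(text_content):
    """Return True if text_content can sit inside an XML element as-is."""
    try:
        ET.fromstring("<script>" + text_content + "</script>")
        return True
    except ET.ParseError:
        return False


# What is stored on disk: Javascript with the special characters escaped.
stored = "if (a &lt; b &amp;&amp; b &gt; c) { swap(); }"

# Reading the file turns the entities back into plain characters.
in_memory = unescape(stored)

# Writing the in-memory text back unchanged is the bug;
# escaping it first is the fix.
saved_without_escaping = in_memory
saved_with_escaping = escape(in_memory)

print(is_well_formed(saved_without_escaping))  # False
print(is_well_formed(saved_with_escaping))     # True
```

After the fix, the saved text is identical to what was originally stored, so the file can be read and written any number of times without becoming invalid.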

Next week I will continue to work on the paged-image functionality (specifically the “next page” and “previous page” buttons) as well as adding some new code to HTMLPlugin that will add any files referred to in CSS files (e.g. background-image) as associated files of the HTML page.

Anu’s entry for week ending 26 Nov 2011

ak19. Monday, November 28th, 2011

For the last two weeks, I was mainly learning the practical side of how to handle the Greenstone translations: mainly how to generate the spreadsheets for translators to use, though there was also the opportunity to learn how to handle translated spreadsheets. Besides that, there were some questions on the mailing list that I had a go at answering, and I uploaded the updates to the ACKU and AREU collections.

On the final 3 days, I got round to working on getting the batch files in GS2 to handle not only spaces but also brackets in the Greenstone filepaths. There is still a final problem to resolve before the changes can be committed, but the Greenstone web server is now working again, despite Greenstone being installed in a path with brackets (and spaces). There’s even some allowance made in the makegs2.bat script (which is used to compile GS2) to get Apache to compile even when there are spaces or brackets in the filepaths it works with. Fortunately, the change could be made in makegs2.bat itself: it sets the command prompt in which Greenstone is being compiled to short-filenames mode. The Apache compile scripts then inherit this mode too, making any space/bracket in the long pathname irrelevant.

Sam’s Greenstone Blog 26/11/2011

sjm84. Saturday, November 26th, 2011

This week has mostly been spent improving Greenstone 3’s capability to display paged documents. This has mostly involved upgrading the table of contents functionality to better handle documents that have a lot of pages and whose sections have names like “Page 1”, “Page 2”, “Page 3” etc., making them virtually indistinguishable by name. In this case it would be much better if images of the pages were displayed. Fortunately many of these collections will already have thumbnails available, so these will now be displayed in the table of contents instead of the names. Simply replacing the names with images, however, results in two more problems. The first is that a lot of images take up a lot of space on the page, and the second is that it greatly increases the amount that the user has to download from the server for each page. Even though a single black and white thumbnail is likely to be only around 10KB in size, a thousand of these (which is not unrealistic) comes to around 10MB, and color images add up even more quickly.

To solve both of these problems I decided that a good option would be to create a box in the table of contents that only shows a few pages at a time and can be scrolled from right to left to go through the images of the pages. As well as saving space, this approach has the added benefit that images do not need to be loaded until they are visible within the box (i.e. they have been scrolled into view). So I have implemented it so that images are loaded dynamically as necessary.
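The load-on-demand logic can be sketched as follows. This is a Python illustration of the idea only (the real implementation runs as Javascript in the browser), and the class name, box width and thumbnail width are assumptions chosen for the example.

```python
def visible_range(scroll_offset, box_width, thumb_width, total):
    """Return the indices of thumbnails that overlap the visible box."""
    first = max(0, scroll_offset // thumb_width)
    last = min(total - 1, (scroll_offset + box_width - 1) // thumb_width)
    return range(first, last + 1)


class ThumbnailStrip:
    """Tracks which page thumbnails have been requested so far."""

    def __init__(self, total, box_width=500, thumb_width=100):
        self.total = total
        self.box_width = box_width
        self.thumb_width = thumb_width
        self.loaded = set()

    def scroll_to(self, offset):
        """Return only the indices that still need loading at this offset."""
        visible = visible_range(offset, self.box_width,
                                self.thumb_width, self.total)
        newly_needed = [i for i in visible if i not in self.loaded]
        self.loaded.update(newly_needed)
        return newly_needed


strip = ThumbnailStrip(total=1000)
print(strip.scroll_to(0))    # [0, 1, 2, 3, 4]
print(strip.scroll_to(250))  # [5, 6, 7]
print(len(strip.loaded))     # 8
```

With a thousand pages, a user who only looks at the first screen of the strip downloads five thumbnails instead of a thousand.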

I have also added a new feature to Greenstone 3 that may prove useful in improving some of the interactions that happen between XSLT and Javascript. One thing I have been needing to do a reasonable amount recently is take parts of pages and add them to other pages. Our current method for doing this is to get the page we want and to “cut” the detail we want out of it. To hopefully smooth out this interaction I have added the ability for XSL templates to be specified in the CGI arguments given to the page. This allows Javascript AJAX calls to single out the exact part of the page they want or even create new information, all in a single AJAX call.

Sam’s Greenstone Blog 18/11/2011

sjm84. Friday, November 18th, 2011

This week has mostly been focused on bug fixing. One bug we discovered a while ago was that the code that highlights search terms in the text would also find occurrences of the terms inside tags (e.g. it would find the word farming in both places in <a href="farming.html">farming</a>). The fix was to exclude the characters inside these tags from being considered by the highlight searching code, by looking for the < character and ignoring all characters until we see a > character. You may be thinking “But what if there is a < in the document text?” The answer is that this isn’t an issue, as the document text will not contain any < or > characters that don’t belong to tags: they will be escaped as &lt; and &gt;.
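The tag-skipping approach can be sketched in Python as below. It is a minimal illustration of the technique described, not the Greenstone implementation; the function name and the use of <b> tags as highlight markers are assumptions.

```python
def highlight(html, term, open_mark="<b>", close_mark="</b>"):
    """Wrap occurrences of term in markers, skipping text inside tags.

    Assumes any < or > in the document text is already escaped, so every
    raw < starts a tag and the next > ends it. Matching is case-insensitive
    and assumes lower-casing does not change the length of the text.
    """
    out = []
    lowered = html.lower()
    needle = term.lower()
    in_tag = False
    i = 0
    while i < len(html):
        ch = html[i]
        if in_tag:
            # Copy tag characters untouched until the closing >.
            out.append(ch)
            if ch == ">":
                in_tag = False
            i += 1
        elif ch == "<":
            in_tag = True
            out.append(ch)
            i += 1
        elif needle and lowered.startswith(needle, i):
            out.append(open_mark + html[i:i + len(needle)] + close_mark)
            i += len(needle)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(highlight('<a href="farming.html">farming</a>', "farming"))
# <a href="farming.html"><b>farming</b></a>
```

The occurrence inside the href attribute is left alone, so the link still works after highlighting.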

Another bug I fixed was to do with the Document Structure Editor. The bug was that it always wiped the contents of any images in the collection that was being built, leaving empty files, but the XML files were being preserved fine. The main bug was caused by the index directory not being deleted correctly. This was because the server still had the collection loaded in the runtime system (so that it can be viewed) while it tried to delete its index. So it required that the collection be briefly deactivated in the runtime system so that this replacement (the newly built index replacing the old one) could take place.

Another problem was with displaying paged-image collections. The system would only ever show the root level section and the top level sections and no sections lower than that. I tracked this down to the top level sections being marked as “leaf” nodes instead of “internal” nodes. Whether this is a bug or whether this has been done deliberately I will try and figure out next week.

Also next week I will do some work on enabling a basic form of spatial searching (searching by locations) in any collections that contain documents with latitude and longitude information.
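One basic form of searching by location is a bounding-box filter over the documents' coordinates. The Python sketch below illustrates that general idea under assumptions; it is not the planned Greenstone design, the document identifiers are made up, and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(lat, lon, south, west, north, east):
    """Return True if (lat, lon) lies inside the given box."""
    return south <= lat <= north and west <= lon <= east


def spatial_search(documents, south, west, north, east):
    """Return the ids of documents whose coordinates fall inside the box.

    documents maps a document id to a (latitude, longitude) pair.
    """
    return [doc_id
            for doc_id, (lat, lon) in documents.items()
            if in_bounding_box(lat, lon, south, west, north, east)]


# Hypothetical documents with approximate coordinates.
docs = {
    "doc-hamilton": (-37.787, 175.279),
    "doc-london": (51.507, -0.128),
}

# A box roughly covering New Zealand.
print(spatial_search(docs, south=-48.0, west=166.0, north=-34.0, east=179.0))
# ['doc-hamilton']
```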

Anu’s blog entry for the week ending 11 Nov 2011

ak19. Friday, November 11th, 2011

As several people had encountered issues in the recent 2.85 release, a lot of this week was spent looking at them so that we can get 2.86 out as soon as possible.

The bugs and oversights are not fatal and work-arounds are possible:

1) If you don’t have the PDF-box extension for Greenstone installed already, GLI will suggest where it can be obtained from. However, the URL it provides points to an older version of the PDF-box extension, which happens to be one that’s not functional. If you want the version of PDF-Box that works with 2.85, get it from

http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.tar.gz

or http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.zip

2) The Greenstone demo collection in 2.85 contains HTML files that can’t get converted into XML properly enough to work well with the flash file generated by the Realistic Book feature. So if you’re thinking of testing out the realistic book option of the HTMLPlugin against the HTML files included in the Greenstone demo collection, rather than against your own HTML files, get the improved demo collection from SVN at http://svn.greenstone.org/main/trunk/greenstone2/collect/demo

3) On Vista, if your Greenstone is installed in a path containing brackets, such as “Program Files (x86)” as can happen on Windows 7 machines, then launching Greenstone is likely to fail. On Windows, spaces in Greenstone’s installation path are okay, but brackets aren’t handled well enough yet. This will be fixed in a future release of Greenstone 2.

4) The fourth bug is more serious in that there is no work-around. It was found by a member on the mailing list when he was using the Datelist Classifier and discovered that references to [ex.srclink] or [srclink] in his Format statements did not get resolved to the URL of the source file. (However, the default browsing classifiers had no problem with such Format statements and would display the correct URL.) This has now been fixed by Dr Bainbridge and will be present in the next release of Greenstone.

5) Another discovery is that Ubuntu now seems to have a problem with the open-office extension. This was not the case some two months back when, after a bugfix, the extension was tested on Ubuntu both here and by another dedicated member of the Greenstone family on his own Ubuntu machine. However, the new problem has been confirmed to exist, including when run from the command line, and even older versions of the Greenstone extension are now failing in the same way despite having worked at one point. Perhaps this has something to do with updates to Ubuntu, but we’ll be investigating it further.