Archive for July, 2020

Update for week ended 31 July 2020

ak19. Friday, July 31st, 2020.

Hi again GS blog readers,

This week has been eventful but only involved a little work on general Greenstone code.  That was mainly in the form of a fix to allow the recently-introduced “Export/Convert (GLI) metadata to CSV” GLI feature to work on Java 6 and 7 too, and not just Java 8. This was a necessary change, as our nightly builds are on systems where Java 6 and 7 are installed. Thanks to Kathy’s keen eye which caught the cause of nightly linux GS3 binaries not going through, this was solvable and solved.

There’s a new bug that I discovered this week when working with someone who uses client-GLI to connect to their remote server. Unfortunately, this is a dreaded encoding bug and may take some time to resolve (or, if I’m really lucky, will be easy and quick to solve!) I will be looking at it as soon as this blog post is done.

During the past few months and ongoing, I’ve been assigned to help an institution that hired the university’s CS department to set up their own Digital Library. So I’ve been learning hands-on what it is like to use Greenstone with real-world purpose, rather than just using Greenstone when testing it for a release or investigating it for bugs discovered by mailing list users, or just using the basic collection design. It’s been quite full-on work, but it simultaneously benefits the larger Greenstone community: all bugs discovered so far have been fixed for everyone and are available in the nightly releases. Most of the bugs discovered had to do with client-GLI and its interaction with the remote Greenstone 3 server. As a result of this process, client-GLI has become far more robust to use.

A consequence of working with our colleagues at that institution to set up a GS3 DL for them is that I had to understand format statements and collection design, so I could pass my understanding on to them. While most of our remote conferencing sessions mixed both general needs and needs specific to the institution, I also created 2 general pre-recorded tutorials covering all those major topics, which I believe could be of use to anyone learning to become a Greenstone librarian. The first video covers collection design from scratch: configuring browsing classifiers and search indexes, configuring the UnknownPlugin for MP4s and associating files/equivalent documents, and understanding and working with format statements, including creating basic reusable templates in the global format statement. The second video covers using MetadataCSVPlugin with document metadata in the form of a CSV spreadsheet and ends by going over how to add collection/document level security (password protection).

The first video is 3.5 hours long and at present has some sensitive content visible, in the form of passwords and private server connection details, that would need editing out before it can be made public. The second video clocks in at 1 hour and is more or less ready for viewing, except that both suffer from my embarrassing grating voice and from format statements being very slow to edit back then, plus I’m not comfortable that there’s a brief shot of my head in the first video. After editing the unwanted segments out of the first video, a task which I have no experience in yet, I’d ideally like to subtitle the videos and then strip out the audio altogether, which would also allow the subs to be translatable, assuming the videos are found useful. I don’t know when I’ll have the time for this, but I’ll do my best to work on it.

That about covers what I wanted to blog on this week. Until next time!

Improvements to remote GS3 and client-GLI

ak19. Friday, July 24th, 2020.

In the past several weeks, using client-GLI running against a remote Greenstone 3 in a real-world setting allowed many bugs to be found and fixed and some hopefully-useful new features to be added to (client-)GLI.

No official release of GS3 containing these features and bugfixes is available yet, but those described below are/will be available in the nightly GS3 binaries at http://www.greenstone.org/caveat-emptor/ from today onward. The linux nightly binaries are temporarily down and we’ll try to get them back up.

Among the work done:

(1)  Better remote GS3/client-GLI support for different sites and servlet names.

Once you’ve adjusted GS3/web/WEB-INF/servlets.xml as in the GS3 customisation tutorials and set the default servlet in GS3/build.properties, in client-GLI, go to File > Preferences > Connection and choose the Site then Servlet name. Click Apply and OK. Now once you go to File > Open collection, you will see all the collections available in this site and previewing will use the correct servlet.

Rebuilding will activate the collection on the selected servlet so that previewing will now at last work for non-default site and servlet.

Fixed a bug when swapping between different remote GS3 sites that client-GLI can connect to: in the past, client-GLI would get stuck trying to load the previous session’s site and servlet, even if they didn’t exist in the remote GS3 that client-GLI was attempting to connect to. Now it falls back to the default site and servlet if the stored ones are not present on the remote GS3 server the client is connecting to.
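To make the fallback behaviour concrete, here is a minimal sketch in Python. It is not GLI’s actual Java code: the function name, the shape of the `available` mapping, and the default values “localsite” and “library” are all assumptions made purely for illustration.

```python
def resolve_connection(stored, available, default=("localsite", "library")):
    """Pick the (site, servlet) pair a client should use.

    stored:    (site, servlet) remembered from the previous session, or None
    available: dict mapping each site name on the remote server to the
               list of servlet names it offers (hypothetical structure)
    default:   pair to fall back on (illustrative values, an assumption)
    """
    if stored is not None:
        site, servlet = stored
        # Only reuse the stored pair if the remote server really has it.
        if servlet in available.get(site, []):
            return stored
    # Otherwise fall back instead of getting stuck on a stale setting.
    return default


# The server we connect to now does not know the previously used site.
remote = {"localsite": ["library"]}
chosen = resolve_connection(("mysite", "mylibrary"), remote)
```

The point of the sketch is only that the stored setting is validated against what the remote server reports before it is used.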

(2) Improvements to working with collectionConfig.xml through (client-)GLI:

- client-GLI (and GLI) now properly saves edits made to the collectionConfig.xml file through the Edit > Edit collectionConfig.xml menu

and furthermore, these changes are immediately reflected in the (client-)GLI interface, instead of GLI reloading the collection as before (which used to take especially long in client-GLI)

- proper support for HTML formatted text in the “about” page description for a collection: Format > General > Format Description field

Now, when you edit the collectionConfig.xml file through the Edit > Edit collectionConfig.xml menu, any HTML in your collection description is still preserved as before. And when you preview, the GS3 reader interface also preserves it as you intended.

(3) Can successfully create new and edit existing Metadata Sets through client-GLI now.

In the past it would let you create a metadata set, but there were issues when you tried to edit an existing one. Also in the past, creating a new metadata set would cause subtle issues that you’d only actually notice if you tried to visit the File > Preferences > Connection tab afterward (when client-GLI would freeze).

(4) More complete and improved support for Metadata spreadsheet CSV files:

- MetadataCSVPlugin was extended by Dr Bainbridge to allow multi-valued metadata fields, and his improvements to the plugin have now been incorporated into the current Greenstone. To use multi-valued metadata fields with the MetadataCSVPlugin included in GS3, configure the plugin with the new “metadata_value_separator” field set to “\|”. Then in your CSV metadata spreadsheet cells, use the vertical bar (“|”) to separate multiple metadata values for a particular column denoting a metadata field.

- Fixed bugs related to the (client-)GLI rightclick > Replace feature on a document, which occurred when you attempted to replace an existing file with another file of the same name. Although this fixes the feature in general, it is also useful for when you want to update your metadata CSV spreadsheet.

Update from a week later: When you’re using Replace to replace a file with an updated identically named one, GLI always popped up a message allowing you to cancel. However, in the past, even if you cancelled, client-GLI would continue to upload the replacement file to the GS3 server where the replacement would be performed. The remote GS3 files and local files on the client machine would then get out of sync. But with this bug fixed, if you now cancel the Replace operation on replacing a file with an identically named one, client-GLI will no longer send the replacement file to the remote server.
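As an illustration of the multi-valued CSV cells described above, here is a small Python sketch of how a reader could split such cells. This is not the plugin’s own code (MetadataCSVPlugin is part of Greenstone itself); the function name is hypothetical, the sample column names are made up, and I am assuming the separator is treated as a regular expression, which would explain why the vertical bar is written escaped as “\|”.

```python
import csv
import io
import re


def read_multivalued_csv(text, separator_regex=r"\|"):
    """Parse CSV text, splitting every cell on the separator pattern.

    Returns a list of dicts mapping column name -> list of values.
    The separator is compiled as a regex, so a literal vertical bar
    has to be escaped; a bare "|" would mean regex alternation.
    """
    splitter = re.compile(separator_regex)
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for field, cell in row.items():
            # Drop empty pieces so "a||b" or an empty cell stay tidy.
            parsed[field] = [v.strip() for v in splitter.split(cell or "") if v.strip()]
        records.append(parsed)
    return records


# Made-up sample spreadsheet: one document with two subject values.
sample = "Filename,dc.Title,dc.Subject\nreport.pdf,Annual Report,finance|2020\n"
records = read_multivalued_csv(sample)
```

Here `records[0]["dc.Subject"]` ends up holding two separate values, while the single-valued title stays a one-element list.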

(5) New (client-)GLI features:

a. Metadata to CSV options:

- File > Export to metadata CSV: for a collection you have open, this option creates a metadata.csv file in a location of your choosing containing all the metadata you can see in GLI, including inherited metadata. If the metadata.csv file you selected already exists, then the metadata you see in GLI is amalgamated with the selected CSV file. This option allows you to backup your collection’s metadata to a spreadsheet file. There is NO RECONVERT feature, to convert back to metadata.xml files from metadata csv format. But you can build your collection with metadata from the CSV spreadsheet. See the following option below which explains how to redo your collection to work with metadata from a spreadsheet instead of using metadata in GLI/metadata.xml files.

- File > Convert to metadata CSV: for the collection you have open, this option creates a metadata.csv file in your collection’s “import” folder by default (which is the best location), by destructively removing all the metadata from the collection’s metadata.xml files (in other words, by removing the metadata you see in GLI) and shifting them out into the selected metadata.csv file. If you selected an existing metadata.csv file, then any metadata you currently see in GLI is amalgamated with the selected CSV file, before it gets removed from GLI/the metadata.xml files. Selecting this option prepares your collection so that you can switch over to using a MetadataCSVPlugin, configured with metadata_value_separator field set to “\|”, to then rebuild your collection producing the same results as before.
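The amalgamation step mentioned in both options could look roughly like the Python sketch below. This is only one plausible reading of “amalgamated”, not GLI’s Java implementation: the function name, the in-memory shape of the GLI metadata, the “Filename” key column, and the choice to skip duplicate values are all assumptions for illustration.

```python
import csv
import io


def merge_into_csv(existing_csv_text, gli_metadata, separator="|"):
    """Merge per-file metadata into existing CSV text, return new CSV text.

    existing_csv_text: contents of an existing metadata.csv, or "" if none
    gli_metadata:      dict filename -> dict field -> list of values
                       (hypothetical stand-in for what GLI displays)
    """
    merged = {}
    if existing_csv_text:
        for row in csv.DictReader(io.StringIO(existing_csv_text)):
            name = row.pop("Filename")
            merged[name] = {
                field: [v for v in cell.split(separator) if v]
                for field, cell in row.items() if cell
            }
    for name, fields in gli_metadata.items():
        target = merged.setdefault(name, {})
        for field, values in fields.items():
            current = target.setdefault(field, [])
            for value in values:
                if value not in current:  # assumption: no duplicate values
                    current.append(value)
    columns = sorted({f for fields in merged.values() for f in fields})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Filename"] + columns)
    for name in sorted(merged):
        cells = [separator.join(merged[name].get(c, [])) for c in columns]
        writer.writerow([name] + cells)
    return out.getvalue()


existing = "Filename,dc.Title\na.pdf,Old Title\n"
from_gli = {"a.pdf": {"dc.Subject": ["x", "y"]}, "b.pdf": {"dc.Title": ["B"]}}
result = merge_into_csv(existing, from_gli)
```

Multiple values for one field are joined with the vertical bar, matching the separator that MetadataCSVPlugin is configured with above.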

b. Collection security skeleton elements, as discussed at http://wiki.greenstone.org/doku.php?id=en:user_advanced:security, can now be added through (client-)GLI’s Edit > Edit collectionConfig.xml menu option. At the bottom of the Config File Editor dialog that appears, you will find a small toolbar that allows you to choose which (skeleton) XML <security> element to add:

- to hide the current collection,

- to add the appropriate <security> element to make the entire collection private except for one or more groups you specify,

- to add the appropriate <security> element to make all the docs in the collection private except for the groups you specify (adds a <security> element),

- to add the appropriate <security> element to make select docs in the collection private except for the groups you specify (where you can then specify which docs as explained on the wiki link already provided),

- to add a further <documentSet> element to the existing <security> element

One issue with this remains: if you want to undo the addition of a security element, you have to press the Undo button twice at present. I haven’t yet figured out why this is. (If you press Undo once, the entire XML content of your collectionConfig.xml becomes empty, so you’ll naturally press Undo again in alarm and then it will look right again.)

(6) Possibly one of the best client-GLI improvements of all: Editing format statements in client-GLI is no longer excruciatingly slow

In the past you had to wait several seconds for every character you entered, so that back then it was better to edit your format statements outside client-GLI and paste them back in. With the bugfix now in place, you can now finally easily edit format statements directly in client-GLI.

Other: 

* incorporated perl 5.30 support needed for Ubuntu 20.04 LTS

* bundling CGI.pm perl module now, so hopefully no more “ERROR: Can’t locate CGI.pm in @INC (you may need to install the CGI module)” messages

* I believe I’ve now also fixed a bug that caused deadlocks in client-GLI which could occur with some popups when some remote Greenstone3 action goes wrong

* better error display when something on remote GS3 goes wrong: instead of a giant window containing a giant error message, which could potentially exceed your screen size, you get a decent-sized dialog with a scrollable pane

* bugfix to “replace srcdoc with html” feature available on rightclicking a doc in GLI so that it now works again in (client-)GLI.

* TextPlugin will now properly preserve manually formatted text when it embeds text content in <pre> tags. Leading whitespace, even inside pre tags, used to get clobbered before, so tab spaces were lost.