Update on metadata management and workflows
- By Avinash Chuntharpursat, Information Management Scientist, SAEON
Before going into the detail of metadata management systems, it is worth going back to basics: a recap for those in the know, and an introduction to metadata for those new to the field.
In case you're wondering what all the fuss about metadata is, let me give a short explanation. Look at figure 1, the beach scene. The figure is made up of data: the millions of pixels that together form the picture. Just by looking at the picture, some information is immediately evident: families playing volleyball in the foreground and a wooden pier or wharf in the background. Long shadows on the ground indicate that it was late in the afternoon, although it was still quite bright.
Then move on to the caption below the picture. The caption is a piece of metadata (information or data about the data). Reading it gives a deeper understanding of the picture (the data). The place, Santa Barbara, is not evident from the picture, but it's there in the metadata. The date and time show that the picture was taken in the evening in the northern-hemisphere spring, indicating that the day length is similar to that of our Cape summers. Similarly, the caption explains the structure in the background, who took the picture and why he was there. So, for those whose job it is to write up tables of data, please spare a thought for your metadata.
Image 1:
- Title: Family volleyball on the beach
- Photographer: Avinash Chuntharpursat
- Camera: Nikon D60
- Format: JPEG
- Date: 3 May 2008
- Time: 19:00
- Place: Santa Barbara, CA, USA
Abstract: A beach scene showing families playing volleyball, taken in Santa Barbara. The structure in the background is the historic Stearns Wharf, which has restaurants and several other tourist attractions. The photographer was attending a metadata management course at NCEAS.
Metadata is just as pertinent when it comes to managing the actual data. Taking the above example further, a database of thousands of pictures would be difficult to manage, and even more difficult for a user to search for suitable pictures. This is where the metadata comes into play. A user can search for pictures by location, e.g. for all pictures from Santa Barbara. (A spatially enabled search with a map as the interface comes in handy here.) A search for a particular photographer, such as Avinash Chuntharpursat, will return only the photos taken by that photographer. A keyword search can also be done, e.g. a search for photos tagged with beach and/or volleyball. If a set of ontologies is set up, a search for "seashore" would also yield results tagged with "beach". It is at this level that metadata management tools are applicable.
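The searches described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue is a hypothetical in-memory list, the second record is made up for contrast, the field names simply mirror the caption of Image 1, and the `SYNONYMS` table is a toy stand-in for a real ontology. None of it reflects how any particular metadata system is implemented.

```python
# Hypothetical in-memory catalogue. The first record mirrors Image 1;
# the second is a made-up record, included only for contrast.
PHOTOS = [
    {
        "title": "Family volleyball on the beach",
        "photographer": "Avinash Chuntharpursat",
        "place": "Santa Barbara, CA, USA",
        "keywords": {"beach", "volleyball", "pier"},
    },
    {
        "title": "Mountain stream (made-up record)",
        "photographer": "A. N. Other",
        "place": "Cape Town, South Africa",
        "keywords": {"river", "mountain"},
    },
]

# Toy stand-in for an ontology: each search term maps to related terms.
SYNONYMS = {"seashore": {"beach"}}


def expand(term):
    """Return the term itself plus any related terms from SYNONYMS."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def search(records, place=None, photographer=None, keywords=(), match_all=False):
    """Return the titles of records that satisfy every filter supplied."""
    titles = []
    for rec in records:
        # Location filter: simple case-insensitive substring match.
        if place and place.lower() not in rec["place"].lower():
            continue
        # Photographer filter: case-insensitive exact match.
        if photographer and photographer.lower() != rec["photographer"].lower():
            continue
        # Keyword filter: "and" when match_all is True, otherwise "or".
        if keywords:
            hits = [bool(expand(k) & rec["keywords"]) for k in keywords]
            if not (all(hits) if match_all else any(hits)):
                continue
        titles.append(rec["title"])
    return titles


print(search(PHOTOS, place="Santa Barbara"))
print(search(PHOTOS, keywords=["seashore"]))
```

Both calls find the beach photo: the first through its place, the second because "seashore" is expanded to "beach" before matching. A spatial search would replace the substring match on the place name with a comparison of coordinates, which this sketch does not attempt.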
Metadata management
With the metadata from figure 1 in mind, let's move on to what happened at the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara, USA. NCEAS is responsible for a suite of metadata management systems based on the Ecological Metadata Language (EML). EML is an XML (eXtensible Markup Language) schema that is used to describe and document ecological and natural science data.
For most scientists, writing raw EML and remembering its 2000 or so specialist tags is a daunting prospect. To streamline the entry of metadata, a product called Morpho was developed. Morpho is a front end that makes entering metadata much more convenient. EML is a computer language; Morpho lets the user enter metadata through a form-like (wizard) interface and takes away the hassle of dealing with technical EML.
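To give a feel for what a form-based front end saves the user from writing, here is a minimal Python sketch that turns a few form fields into EML-style XML. It is an illustration only and not Morpho's actual code: the `form_to_xml` function and the `FORM` dictionary are hypothetical, and the output is a heavily simplified fragment that would not validate against the real EML schema (a real document needs a namespaced root element, required attributes and further mandatory elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical form input, taken from the caption of Image 1.
FORM = {
    "title": "Family volleyball on the beach",
    "given_name": "Avinash",
    "surname": "Chuntharpursat",
    "abstract": "A beach scene showing families playing volleyball.",
    "keywords": ["beach", "volleyball"],
}


def form_to_xml(form):
    """Build a simplified, non-validating EML-style fragment from form fields."""
    # Simplification: the real EML root is namespaced and carries
    # required attributes, which are left out here.
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = form["title"]

    # The creator is recorded as a nested individual name.
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "givenName").text = form["given_name"]
    ET.SubElement(name, "surName").text = form["surname"]

    # Simplification: the abstract is stored as plain text.
    ET.SubElement(dataset, "abstract").text = form["abstract"]

    # One keyword element per keyword entered in the form.
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for word in form["keywords"]:
        ET.SubElement(keyword_set, "keyword").text = word

    return ET.tostring(root, encoding="unicode")


xml_text = form_to_xml(FORM)
print(xml_text)
```

Even this five-field example produces a nested document with nine different tags, which suggests why a wizard interface is welcome once the full set of tags is in play.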
On the server side, metadata is stored, catalogued and made searchable by a product called Metacat. A Metacat system is both distributed and centralised: individual nodes can each have their own Metacat, and these Metacat servers can be synchronised to update one another as well as a central node. Metacat also has a spatial search function for spatial metadata that complies with FGDC and certain ISO specifications, and it comes with a built-in GeoServer installation.
Workflows
Another exciting development from NCEAS is the Kepler scientific workflow system. Kepler offers the following:
- The modelling process is transparent
- The modelling process can be replicated
- New data can easily be added to the modelling process
- Multiple distributed data sources can be used
Figure 2 is a screen capture of the Kepler process. NCEAS has worked on this project for several years and the final product is due for release. No exact dates were given, but Kepler version 1 and the new version of Metacat (with a turnkey installation) are due to be released in the next few months. Plans for a web-based Morpho are also under way.
This article merely provides an introduction to metadata management. SAEON, together with PositionIT, is preparing a special supplement of PositionIT. The supplement aims to include articles on the systems mentioned here and on the various standards governing metadata management, among a range of other information management topics. In the meantime, visit http://ecoinformatics.org for more information on the NCEAS systems.