Using ontologies to manage ecological data
- By Deshendran Moodley and Avinash Chuntharpursat
When representing ecological knowledge, there are many techniques available to the data manager. The most developed and widely used by far seem to be the XML (eXtensible Mark-up Language) based languages. Of particular relevance to ecology is the XML based language Ecological Metadata Language, or EML (www.ecoinformatics.org).
Ontology languages, which are more expressive than the XML based languages, have emerged recently for representing knowledge. Ontologies are playing an increasingly significant role in the management of ecological data, with many different organisations across the world utilising or developing ontologies for the management of their data. Before we discuss the organisational issues around ontologies, a description of what an ontology is and how it can be used is given below.
Ontology defined
An ontology is a formal specification of concepts and the relationships between concepts in a particular domain. Consider the following simple ontology for fruit. The ontology contains six concepts or classes. These classes are related hierarchically through the "is a" relation: an Apple "is a" Fruit, an Orange "is a" Fruit, and so on. Golden Delicious "is a" Apple, and because the "is a" relation is transitive, Golden Delicious "is a" Fruit as well. (Note that "is a", or more accurately "isa", is a formal term in the construction of ontologies.)
Figure 1: Ontology Describing Apple Cultivars as a Fruit.
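The transitivity of the "is a" relation can be illustrated with a few lines of Python. This is only a minimal sketch: it records the four classes named in the text (the remaining classes in Figure 1 are not reproduced here), and the names IS_A, ancestors and is_a are illustrative choices rather than part of any ontology tool.

```python
# Direct "is a" links named in the text, stored as child -> parent.
# Only the four classes mentioned in the article are included (an assumption;
# the full figure contains six).
IS_A = {
    "Apple": "Fruit",
    "Orange": "Fruit",
    "Golden Delicious": "Apple",
}


def ancestors(concept):
    """Return every class reached by following "is a" links upwards."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found


def is_a(concept, candidate):
    """True if `concept` "is a" `candidate`, directly or transitively."""
    return candidate in ancestors(concept)


print(ancestors("Golden Delicious"))
print(is_a("Golden Delicious", "Fruit"))
print(is_a("Orange", "Apple"))
```

Golden Delicious is never linked to Fruit directly; the link is derived by following the chain through Apple, which is what transitivity means in practice.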
Generally, concepts are known as classes, and these classes have properties. For example, every Fruit could have the properties colour and freshness. Whereas a class is an abstract concept, an instance of a class is a concrete realisation of that class: Apple is a general class that describes any apple, whereas a specific apple in a fruit bowl is an instance, with its own values for colour and freshness. An ontology language defines the types of relations and constraints that can hold between classes, and the properties of classes. One of the most popular ontology languages is OWL, the Web Ontology Language (note: OWL, not WOL).
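The distinction between a class, its properties and an instance can be sketched in Python. Python classes serve only as an analogy here; an ontology language such as OWL has its own way of declaring classes, properties and individuals. The property values "green" and "ripe" are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    """The abstract class: every Fruit has a colour and a freshness."""
    colour: str
    freshness: str


@dataclass
class Apple(Fruit):
    """Apple "is a" Fruit, so it inherits both properties."""


# An instance: one specific apple in a fruit bowl, with concrete values
# (the values are hypothetical).
apple_in_bowl = Apple(colour="green", freshness="ripe")

print(apple_in_bowl)
print(isinstance(apple_in_bowl, Fruit))
```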
Interoperability and automatic data sharing
Ontologies are being touted as a solution for addressing interoperability and automatic data sharing and processing in large-scale, distributed information systems.
Computing is moving away from isolated closed systems to open, distributed and interactive systems. Building effective computing systems in open environments is a significant challenge. These systems are often heterogeneous, span multiple organisational boundaries and must interact and operate effectively while still maintaining their individual autonomy. As vast amounts of data are usually exchanged and processed, interoperability between systems is crucial.
Standard terminology initiatives exist in most domains for promoting data sharing and interoperability. These standard terminologies define lists or taxonomies of standard terms for describing concepts in some domain. However, they often lack any well-defined semantics and in some cases are ambiguous, inconsistent, incomplete and difficult to extend and to reuse. For example, two systems can use the same terms (syntax) to describe their data, but may not necessarily agree on the meaning (semantics) of these terms, and end up using the same term to describe different things. Sharing data between such systems will pose a problem.
Furthermore, a user who requires data from both systems must cater for these different interpretations. In general, even if two systems use the same standard terminology, exchanging data between them is still not trivial, and it cannot be assumed that they will do so automatically. Humans are good at dealing with inconsistencies and ambiguity; computers, however, need formal, precise and unambiguous definitions for data processing. Ontologies aim to provide semantics to existing terminologies by adding a consistent logical foundation, i.e. to specify a body of knowledge so that the specification is understandable both by a machine and by a domain expert who is not a computer scientist.
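The problem of shared syntax without shared semantics can be made concrete with a small Python sketch. Everything in it is hypothetical: the two systems, their readings, and the concept names AirTemperature and WaterTemperature are invented to illustrate the argument and do not come from any real terminology.

```python
# Two hypothetical systems that both use the term "temperature".
system_a_record = {"temperature": 21.5}  # assumed to mean air temperature
system_b_record = {"temperature": 14.2}  # assumed to mean water temperature

# Naive merge on the shared term: one reading silently overwrites the other.
naive = {**system_a_record, **system_b_record}

# Hypothetical annotations that state what each system means by its term.
TERM_MEANING = {
    "system_a": {"temperature": "AirTemperature"},
    "system_b": {"temperature": "WaterTemperature"},
}


def annotate(system, record):
    """Re-key a record by the concept each term denotes in that system."""
    return {TERM_MEANING[system][term]: value for term, value in record.items()}


# Merging on concepts instead of terms keeps the two readings apart.
merged = {
    **annotate("system_a", system_a_record),
    **annotate("system_b", system_b_record),
}

print(naive)
print(merged)
```

The lookup table stands in for what an ontology would provide: an explicit, machine-readable statement of what each term means, so that a program rather than a person can resolve the difference.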
The level of precision, detail and logical consistency in an ontology indicates the strength of the ontology and determines the degree of automated data processing that can be delegated to the computer. Even weak ontologies are useful, as they assist in the development of consistent and logical terminologies. If rules about how terms should be represented and linked are set out at the start, tools can verify whether these rules have been applied properly and can detect inconsistencies and logical errors during the development process. This is extremely useful when a group of domain experts struggles to reach agreement about the representation of specific concepts.
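As a rough illustration of this kind of automated checking, the Python sketch below applies two simple rules to an "is a" hierarchy: every parent must itself be a defined class (or the root), and no chain of "is a" links may loop back on itself. The rules, the function name check_hierarchy and the misspelt class "Aple" are assumptions made for the example; real ontology tools apply far richer logical checks.

```python
def check_hierarchy(is_a, root):
    """Return a list of rule violations found in a child -> parent mapping."""
    problems = []

    # Rule 1: every parent must be the root or a class defined in the mapping.
    for concept, parent in is_a.items():
        if parent != root and parent not in is_a:
            problems.append(f"{concept}: parent {parent!r} is not defined")

    # Rule 2: following "is a" links upwards must never revisit a class.
    for concept in is_a:
        seen = {concept}
        current = concept
        while current in is_a:
            current = is_a[current]
            if current in seen:
                problems.append(f"{concept}: 'is a' chain loops back to {current!r}")
                break
            seen.add(current)

    return problems


# A draft terminology with a deliberate typo in one parent name.
draft = {
    "Apple": "Fruit",
    "Orange": "Fruit",
    "Golden Delicious": "Aple",
}

for problem in check_hierarchy(draft, root="Fruit"):
    print(problem)
```

A check like this gives a group of experts an objective list of problems to resolve, rather than relying on someone spotting the typo by eye.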
Vision
The vision of the ontology community is to build sharable ontologies that can be used to mark up data, thus enabling computer programs to automatically discover, interpret and process this data. Automated information processing is critical in large-scale, open information environments, especially in Earth observation, where vast amounts of data are continuously being generated. Such processing enables automated alerting and decision support, as well as data mining and knowledge discovery. There are still many challenges in building, sharing and integrating ontologies. However, several initiatives in many domains are already under way.
Sensorweb
Locally, the Sensorweb Initiative based at the Meraka Institute, CSIR is investigating the use of ontologies in integrating sensor data. This was extensively discussed at the 2nd South African International Workshop on Sensorweb enablement, recently held in Cape Town. Internationally, ALTER-Net (an organisation involved in European Long Term Ecological Research) is heavily involved in developing ontologies for the European situation. Currently, the US-LTER is looking at translating Austrian ontologies into English.
SAEON recently convened an email discussion on the future of ontologies and EML. From this discussion, it emerged that EML, and the various XML-based languages in general, are important in the creation of ontologies. Organisations such as SAEON will therefore benefit from adopting the various XML-based standards while supporting the development of ontologies.
This has further implications for International LTER, since one of the purposes of the ILTER Information Management Committee is to look at developing a global information management system that integrates ontologies and mark-up languages.
Further information
Further information on ontologies can be obtained from:
Earth observation links
- Sensor Web Agent Platform (SWAP) - (South African) - see paper and presentation
- NASA SWEET ontologies
- Virtual Solar-Terrestrial Observatory (VSTO)
- Science Environment for Ecological Knowledge (SEEK) - Ontologies have been developed for the SEEK project
- "Representing the Dimensions of an Ecological Niche" Deana Pennington, University of New Mexico, USA - paper and presentation
Biology and Bioinformatic links
- The Open Biological Ontologies (OBO)
- Gene Ontology (GO)
- Plant Ontology Consortium (POC)
- Microarray Gene Expression Database Ontology Working Group (MGED OWG)
- South African National Bioinformatics Network
Getting started with ontologies
The easiest way is to download the Protégé ontology tool and to work through the ontology 101 tutorial.
Contacts
- Deshendran Moodley
School of Computer Science, University of KwaZulu-Natal, Durban, South Africa
deshen@cs.ukzn.ac.za
- Avinash Chuntharpursat
avinash@saeon.ac.za