eNewsletter, April 2007

Using ontologies to manage ecological data


- By Deshendran Moodley and Avinash Chuntharpursat

When representing ecological knowledge, the data manager has many techniques to choose from. By far the most developed and widely used appear to be the languages based on XML (eXtensible Markup Language). Of particular relevance to ecology is the XML-based Ecological Metadata Language, or EML (www.ecoinformatics.org).

Ontology languages, which are more expressive than the XML-based languages, have emerged recently for representing knowledge. Ontologies are playing an increasingly significant role in the management of ecological data, with many organisations across the world using or developing ontologies to manage their data. Before discussing the organisational issues around ontologies, we describe what an ontology is and how it can be used.


Ontology defined

An ontology is a formal specification of the concepts in a particular domain and the relationships between them. Consider the following simple ontology for fruit. The ontology contains six concepts, or classes, which are related hierarchically through the "is a" relation: an Apple "is a" Fruit, an Orange "is a" Fruit, and so on. The "is a" relation is transitive: since Golden Delicious "is a" Apple and Apple "is a" Fruit, it follows that Golden Delicious "is a" Fruit. (Note that "is a", or more accurately "isa", is a formal term in the construction of ontologies.)

Figure 1: Ontology Describing Apple Cultivars as a Fruit.
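
The transitivity of the "is a" relation can be illustrated with a minimal Python sketch. It is only an illustration, not how an ontology language such as OWL is implemented. The names IS_A, ancestors and isa are chosen for this example, and only the four classes named in the text are included, since the remaining classes in the figure are not reproduced here.

```python
# Direct "is a" links: each class points to its immediate parent class.
# Only the classes named in the text are listed (an assumption of this sketch).
IS_A = {
    "Apple": "Fruit",
    "Orange": "Fruit",
    "Golden Delicious": "Apple",
}


def ancestors(cls, links=IS_A):
    """Return every class reached from cls by following "is a" links upward."""
    found = []
    while cls in links:
        cls = links[cls]
        found.append(cls)
    return found


def isa(child, parent, links=IS_A):
    """True if child "is a" parent, either directly or through transitivity."""
    return parent in ancestors(child, links)


print(ancestors("Golden Delicious"))
print(isa("Golden Delicious", "Fruit"))
print(isa("Orange", "Apple"))
```

Only the direct links are stored. The fact that Golden Delicious "is a" Fruit is never written down; it is derived by following the chain through Apple.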

Concepts are generally known as classes, and classes have properties: every Fruit could, for example, have the properties colour and freshness. Whereas a class is an abstract concept, an instance of a class is a concrete realisation of it. Apple is a general class that describes any apple, whereas a specific apple in a fruit bowl is an instance, with its own values for colour and freshness. An ontology language defines the types of relations and constraints that can hold between classes, and the properties of classes. One of the most popular ontology languages is OWL, the Web Ontology Language (note: OWL, not WOL).
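
As a rough analogy, the distinction between class, property and instance resembles the one found in object-oriented programming. The Python sketch below uses that analogy. Classes in an ontology language such as OWL behave differently in several respects, so this is an illustration only, and the property values "green" and "fresh" are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    """The class Fruit: every fruit has a colour and a freshness."""
    colour: str
    freshness: str


class Apple(Fruit):
    """Apple "is a" Fruit, so it inherits the properties colour and freshness."""


# An instance: one specific apple in a fruit bowl, with concrete property values.
# The values are hypothetical.
apple_in_bowl = Apple(colour="green", freshness="fresh")

print(apple_in_bowl)
print(isinstance(apple_in_bowl, Fruit))
```

The class Apple says nothing about any particular apple. Only the instance carries actual values for colour and freshness.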


Interoperability and automatic data sharing

Ontologies are being touted as a solution for addressing interoperability and automatic data sharing and processing in large-scale, distributed information systems.

Computing is moving away from isolated closed systems to open, distributed and interactive systems. Building effective computing systems in open environments is a significant challenge. These systems are often heterogeneous, span multiple organisational boundaries and must interact and operate effectively while still maintaining their individual autonomy. As vast amounts of data are usually exchanged and processed, interoperability between systems is crucial.

Standard terminology initiatives exist in most domains for promoting data sharing and interoperability. These standard terminologies define lists or taxonomies of standard terms for describing concepts in some domain. However, they often lack any well-defined semantics and in some cases are ambiguous, inconsistent, incomplete and difficult to extend and to reuse. For example, two systems can use the same terms (syntax) to describe their data, but may not necessarily agree on the meaning (semantics) of these terms, and end up using the same term to describe different things. Sharing data between such systems will pose a problem.

Furthermore, a user who requires data from both systems must cater for these different interpretations. In general, even if two systems use the same standard terminology, exchanging data between them is still not trivial, and it certainly cannot be expected that they will do so automatically. Humans are good at dealing with inconsistency and ambiguity; computers, however, need formal, precise and unambiguous definitions for data processing. Ontologies aim to provide semantics to existing terminologies by adding a consistent logical foundation, i.e. to specify a body of knowledge in a way that both a machine and an expert who is not a computer scientist can understand.
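
The problem of shared syntax but differing semantics can be made concrete with a small Python sketch. Everything in it is hypothetical: the two systems, their records, the field name "temperature" and the concept names AirTemperature and SeaSurfaceTemperature are invented for illustration. The mapping table is a plain lookup, far simpler than a real ontology.

```python
# Two hypothetical systems that both publish a field called "temperature".
system_a = [{"site": "A1", "temperature": 18.5}]   # meant as air temperature, degrees Celsius
system_b = [{"site": "B7", "temperature": 291.2}]  # meant as sea surface temperature, kelvin

# Explicit, shared definitions of what each local term stands for.
# Concept names and units are assumptions made for this sketch.
TERM_MAP = {
    "system_a": {"temperature": ("AirTemperature", "celsius")},
    "system_b": {"temperature": ("SeaSurfaceTemperature", "kelvin")},
}


def annotate(records, source):
    """Replace each local term with the shared concept it stands for."""
    annotated = []
    for record in records:
        new_record = {}
        for key, value in record.items():
            if key in TERM_MAP[source]:
                concept, unit = TERM_MAP[source][key]
                new_record[concept] = {"value": value, "unit": unit}
            else:
                new_record[key] = value
        annotated.append(new_record)
    return annotated


merged = annotate(system_a, "system_a") + annotate(system_b, "system_b")
for record in merged:
    print(record)
```

Without the mapping, a program that merges the two record lists would treat 18.5 and 291.2 as measurements of the same thing. With it, each value carries an explicit statement of what it means.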

The level of precision, detail and logical consistency in an ontology indicates its strength and determines how much data processing can be delegated to the computer. Even weak ontologies are useful, as they assist in developing consistent and logical terminologies. Once rules have been set out for how terms should be represented and linked, tools can verify whether these rules have been applied properly and can detect inconsistencies and logical errors during development. This is extremely useful when a group of domain experts struggles to reach agreement about the representation of specific concepts.
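
The kind of automated checking described above can be sketched in a few lines of Python. The two rules enforced here are assumptions chosen for illustration, not rules taken from any particular ontology tool: every class used in an "is a" link must be declared, and no class may end up as its own ancestor. The function name check_hierarchy and the sample data are likewise hypothetical.

```python
def check_hierarchy(classes, links):
    """Return a list of problems found in a draft "is a" hierarchy.

    classes: set of declared class names
    links:   dict mapping each child class to its parent class
    """
    problems = []

    # Rule 1: every class mentioned in a link must have been declared.
    for child, parent in links.items():
        if child not in classes:
            problems.append(f"undeclared class: {child}")
        if parent not in classes:
            problems.append(f"undeclared parent: {parent} (of {child})")

    # Rule 2: following "is a" links upward must never revisit a class.
    for start in links:
        seen = {start}
        current = start
        while current in links:
            current = links[current]
            if current in seen:
                problems.append(f"cycle reachable from: {start}")
                break
            seen.add(current)

    return problems


declared = {"Fruit", "Apple", "Golden Delicious"}

# A consistent draft.
clean_draft = {"Apple": "Fruit", "Golden Delicious": "Apple"}

# A faulty draft: Orange was never declared, and Fruit is placed under
# Golden Delicious, which closes a loop.
faulty_draft = {
    "Apple": "Fruit",
    "Golden Delicious": "Apple",
    "Fruit": "Golden Delicious",
    "Orange": "Fruit",
}

print(check_hierarchy(declared, clean_draft))
for problem in check_hierarchy(declared, faulty_draft):
    print(problem)
```

Real ontology tools apply far richer logical reasoning, but the principle is the same: once the rules are explicit, errors can be found mechanically and do not have to be argued over.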


Vision

The vision of the ontology community is to build sharable ontologies that can be used to mark up data, thus enabling computer programs to automatically discover, interpret and process this data. Automated information processing is critical in large-scale, open information environments, especially in Earth observation, where vast amounts of data are continuously being generated. It enables automated alerting and decision support, as well as data mining and knowledge discovery. Many challenges remain in building, sharing and integrating ontologies. However, several initiatives in many domains are already under way.


Sensorweb

Locally, the Sensorweb Initiative, based at the Meraka Institute, CSIR, is investigating the use of ontologies in integrating sensor data. This was extensively discussed at the 2nd South African International Workshop on Sensorweb enablement, recently held in Cape Town. Internationally, ALTER-Net (an organisation involved in European Long Term Ecological Research) is heavily involved in developing ontologies for the European context. Currently, the US-LTER is looking at translating Austrian ontologies into English.

SAEON recently convened an email discussion on the future of ontologies and EML. From this discussion, it emerged that EML, and the various XML-based languages in general, are important in the creation of ontologies. Organisations such as SAEON will therefore benefit from adopting the various XML-based standards while supporting the development of ontologies.

This has further implications for International LTER (ILTER), since one of the purposes of the ILTER Information Management Committee is to look at developing a global information management system that integrates ontologies and mark-up languages.


Further information

Getting started with ontologies
The easiest way to get started is to download the Protégé ontology tool and work through the ontology 101 tutorial.

