Ecological data quality
- Avinash Chuntharpursat, Information Management Scientist, SAEON
“What is a good data set?”
This question has been posed many times to data and information managers, and it is one of the most difficult questions in the data management field to answer.
The reason is the complexity of the data process chain, which involves the planning of experiments and observations, the methodology used, the stringency of data collection, the accuracy of electronic data capture and the care taken in preserving the data. The chain also includes the analysis and reuse of data to yield higher-level information products, together with the accuracy and relevance of that analysis. Figure 1 is a flowchart of the ecological data handling process.
Figure 1 shows three important components of the data chain: metadata, quality control and quality assurance.
Metadata is data or information about the data. A good metadata record is needed at each step of the data chain, and the entire history of the development of the dataset should be recorded and tracked in it. A suitable standard for the capture and storage of metadata should be used; the Ecological Metadata Language (EML) is one such standard. More information on EML and metadata can be accessed from the following sites: www.ecoinformatics.org and http://www.eepublishers.co.za/view.php?sid=15047.
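As a rough illustration of what recording the history of a dataset can look like in practice, the Python sketch below keeps a running log of the steps a dataset has passed through. It is not EML and does not follow any metadata standard: the class names, field names and example values are all hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HistoryEntry:
    """One step in the life of a dataset (hypothetical structure)."""
    step: str          # e.g. "collection", "capture", "quality control"
    description: str   # what was done at this step
    performed_by: str  # who did it
    performed_on: date # when it was done


@dataclass
class DatasetMetadata:
    """Minimal metadata record with a history log (illustrative, not EML)."""
    title: str
    creator: str
    history: list = field(default_factory=list)

    def log(self, step, description, performed_by, performed_on):
        # Append a new entry so the history is never overwritten.
        self.history.append(
            HistoryEntry(step, description, performed_by, performed_on)
        )
        return self


# Example values are invented for illustration only.
record = DatasetMetadata(title="Example rainfall series", creator="Example observer")
record.log("collection", "Daily gauge readings taken by hand",
           "field team", date(2008, 1, 15))
record.log("capture", "Readings typed into a spreadsheet",
           "data capturer", date(2008, 1, 20))

for entry in record.history:
    print(entry.performed_on, entry.step, "-", entry.description)
```

A real record would be written in a recognised standard such as EML; the point here is only that each step of the data chain adds an entry rather than replacing the previous one.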
The other two components are Quality Control (QC) and Quality Assurance (QA). In Figure 1, QA and QC are associated with different steps of the data chain. Quality assurance is largely associated with data creation (such as planning and methodology) and analysis steps. Quality control is largely associated with the data capturing and archiving steps. For a better understanding of why and how this occurs, a closer look at the definitions of QA and QC is needed.
Figure 1: Flowchart of Ecological Data Handling Process
For the purposes of this exercise, the definitions of QA and QC used by the Intergovernmental Panel on Climate Change (IPCC) are suitable. These definitions were developed for greenhouse gas inventories, but can be applied in a broader ecological context. The definitions from chapter 8 of the “IPCC Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories” are as follows:
“Quality Control (QC) is a system of routine technical activities, to measure and control the quality of the inventory as it is being developed. The QC system is designed to:
- Provide routine and consistent checks to ensure data integrity, correctness, and completeness;
- Identify and address errors and omissions;
- Document and archive inventory material and record all QC activities.
QC activities include general methods such as accuracy checks on data acquisition and calculations, and the use of approved standardised procedures for [emission] calculations, measurements, estimating uncertainties, archiving information and reporting. Higher tier QC activities include technical reviews of source categories, activity and [emission] factor data, and methods.
Quality Assurance (QA) activities include a planned system of review procedures conducted by personnel not directly involved in the inventory compilation/ development process. Reviews, preferably by independent third parties, should be performed upon a finalised inventory following the implementation of QC procedures. Reviews verify that data quality objectives were met, ensure that the inventory represents the best possible estimates of emissions and sinks given the current state of scientific knowledge and data available, and support the effectiveness of the QC programme.”
For the implementation of QC activities, various software packages are available for particular fields of research. These packages aid in the identification of missing values and other errors such as statistically significant outliers. However, a strong understanding of the nature of the data is a prerequisite for quality control activities.
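To make the two checks mentioned above concrete, the following Python sketch flags missing values and candidate outliers in a list of measurements. It is a minimal sketch under stated assumptions, not a description of any particular QC package: the function name qc_report is hypothetical, the outlier rule (a modified z-score based on the median and the median absolute deviation) is one common choice among many, the threshold of 3.5 is an assumed default, and the sample readings are invented.

```python
import math
import statistics


def qc_report(values, threshold=3.5):
    """Return indices of missing values and of candidate outliers.

    Outliers are flagged with a modified z-score built from the median
    and the median absolute deviation (MAD). Both the rule and the
    threshold are illustrative assumptions.
    """
    # A value counts as missing if it is None or a floating-point NaN.
    missing = [
        i for i, v in enumerate(values)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]
    missing_set = set(missing)
    present = [(i, v) for i, v in enumerate(values) if i not in missing_set]
    numbers = [v for _, v in present]

    outliers = []
    if len(numbers) >= 3:
        median = statistics.median(numbers)
        mad = statistics.median([abs(v - median) for v in numbers])
        if mad > 0:  # with zero spread the score is undefined, so flag nothing
            outliers = [
                i for i, v in present
                if 0.6745 * abs(v - median) / mad > threshold
            ]
    return {"missing": missing, "outliers": outliers}


# Invented readings: two gaps and one value far from the rest.
readings = [12.1, 11.8, None, 12.4, 48.0, 12.0, float("nan"), 11.9]
report = qc_report(readings)
print(report)
```

A flagged value is only a candidate for review. Whether it is an error or a genuine extreme event is a judgement that depends on knowledge of the data.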
In the definition of QA, reference is made to reviews by a third party. This is particularly relevant to the SAEON situation. The SAEON nodes, which conduct research and observations in different bioclimatic regions of the country, have node liaison committees that act as independent auditors of data quality.
Many data-producing organisations have such technical committees. These committees could also play an important role in ensuring that the data of their respective organisations is of the highest standard.
References
- Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories, 2000. Intergovernmental Panel on Climate Change (IPCC). http://www.ipcc-nggip.iges.or.jp/public/gp/english/
- Ecological Circuits Issue 1 2008. SAEON / EE Publishers. http://www.eepublishers.co.za/view.php?sid=15047