A different kind of ecosystem
As the 21st century gets underway, the world is facing significant challenges from change - climate variability, momentous changes in the levels of income and resource consumption of developing nations, habitat destruction and species loss at an unprecedented rate, and increasingly large impacts from natural events on the expanding human population as we utilise previously marginal or dangerous resources.
As a result of these challenges, the last two decades have been characterised by enormous investment in the science of global change by the governments of the world. The 2010 snapshot [1] from the Organisation for Economic Co-operation and Development (OECD) and selected non-OECD countries indicates the extent to which the rich countries of the world are committing to research, development and innovation (RDI) in general (exceeding a trillion dollars per annum), and the relative importance of researchers as a significant component of society in these countries.
Interestingly, South Africa (ZAF) lags (even after corrections for purchasing power parity) in the number of researchers that are available relative to the expenditure on R&D - though this may be skewed due to sizable investment in infrastructure.
The investment in monitoring and measurement, specifically, has led to a situation where the Science of Data needs to start supporting and enabling the Data of Science. We are faced with a deluge of data from these increasingly detailed sensor networks, and the dissemination, preservation, and interpretation of such large volumes of data require researchers to work within a very different future laboratory - a virtual research environment.
Added to this are the potential and the challenges associated with citizen science, voluntarily contributed data, and the explosively growing network of smart devices in the hands of ordinary citizens, which are increasingly capable of performing sensory tasks.
We briefly discuss a number of important drivers and trends in the field of scientific data management:
Free and open access to data
Free and open access to research output, especially publicly funded research output, is the oil of the knowledge economy. Restricting access, or making it uneconomical for individuals, small businesses, and the public at large, inhibits the extent to which the private sector contributes to economic growth, job creation, and wealth. Limiting availability to researchers constrains a virtuous circle of accelerated research and development. Limiting availability to only a few researchers is the equivalent of hiding money under a mattress.
The developed countries of the world view the promotion of this principle [2] as an imperative with significant side benefits: imperative because the bulk of research outputs have already been funded by the taxpayer and hence should be accessible to the public, and beneficial because it could make a significant contribution to mitigating the symptoms of the digital and knowledge divide.
Developing countries, on the other hand, have mostly been reluctant to share data openly, believing that opening up data to a wider user base will somehow deprive local beneficiaries of its value. While this may be so in isolated cases, it is my firm belief that there will be (and already is) a net flow of benefit to the developing world, and that embargoes on data should be the exception and not the norm. Specifically, SAEON is advocating and promoting policy-driven access to research data and funding for the long-term provisioning of such access.
Increased international collaboration
The last decade has seen increasing willingness and opportunity for global collaboration in research infrastructures, driven by the realisation that there is more to gain by pooling resources than there is to lose, and that the collaboration should start with - but extend beyond - data sharing. There is an emerging new ecosystem of research, development, and innovation infrastructure that
- does not only include the cyber-infrastructure that underpins it, but recognises that using it is difficult without significant attention to a software layer that fosters interoperability (syntactic, schematic, and semantic),
- removes the need for deep technical knowledge from end users,
- and pays equal attention to capacity building and entrenchment of good data management principles as a basic tool of scholarly research work.
Working in the cloud
Central to this emerging global research and innovation infrastructure is the reduced reliance on local resources: it should not, and does not, matter to the end user where these resources are provisioned physically.
On the other hand, there will always be a need for control, but our conception that control means localisation needs to be amended and is already under challenge. Because public cloud services and infrastructure are so easy to access and use, the major threat to control and management of a new ecosystem of research data infrastructure is that it lags these publicly available services. It is imperative to catch up, and catch up quickly, in the availability of private cloud infrastructures that integrate with the public cloud, are managed and controlled to comply with institutional, technical, and legal requirements, and are underwritten in terms of basic availability and longevity by the primary funders of research and innovation in each country.
The semantic ideal
Early in the development of the World Wide Web, its inventor, Tim Berners-Lee, defined the concept of the semantic web - arguing that it would become increasingly possible to link items of data and information that are persisted at billions of nodes.
While it is clear that the semantic web is beneficial from a theoretical point of view, there are many concerns about its viability from a practical perspective: it will be too large and unspecific. With this in mind, the web of Linked Open Data has emerged - which can be viewed as a subset of the semantic web that will be useful to science specifically, focusing on linking data through persistent identifiers for all the things that are important to us - species, genes, ecosystem structures, physical features and phenomena, people, institutions, processes, relationships, and so on.
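As a rough illustration of what linking through persistent identifiers can look like, the sketch below stores a few subject-predicate-object statements and follows the links between them. Every identifier in it (the `https://example.org/id/...` URIs and the predicate names) is a hypothetical placeholder, not a real vocabulary or a SAEON identifier.

```python
# Minimal sketch of Linked Open Data: statements are (subject, predicate, object)
# triples, and every "thing" is named by a persistent identifier.
# All URIs and predicate names below are made up for illustration.
BASE = "https://example.org/id/"  # hypothetical identifier namespace

triples = [
    (BASE + "dataset/42", "describes", BASE + "species/protea-cynaroides"),
    (BASE + "dataset/42", "createdBy", BASE + "person/a-researcher"),
    (BASE + "person/a-researcher", "memberOf", BASE + "institution/example-observatory"),
    (BASE + "dataset/77", "describes", BASE + "species/protea-cynaroides"),
]


def objects(subject, predicate):
    """Return all objects linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return all subjects that point at `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def institutions_for_species(species_id):
    """Follow the links species <- dataset -> person -> institution."""
    found = set()
    for dataset in subjects("describes", species_id):
        for person in objects(dataset, "createdBy"):
            found.update(objects(person, "memberOf"))
    return sorted(found)


if __name__ == "__main__":
    print(institutions_for_species(BASE + "species/protea-cynaroides"))
```

Because each item is named by an identifier rather than by free text, two datasets held at different nodes can be recognised as describing the same species without any prior coordination between their custodians.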
Data citation and publication
A parallel to linked open data, and one that both relies on its rapid implementation and will drive its rapid adoption, is the emerging practice of data citation and publication. In this paradigm, scientists who primarily focus on production and quality assurance of data will receive recognition through citation - requiring the widespread availability of data that is properly described, has a persistent identifier and will be preserved, and is available in a widely adopted standard format or service.
Linked Open Data concepts pervade the management of data publication and citation - linking journals and articles, researchers, and their research outputs in a distributed semantic web that can be exploited in many ways.
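The prerequisites for citation listed above (a proper description, a persistent identifier, and a standard format) can be sketched as a simple check on a metadata record. The field names, the example record, and the citation layout (creators, year, title, publisher, identifier) are assumptions made for illustration, and the DOI shown is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a published dataset."""
    title: str
    year: int
    publisher: str
    identifier: str          # persistent identifier, for example a DOI
    data_format: str         # widely adopted standard format or service
    creators: list = field(default_factory=list)


def missing_fields(record):
    """Return the names of citation prerequisites that are empty."""
    required = ["title", "publisher", "identifier", "data_format", "creators"]
    return [name for name in required if not getattr(record, name)]


def format_citation(record):
    """Build a citation string, refusing records that are not citable yet."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("cannot cite dataset; missing: " + ", ".join(missing))
    authors = "; ".join(record.creators)
    return (f"{authors} ({record.year}). {record.title}. "
            f"{record.publisher}. {record.identifier}")


if __name__ == "__main__":
    record = DatasetRecord(
        title="Example rainfall observations",
        year=2013,
        publisher="Example Data Repository",
        identifier="https://doi.org/10.5555/example",  # placeholder, not a real DOI
        data_format="NetCDF",
        creators=["Researcher, A."],
    )
    print(format_citation(record))
```

A record without a persistent identifier fails the check, which mirrors the point that recognition through citation depends on the identifier and description being in place first.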
Better investment management and knowledge networks
We have indicated earlier that governments are investing heavily in grant-funded research and innovation, but how do they know that these investments are well made? Intuitively, it is easy to accept that there will be significant gaps and overlaps in the research that is done specifically to benefit society and address global challenges - a fact recently recognised by the major funding nations through the new programmes initiated by the Belmont Forum.
To address this problem, a need has arisen to not only describe the research and innovation outputs of global science adequately through meta-data, but also to describe the supporting fabric of science a lot better: who funds what, who collaborates and why, which topics are addressed in more or less detail, and so on.
With this in mind, the ICSU World Data System has embarked on the concept of a distributed Knowledge Network that adds value to the meta-data resources that are already available by surfacing and extending the relationships that exist between the elements of meta-data, recognising that significant contributions may come from non-traditional sources such as social media.
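To make the idea of surfacing relationships between meta-data elements more concrete, the sketch below derives two simple views from a handful of project records: which topics are funded by more than one funder, and which people appear together on projects. The records, field names, and funder labels are entirely hypothetical and do not represent the World Data System's actual design.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical meta-data records describing funded projects.
records = [
    {"id": "proj-1", "funder": "Funder A", "topic": "coastal erosion", "people": ["P1", "P2"]},
    {"id": "proj-2", "funder": "Funder B", "topic": "coastal erosion", "people": ["P2", "P3"]},
    {"id": "proj-3", "funder": "Funder A", "topic": "fire ecology", "people": ["P4"]},
]


def funders_by_topic(recs):
    """Answer 'who funds what': map each topic to its sorted list of funders."""
    topics = defaultdict(set)
    for rec in recs:
        topics[rec["topic"]].add(rec["funder"])
    return {topic: sorted(funders) for topic, funders in topics.items()}


def overlaps(recs):
    """Topics supported by more than one funder (candidates for coordination)."""
    return {t: f for t, f in funders_by_topic(recs).items() if len(f) > 1}


def collaborations(recs):
    """Count how many projects each pair of people shares."""
    pairs = defaultdict(int)
    for rec in recs:
        for a, b in combinations(sorted(rec["people"]), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


if __name__ == "__main__":
    print(overlaps(records))
    print(collaborations(records))
```

None of these relationships is stored explicitly in any single record; they only emerge when the records are read together, which is the value a knowledge network is meant to add.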
A manifesto for the new ecosystem
What does this mean for a new ecosystem, and what are the drivers for the science of data and its implementation for the foreseeable future? Over the last year or so, SAEON has developed a declaration of intent that serves as a vision for our future cyber-infrastructure – along these lines:
[The manifesto itself is not reproduced in this text.]
This manifesto underpins a quiet revolution in the world of scientific research and its management, and SAEON is collaborating locally and internationally to realise this ideal; specifically through its involvement in DIRISA (Data-Intensive Research Initiative for South Africa), where a bottom-up programme with the Centre for High Performance Computing is now being complemented by top-down support from the Department of Science and Technology (DST) through its Committee for Cyber-Infrastructure.
We are hopeful that substantial new facilities that meet the requirements expressed by the manifesto, and are broadly aligned with the trends highlighted in this discussion, will come on stream within the next year.
References
[2] https://www.gov.uk/government/publications/g8-science-ministers-statement-london-12-june-2013
Further reading
The OECD publishes comprehensive biannual reviews of the funding landscape for research, development and innovation - start here: http://www.oecd-ilibrary.org/.
The case for free and open data sharing is abundantly supported by the developed world. Read the recent statement by the science ministers of the G8 here: https://www.gov.uk/government/publications/g8-science-ministers-statement-london-12-june-2013
SAEON has been advocating these principles for some time now - read our policy advice here: http://urlmin.com/saeon_policy_advice
The international world of data science, and collaboration therein, has to a large extent been formalised in the Research Data Alliance. SAEON is currently a participant through the ICSU World Data System, but South Africa may also become a formal member in the near future. See http://rd-alliance.org/
ICSU is a significant enabler in all of the above initiatives, promoting data science through CoDATA, and enabling access to quality assured data, meta-data, and knowledge networks actively through the World Data System. The pressing need to coordinate the focus and expenditure of funders and science programmes in respect of global change is now the subject of a new ICSU programme - Future Earth. See http://www.icsu.org/future-earth/
CoDATA has also been instrumental in developing the ground rules and industry consensus on data citation and publication through one of their task groups (http://www.codata.org/taskgroups/TGdatacitation/index.html), a role that has now been taken over by a joint working group of the Research Data Alliance and the ICSU World Data System:
Finally, coordinating future global funding towards the provision of research data infrastructure is being pursued by the Belmont Forum of the International Group of Funding Agencies for Global Change Research (http://igfagcr.org/index.php/activities). The ICSU World Data System hopes to support this process directly with its meta-data and knowledge network working group - http://icsu-wds.org/working-groups/metadata-catalogue-and-knowledge-network