SAEON assists in building data intensive research infrastructure for SA
The Centre for High Performance Computing (CHPC) recently launched its DIRISA (Data Intensive Research Infrastructure for South Africa) initiative aimed at empowering scientists to collect massive amounts of data, marking an important step towards solving complex problems like global climate change.
Responsibility for the establishment and operationalisation of domain-specific resources to serve specific communities has been allocated to SAEON Systems Engineer Wim Hugo.
Domains
The domains under consideration include Earth and Environmental Sciences, Social and Economic Science, Astronomy, and Bioinformatics and Health.
The backbone of DIRISA is a pair of petabyte-sized storage facilities, currently situated at CHPC in Rosebank, Cape Town, and at CSIR in Pretoria. Policies are available that allow near-real-time replication of data objects from one physical location to the other, resulting in very robust fail-over and disaster recovery infrastructure.
Technical Roadmap
A workshop was conducted on 8 December 2011 at the CHPC Annual Meeting at the CSIR in Pretoria, where stakeholders had the opportunity to showcase their work and to collaborate on setting requirements and a collective vision for what DIRISA should provide. The findings of the workshop have been circulated to stakeholders for comment, and will be published shortly, together with a Technical Roadmap to guide the development of DIRISA.
In addition to the domain-specific initiatives, three pilot implementations are envisaged for DIRISA during 2012:
- Testing the benefits of the replicated large storage capacity with multi-dimensional, file-based data sets such as typically encountered in atmospheric, ocean, and climate data;
- Evaluating the feasibility of using the storage capacity for ‘long tail’ content management systems, potentially involving millions of metadata records and data objects; and
- Using the infrastructure to host a preservation platform that manages format and physical integrity – this will be available as a secondary archiving and preservation service.
If you are interested in participating in the initiative, please contact Wim Hugo.
Research is increasingly global and multi-disciplinary. Advances in computing technology are empowering scientists to collect massive amounts of data, marking an important step towards solving complex problems like global climate change and uncovering secrets hidden in genes. Data-intensive computing capabilities are fundamental for advancing data-intensive sciences, as well as for handling the huge volumes of complex data related to energy, health and national security.
Data-intensive research poses both new opportunities and new challenges. It calls for a new way of dealing with data, one that enables a variety of interactions among data management systems, digital data libraries, research libraries, data collections, data tools and research communities, alongside the organisational practices of the people and institutions using them.
Emphasis is placed on handling knowledge freely and dynamically to enable research communities to strike a balance between competition and co-operation.
Source: GRDI2020