
Data movers: look who’s talking at eRA2015



AARNet has convened a Data Movement (DaMo) stream at the upcoming eResearch Australasia 2015 Conference on 20 October. A series of presentations by community champions who have “been there, done that” will take attendees on a data movement journey, featuring real-world use cases in delivering data to peak research capabilities.

These contributors, drawn from a variety of disciplines, have discovered the value of skills and tools that lead to the most efficient and effective ways of handling data between collaborators, and have recognised that the resources most useful for manipulating and storing such data could be anywhere on the planet.

The Data Movement stream is just one of a number of activities in the works supporting our goal of raising the “bar of expectations” so that researchers both demand and expect more from their IT infrastructure in terms of its ability to gain access to and move very large quantities of data.

We hope to see you at eRA2015 – here’s the DaMo program:

13:30 – 13:55  Big sky, big data - transporting digital data across the world

Presenter: Dr Chris Phillips, CSIRO

Modern radio telescopes produce enormous amounts of data. While traditionally this data was generated using proprietary digital systems and processed on site with custom digital hardware, astronomers are transitioning to commodity hardware, standard transport protocols and distributed processing. Telescopes across Australia currently generate petabytes of data a year, which is copied across Australia and to the rest of the world. Next-generation telescopes such as the CSIRO ASKAP telescope and the international SKA will increase this data volume by orders of magnitude.

In this talk Chris will outline the ways high speed networks are utilised for data transfer and the challenges faced in this process, and will present an overview of the requirements for next generation telescopes.

14:00 – 14:25  Production Petascale Climate Data Replication at the NCI

Presenter: Jon Smillie, NCI

Authors: Joseph Antony, Ben Evans

As research moves from efforts spanning a single nation to a model of global collaborative research, the use of shared computation, storage and cloud infrastructure requires access to massive reference data sets.

Australian climate researchers use the NCI’s computation, storage and cloud infrastructure to investigate our climate and to generate publications that contribute to the United Nations’ IPCC report. This research requires timely access to the climate model output from the CMIP5 project, which holds information from many international climate modelling centres.

The CMIP5 data is currently stored in three locations globally: Lawrence Livermore National Labs in the USA, the British Atmospheric Data Centre (BADC) and the German Climate Centre (DKRZ). The addition of a copy located at NCI in Australia will allow Australian climate researchers to achieve their research outcomes faster.

In this talk the presenter will share experiences and the lessons learnt in connecting NCI in Australia to the global high performance WAN Data Transfer Node infrastructure to support transfers of large data sets to and from Australia.

NCI provides guidance on the network tuning needed to achieve high-performance data transfers with a range of tools for large data set transfer, including the Globus Transfer online tool, which can orchestrate data movement between external sites and the NCI’s three global Lustre file systems (/g/data{1,2,3}). This capability lets data managers securely and reliably replicate very large volumes of data between NCI and collaborating data repositories using the Globus Online facility. NCI’s dedicated cluster of Data Transfer Nodes (DTNs) has been tuned to allow high-throughput I/O onto the NCI’s production Lustre file systems. Should additional bandwidth become available, NCI can provide scalable performance by adding more DTNs. This applies to planned upgrades to the internal bandwidth and to the current AARNet links, and as additional international replication sites become available.
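One general reason network tuning matters for long-haul transfers is the bandwidth-delay product (BDP): a single TCP stream can only fill a path if its window is at least as large as the amount of data in flight. The sketch below is illustrative only and is not NCI's published guidance; the function name `bdp_bytes`, the 10 Gbit/s rate and the round-trip times are all assumptions chosen for the example.

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes for a given rate and round-trip time."""
    # bits in flight = rate (bit/s) * round-trip time (s)
    bits_in_flight = bandwidth_gbps * 1e9 * rtt_ms / 1000
    return round(bits_in_flight / 8)


if __name__ == "__main__":
    # Hypothetical paths: these RTT values are illustrative, not measurements.
    for label, gbps, rtt_ms in [("domestic", 10, 20), ("intercontinental", 10, 200)]:
        megabytes = bdp_bytes(gbps, rtt_ms) / 1e6
        print(f"{label}: about {megabytes:.0f} MB must be in flight to fill the path")
```

Under these assumed numbers, the longer path needs ten times the buffer of the shorter one, which is why hosts tuned for local traffic often underperform on intercontinental transfers.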

14:30 – 14:55  Data At A Distance; Medium to Long Haul Data Delivery

Presenter: Stephen Dart, VicNode

The first storage service at VicNode was NFS-based and available only within the eResearch DMZ, not on the public internet. On-campus hosts that were already operating, the DTNs and NeCTAR VMs were able to use this storage to present internet-facing interfaces.

The early test environment for Aspera, as set up by RDSI/DaShNet, was connected to the only storage VicNode had available. Using LDAP against the MASSIVE user database service enabled username/password logins with group management.

This enabled VicNode to make use of that test environment through 2014. Aspera Shares was workable well before the official handover was signed and production licences were installed in 2015.

With support from V3 staff and Aspera, that toehold was sufficient for VicNode staff to ingest 225TB for researchers.

Several transfer strategies were used; some ran for weeks at a low rate so as not to stress ad hoc networking arrangements.

Some lessons were learned and the service model was refined. A pathway to future services can now be designed and built with a view to easing integration with other research tools and widening general use.

15:00 – 15:25  Science DMZ: Raising (and Meeting) DaMo’s Expectations

Presenter: David Wilde, AARNet

Researchers across a broad range of disciplines are increasingly turning to digital tools and digital techniques to collect, process and analyse their data as part of their research methods and workflow. As a consequence, the volume of data being moved and shared across research networks continues to increase by ~50% every year, and campus and National Research and Education Network infrastructures have been upgraded to support n x 10Gbps and n x 100Gbps. This infrastructure can provide 100TB of Data Movement (DaMo) in a single day.

But do researchers expect to be able to achieve that? If not, why not? If so, how can it be delivered consistently and robustly? In answering these questions, this talk will make the case for the adoption of the Science DMZ architecture at every institution or facility that aspires to support data-intensive research.
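The 100TB-per-day figure in this abstract can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1TB = 10^12 bytes) and ignores protocol overhead; the function name `required_gbps` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_gbps(terabytes: float, seconds: float = SECONDS_PER_DAY) -> float:
    """Sustained rate in Gbit/s needed to move `terabytes` (decimal TB) in `seconds`."""
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    return bits / seconds / 1e9  # bit/s to Gbit/s


if __name__ == "__main__":
    print(f"100TB in one day needs about {required_gbps(100):.2f} Gbit/s sustained")
```

Under these assumptions the answer is a little over 9 Gbit/s, so moving 100TB in a day means keeping a 10Gbps path almost fully busy for the whole 24 hours.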

15:55 – 16:20  The RDS Moving Big Data Flagship

Presenter: Richard Northam, RDS

The DaShNet initiative, funded by RDSI and NRN and run across the AARNet backbone, connects the major eResearch Nodes across Australia to each other and to the wider research community with redundant, high-speed network links (1-10Gb/s). Each Node Operator has a complete Science DMZ implementation providing high-performance access to its data storage infrastructure. Experience has since shown that this infrastructure is extremely capable of moving bulk data between sites, on demand, to support collaboration and analysis.

However, as the RDS Project has engaged with researchers and research groups working at the coal face, it has become apparent that there are opportunities to further improve the services: by reaching more deeply into institutions and research groups, where the researchers are; by analysing network performance issues; and by providing data movement services that address the pressing needs of researchers. Under its Moving Big Research Data Flagship, the RDS Project will explore this challenge in targeted areas by collaborating with discipline communities and institutions in partnership with the RDS Node Operators. The presentation will discuss the Moving Big Data Flagship and seek further input from the audience.

16:25 – 16:50  Data Movement Panel Discussion

Recent investment in eResearch infrastructure in Australia has delivered a world-class toolset into the hands of researchers for data creation, movement, processing and storage. Massive multi-petabyte data stores hold significant data collections of national importance, and peak HPC capability has scaled in response to the demand for big data analysis. With the next generation of the research and education network it is now possible to move 1 petabyte of data in a single day from one single resource to another.

How do we take full advantage of this entire eResearch system that the Australian research community has at its fingertips?

The panel session will be an open discussion with the speakers, aimed at identifying needs in skills, workflows, tools and policy to reduce friction in getting data to where it’s most effective, for faster research outcomes.
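As a rough check on the petabyte-per-day claim in this panel description, the sketch below estimates how long a transfer takes at a given link speed. It assumes decimal units (1PB = 10^15 bytes); the function name `transfer_hours` and the `efficiency` parameter (the fraction of the nominal link rate a transfer actually achieves) are hypothetical and used only for illustration.

```python
def transfer_hours(petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `petabytes` (decimal PB) over a link of `link_gbps`."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = petabytes * 1e15 * 8  # decimal petabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Link speeds match the 10Gbps and 100Gbps classes mentioned in the program;
    # the efficiency values are illustrative guesses, not measurements.
    for gbps in (10, 100):
        for eff in (1.0, 0.8):
            hours = transfer_hours(1, gbps, eff)
            print(f"1PB over {gbps}Gbps at {eff:.0%} efficiency: {hours:.1f} hours")
```

With these assumptions, 1PB fits inside a day only on a 100Gbps-class path running close to its nominal rate, which is one way to frame the panel's question about skills, workflows and tools.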
