AARNet’s CEO Chris Hancock writes about the Big Data challenge:
This post also appeared on the ABC Tech + Games blog
The old saying “less is more” is arguably more relevant today than ever before. In an increasingly complex and noisy world, efficiently extracting meaning from the torrent of raw data presented to us every day is critical.
One might assume that the more information you have, the better, but too much information can quickly become unwieldy.
A great deal of research has gone into the idea of ‘choice’. The Paradox of Choice: Why More Is Less, by psychologist Barry Schwartz, argues that when there are more than just a few options, the strain of selecting one can result in anxiety. The problem lies in our ability to quickly and objectively work through the data so we feel we have actually been able to make the ‘right’ choice.
Data is only useful to us for decision making if it is processed into meaningful information of some sort. In the digital era we have access to far more data than ever before, and we find ourselves grappling with a paradox: we have the capacity to store it, but how do we move it about quickly, or analyse it to gain insight and greater understanding? This, in essence, is what is being referred to as the Big Data challenge – turning numbers into sense, or data into wisdom.
Cisco proposes that this monsoon of ones and zeros is the by-product of the ‘Internet of Everything’, where the convergence of machine-to-machine, person-to-machine, and person-to-person communications will result in 1.5 trillion connected things by 2020. It’s incredible to think about, and it has resulted in a lot of media coverage.
While Big Data is a trending topic for the commercial sector, for the research and education sector it’s kind of old hat. AARNet has been providing the infrastructure for transporting large data sets for processing and analysis for international research collaborations for years. We think of this work in terms of the DIKW model (Data, Information, Knowledge, Wisdom), otherwise referred to as the hierarchy of cognition: data moves to information on the back of its relationships, to knowledge on the understanding of patterns, and then to wisdom through the application of principles.
The Large Hadron Collider
One project that has captured many imaginations is the search for the ‘God particle’, the Higgs boson. On the quest to answer the plethora of questions about this hotly debated construct, the Large Hadron Collider has been generating data set upon data set in order to run tests to prove the theory. However, data is not evidence in and of itself.
In processing these tests, data has streamed across AARNet’s trans-Pacific links at more than 600 megabits per second (Mbps) to share the burden of experiments between educational institutions across the globe. In fact, so much data has been created and transferred that traffic has peaked at over 1 gigabit per second (Gbps). Traffic in the opposite direction, from the University of Melbourne to Canada’s National Laboratory for Particle and Nuclear Physics, has also been significant, averaging at times over 100 Mbps. As a result, we are now close to understanding what adds mass to particles that seem to have no weight on their own. If we know what adds mass, we could potentially take it away. With a weightless composition, space travel into the farther reaches of the Universe becomes closer to reality than science fiction.
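To put those link speeds in perspective, here is a rough back-of-the-envelope sketch in Python of how much data each rate would move in a day. It assumes the rate is sustained for a full 24 hours and uses decimal units (1 TB = 10^12 bytes); the function name is ours, purely for illustration, and the result is not a measured traffic total.

```python
# Illustrative arithmetic only: sustained link rate -> data volume per day.
# Assumption: the rate holds for a full 24 hours, which real traffic rarely does.
SECONDS_PER_DAY = 24 * 60 * 60


def terabytes_per_day(megabits_per_second: float) -> float:
    """Convert a sustained rate in Mbps to decimal terabytes moved in 24 hours."""
    bits_per_day = megabits_per_second * 1_000_000 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8  # 8 bits per byte
    return bytes_per_day / 1e12       # bytes -> terabytes (decimal)


if __name__ == "__main__":
    # Rates quoted in the paragraph above.
    for label, rate_mbps in [("600 Mbps", 600), ("1 Gbps peak", 1000), ("100 Mbps", 100)]:
        volume = terabytes_per_day(rate_mbps)
        print(f"{label}: about {volume:.2f} TB per day if sustained")
```

On these assumptions, a steady 600 Mbps works out to roughly 6.5 terabytes a day.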
The Square Kilometre Array (SKA)
On the theme of space, another fantastic example where Big Data will be turned into wisdom is with the SKA. The SKA will be a network of thousands of radio telescope antennas that offer a total signal-collecting area of one square kilometre – one million square metres. To achieve this, the SKA will use 3,000 dish antennas along with many aperture array antennas (the latter will be located in Australia) to delve deep into the darkest corners of space. Along the way, the telescope will generate enough raw data to fill 15 million 64 gigabyte iPods every day.
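The iPod comparison can be turned into more familiar units with a little arithmetic. The sketch below, in Python, is a minimal illustration that assumes decimal units (1 GB = 10^9 bytes) and data spread evenly across the day; the names are ours and the figures are simply those quoted above, not an official SKA specification.

```python
# Illustrative arithmetic only, using the figures quoted in the text.
IPODS_PER_DAY = 15_000_000   # "15 million" iPods filled every day
GB_PER_IPOD = 64             # "64 gigabyte" capacity each
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_petabytes(devices: int, gb_each: int) -> float:
    """Total daily volume in decimal petabytes (1 PB = 1,000,000 GB)."""
    return devices * gb_each / 1_000_000


def average_rate_terabits_per_second(petabytes_per_day: float) -> float:
    """Average rate needed to move that volume in one day, in Tbps."""
    bits_per_day = petabytes_per_day * 1e15 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    volume_pb = daily_volume_petabytes(IPODS_PER_DAY, GB_PER_IPOD)
    rate_tbps = average_rate_terabits_per_second(volume_pb)
    print(f"Raw data per day: about {volume_pb:.0f} PB")
    print(f"Average rate if spread evenly: about {rate_tbps:.1f} Tbps")
```

On these assumptions the raw output is roughly 960 petabytes a day, close to an exabyte.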
The power to transform Big Data into wisdom in this instance is in the linkages. Connections between these individual antennas enable the data generated to be transformed into information which is greater than the sum of its parts. These insights will pool into a map of understanding – a celestial encyclopaedia, helping us comprehend more about where we have come from and what potential lies in wait.
Supporting Big Data pioneers
Based at the University of Queensland’s Institute for Molecular Bioscience, a 4.1 terabyte BioMirror is another major catalyst for Australia’s research data super-highway, the AARNet network. It pushes and pulls large quantities of data to and from multiple sources an average of 5,235 times a day, enabling researchers to extract intelligence that could help resolve our bioscience challenges. One of these sources, the European Bioinformatics Institute (EBI), itself has around 19 petabytes of data storage and perhaps six petabytes of unique data under management, and much of that data is being transferred to the local BioMirror for analysis.
The data sets are physically located near the National Computational Infrastructure (NCI) Specialised Facility in Bioinformatics – a 3,144-node computing installation – so the Australian bioscience community can do data-centric computing. This is vital in applying recombinant DNA sequencing methods and bioinformatics to sequence, assemble, and analyse the function and structure of genomes. It’s an incredible investment in our nation’s intellectual capital and in our ability to innovate at scale.
Not only this, but these learnings can also be applied to the sequencing of cancer genomes (pancreatic and ovarian). In fact the International Cancer Genome Consortium has so far generated approximately 120 terabytes of processed data to search for a cure for this terrible disease.
We also expect these kinds of projects will be greatly enhanced by our CloudStor+ solution, which offers AARNet customers the ability to easily and securely store files in the Cloud. The system is based on community-developed ownCloud technology, employing an Apache Hadoop backend. CloudStor+ offers up to 100 gigabytes of storage to eligible researchers at no charge. It is located in Australia and directly connected to the AARNet backbone for rapid and convenient access, minimising any data sovereignty issues.
Our culture of innovation is deeply ingrained and only looks set to continue as Big Data projects offer our research community insight and wisdom into some of the most perplexing scientific questions our brightest minds can ask.
Oct 13, 2017