Massimo Lamanna, from CERN’s IT Storage group, recently visited Australia to speak at the QUESTnet 2016 conference about new services developed by CERN (the European Organization for Nuclear Research) to manage the data deluge from LHC activities. Although these services were developed to serve the needs of the LHC, they have the potential to aid data analysis in many fields. AARNet technologists are contributing to the development of some of them.
High-performing data services are at the heart of research activities at CERN. CERN data services support a worldwide community of more than 10,000 researchers and engineers exploring the fundamental structure of the universe using complex scientific instruments, such as the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator.
Lamanna is responsible for all disk-data management operations at CERN. This covers moving data from the LHC experiments to the CERN computer centres, exchanging data between the CERN disk farm and collaborating centres worldwide, serving data to be processed and recorded to tape, and giving thousands of physicists access to the data for final analysis.
For the past couple of years, AARNet engineers have been collaborating with the CERN IT Storage Group to test distributed deployments. The AARNet network’s capabilities for moving huge volumes of data across the vast Australian continent help inform research into advancing networking and data services for science.
Here are some of the key points Lamanna shared at QUESTnet 2016.
Data Services for Big Data
In 2015, more than 30 petabytes (PB) of LHC data were routed to tape for long-term archival, processed by CERN’s 200,000-core computing farm and redistributed across the more than 200 sites of the Worldwide LHC Computing Grid, including a site in Australia at the University of Melbourne. Daily data production and analysis involve thousands of researchers, and CERN systems routinely sustain input and output transfers aggregating many tens of gigabits per second.
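As a rough sense of scale, the back-of-envelope estimate below gives the average rate needed just to write 2015’s data to tape. It assumes decimal petabytes (1 PB = 10^15 bytes) and that the 30 PB were spread evenly over the year; both are simplifying assumptions, not figures from CERN.

```latex
\bar{R} \approx \frac{30 \times 10^{15}\,\text{B} \times 8\,\text{bit/B}}{365 \times 86\,400\,\text{s}}
        = \frac{2.4 \times 10^{17}\,\text{bit}}{3.15 \times 10^{7}\,\text{s}}
        \approx 7.6 \times 10^{9}\,\text{bit/s}
```

This average covers only the tape archive. It leaves out processing, analysis and redistribution traffic to Grid sites, as well as any burstiness in when data arrive.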
With the 2016 data taking in full swing, the CERN Advanced Storage (CASTOR) tape archive has now exceeded the 150 PB mark.
To manage user access and analysis for this massive volume of data the CERN IT Storage group has developed new services: EOS and CERNBox.
EOS is the high-performance storage solution for batch processing and data analysis, controlling a farm of more than 150 PB of disk. CERNBox is the cloud synchronisation service for CERN users and integrates with EOS. With approximately 5,000 users to date, it is rapidly gaining wide acceptance within the CERN community.
EOS – a low-latency storage infrastructure service for physics users
EOS is an open-source storage solution developed at CERN to deliver affordable storage on commodity hardware that can scale to meet the growing requirements of CERN experiments.
It provides high data availability and highly concurrent access for a variety of use cases and workflows. EOS installations currently span two distinct sites, connected by three separate 100 Gbps links with 22 ms latency.
The physicists from CERN experiments see a single hierarchy of files with sharing capabilities and fine-grained ACLs; data managers benefit from optimised data access (closest file replica) for their large-scale heavy-duty batch processing; and the operations team manages single installations across the two CERN sites with tuneable quality of service.
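To illustrate the idea of serving the closest file replica, here is a minimal Python sketch. It is not EOS’s actual scheduling logic: the `Replica` class, the `closest_replica` function, the site names and the round-trip times are all hypothetical, and "closest" is assumed to mean the lowest measured round-trip time among replicas that are currently online.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    site: str            # hypothetical site label
    rtt_ms: float        # assumed round-trip time from the client to this replica's disk server
    online: bool = True  # whether the disk server holding this replica is reachable


def closest_replica(replicas: List[Replica]) -> Replica:
    """Return the online replica with the lowest round-trip time."""
    candidates = [r for r in replicas if r.online]
    if not candidates:
        raise LookupError("no online replica available")
    return min(candidates, key=lambda r: r.rtt_ms)


if __name__ == "__main__":
    replicas = [
        Replica("site-a", 22.0),                # remote site, roughly 22 ms away
        Replica("site-b", 0.5),                 # local site
        Replica("site-c", 0.3, online=False),   # nearest, but currently unavailable
    ]
    print(closest_replica(replicas).site)  # prints: site-b
```

Skipping offline replicas before ranking by latency keeps the choice both fast and available. A real system would also weigh factors such as server load and free bandwidth.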
CERNBox – the cloud storage service for CERN users
Physicists and other CERN users were using public cloud services to exchange documents and other information about experiments, so it made sense to develop an in-house service that would be just as easy to use, while keeping information on site and meeting the specific needs of CERN users.
CERNBox is based on ownCloud for the synchronisation and sharing layer and on EOS for the actual storage and transfer layer. Coupled with EOS, CERNBox supports the traditional synchronisation and sharing use cases while also allowing full access to the entire CERN data repository when needed, for example for massive data analysis.
EOS has proven itself in the multi-petabyte, high-concurrency regime. It has also shown disruptive potential by enabling the CERNBox service to provide synchronisation and sharing capabilities at a new scale.
CERNBox/EOS has also generated interest as a generic storage solution, in settings ranging from university systems to very large installations in research areas outside high energy physics, such as satellite imaging for the Copernicus project in partnership with the European Commission’s Joint Research Centre. CERNBox is also used by a United Nations agency for training and research to support emergency response activities.
For AARNet, access to the architecture and lessons learned behind CERNBox has been invaluable in shaping the architecture of AARNet’s CloudStor service for the research and education sector.
Joining forces with AARNet to pilot a wide-area distributed deployment of EOS
This collaboration includes piloting a wide-area distributed deployment of EOS. Demonstrations to date include a single EOS instance with 300 ms latencies between storage nodes spread across Europe, Asia and Australia.
In partnership with CERN and other collaborators, AARNet is a participant in the OpenCloudMesh project led by GÉANT (the pan-European research and education network). This project aims to enable globally interconnected private clouds for research and education.
SWAN – service for web-based analysis
CERN is launching SWAN (Service for Web-based Analysis), a platform for performing interactive data analysis in the cloud via a web browser. The service has great potential not only for researchers but also as an educational outreach tool, enabling students to interact with data from the LHC in real time.