Detecting gravitational waves
Researchers working toward the first direct detection of gravitational waves aim to use them to explore the fundamental physics of gravity, and to develop the emerging field of gravitational wave science as a tool of astronomical discovery.
The LIGO Scientific Collaboration (LSC) works toward this goal through research on, and development of, techniques for gravitational wave detection, and through the development, commissioning and exploitation of gravitational wave detectors.
Using the serial partition of the Raven system, the group search for the signs of gravitational waves by processing large amounts of interferometric data.
Responding to a request for assistance from the group’s Dr Stephen Fairhurst, ARCCA led the negotiations with the University over the expansion of the Raven serial computing partition. The aim was to deliver, by March 2015, a four-fold performance improvement over the existing 600-core serial partition procured as part of the original Raven system in 2013.
To identify the optimum solution to meet these criteria, the Gravitational Waves group developed a benchmark that ARCCA and Bull staff ran on a number of different clusters to understand the performance profile of the code. Although there had been significant development by the LSC in the USA to effectively exploit GP-GPU solutions, the majority of the GPU-enabled functionality was not being used by the UK sites at the time. With multi-core systems now mainstream, performance comparisons had become more complex, as each processor release offers a range of core counts and clock speeds, so a number of different combinations of tests needed to be undertaken to determine the optimum solution.
Based on this performance analysis, the final proposed solution comprised 1,440 Intel Haswell cores, using the E5-2680v3, a 12-core 2.5 GHz processor, with 4 GB of memory per core. This design was agreed across the grant consortium sites (Cardiff, Birmingham and Glasgow Universities) and an order was placed with Bull in December 2014. Given the power and floor space available in the Redwood datacentre, this was the optimal design to deliver the four-fold performance increase whilst occupying a single cabinet footprint.
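The sizing figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the text, plus two labelled assumptions that the text does not state: that the nodes are dual-socket, and that aggregate throughput for serial jobs scales linearly with core count. The function and variable names are illustrative only.

```python
# Figures quoted in the text.
NEW_CORES = 1440        # Haswell cores in the expansion
OLD_CORES = 600         # cores in the original 2013 serial partition
CORES_PER_CPU = 12      # cores per E5-2680v3 processor
MEM_PER_CORE_GB = 4     # memory per core
TARGET_SPEEDUP = 4.0    # four-fold performance goal

# Assumption (not stated in the text): two processors per node.
SOCKETS_PER_NODE = 2


def partition_summary():
    """Derive headline sizing numbers for the new partition."""
    cpus = NEW_CORES // CORES_PER_CPU
    nodes = cpus // SOCKETS_PER_NODE
    total_mem_gb = NEW_CORES * MEM_PER_CORE_GB
    core_ratio = NEW_CORES / OLD_CORES
    # Assumption: throughput scales linearly with core count, so whatever
    # the extra cores do not cover must come from faster individual cores.
    per_core_gain = TARGET_SPEEDUP / core_ratio
    return {
        "cpus": cpus,
        "nodes": nodes,
        "total_mem_gb": total_mem_gb,
        "core_ratio": core_ratio,
        "per_core_gain": per_core_gain,
    }


if __name__ == "__main__":
    for key, value in partition_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions the core count alone accounts for a 2.4-fold increase, so the remainder of the four-fold target would have to come from each new core running the benchmark roughly 1.7 times faster than an old one.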
The systems were delivered and installed in the Redwood datacentre in the week commencing 5 January 2015, with a minor outage on 22 January to integrate the new partition into the Raven cluster management. A few issues were discovered while testing the emergency shutdown scripts; these have been documented and will be revisited with Bull. The basic functionality was operational, however, and would shut down the system in an emergency, which is the key requirement of these scripts.
The new Haswell partition underwent formal acceptance tests in February and March 2015, and the expansion is now in full production. Although the new queue partition will be prioritised for Gravitational Waves consortium members, the burst nature of their computational research means it will not always be busy. Pre-emption will therefore be activated so that, when the partition is not in use, any parallel jobs with check-pointing capabilities can take advantage of the new resources. This will help free up valuable space on the existing queues, whilst ensuring those jobs can use the most appropriate processor type to maximise efficient usage of the Raven service.
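The pre-emption policy described above can be illustrated with a toy admission function. This is a minimal sketch and not the Raven scheduler configuration: the Job class, the admit function and the rule that guest jobs are pre-empted in queue order are all hypothetical, chosen only to show consortium jobs taking priority over check-pointable guest jobs.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    consortium: bool              # True for consortium-member jobs
    checkpointable: bool = False  # guest jobs must be able to checkpoint


def admit(running, incoming, total_cores):
    """Try to start `incoming`; return (running, preempted, started)."""
    # Guest jobs are only accepted if they can be safely pre-empted.
    if not incoming.consortium and not incoming.checkpointable:
        return running, [], False

    free = total_cores - sum(job.cores for job in running)
    if incoming.cores <= free:
        return running + [incoming], [], True

    # Guest jobs never displace anything; they wait for free cores.
    if not incoming.consortium:
        return running, [], False

    # Consortium job: pre-empt guest jobs until enough cores are free.
    kept, preempted = [], []
    for job in running:
        if not job.consortium and free < incoming.cores:
            preempted.append(job)
            free += job.cores
        else:
            kept.append(job)

    if free < incoming.cores:
        return running, [], False  # still does not fit; change nothing
    return kept + [incoming], preempted, True
```

In this sketch a guest job fills the idle partition, and a later consortium job that needs the cores causes the guest job to be returned in the pre-empted list, from where it could be restarted from its last checkpoint.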