Our supercomputer

Our supercomputer ‘Falcon’ can be accessed from anywhere with an internet connection.

About Falcon

Falcon is Cardiff University’s newest High Performance Computing (HPC) cluster, providing a major step forward in capability compared with its predecessor, Hawk, which was retired in January 2026. While the most recent Hawk compute nodes have been retained and integrated, Falcon is being further expanded in early 2026 with additional CPU and GPU resources.

This growth is enabled by our “pluggable infrastructure” design, which delivers a high performance shared core for all researchers while enabling the seamless addition of researcher funded subsystems. This model ensures long term flexibility, efficiency and architectural resilience.

Hosted in a state of the art CoLo data centre near Cardiff, Falcon features more than 10,000 AMD CPU cores and a diverse suite of GPUs, including NVIDIA H200, H100 and L40S accelerators, supporting data intensive, simulation driven and AI focused research. High performance storage and networking are delivered via NVIDIA (Mellanox) HDR InfiniBand fabric, providing the bandwidth and low latency required for modern scientific workloads.

Initial benchmarking indicates that many research applications run four to five times faster on Falcon compute nodes than on Hawk, offering a substantial boost to research productivity and enabling larger, more complex simulations and analyses.

Falcon also supports the University’s sustainability objectives. The system uses Direct Liquid Cooling, which is significantly more energy efficient than traditional air cooling. Over the system’s first five years of operation, this technology is expected to deliver considerable reductions in electricity consumption and operational costs.

Technical specification

Brief overview:

  • current total number of cores on Falcon is more than 10,000
  • total useable capacity of Lustre global parallel file storage is 1.1 PB
  • total useable capacity of NFS partition for longer-term data store is 457.2 TB
  • core nodes are connected with Nvidia Mellanox InfiniBand HDR (200 Gbps / 1.0 μsec) technology.
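As a rough illustration of what the quoted HDR figures mean in practice, the sketch below applies a simple latency plus size-over-bandwidth estimate to a single point-to-point message. It uses only the nominal 200 Gbps and 1.0 μsec values from the list above; the function name and the model itself are illustrative assumptions, and measured performance will be lower once protocol overhead and contention are included.

```python
# Simple latency-plus-bandwidth estimate for one point-to-point message.
# The two constants are the nominal HDR figures quoted in the overview;
# they are upper bounds, not measured values for this system.

LINK_GBPS = 200.0   # nominal HDR line rate, gigabits per second
LATENCY_US = 1.0    # quoted latency, microseconds


def transfer_time_us(message_bytes: int,
                     link_gbps: float = LINK_GBPS,
                     latency_us: float = LATENCY_US) -> float:
    """Estimated transfer time in microseconds: latency + size / bandwidth."""
    bits = message_bytes * 8
    # 1 Gbps is 1,000 bits per microsecond.
    wire_time_us = bits / (link_gbps * 1000.0)
    return latency_us + wire_time_us


if __name__ == "__main__":
    for size in (8, 64 * 1024, 1024 ** 3):
        print(f"{size:>12} bytes: {transfer_time_us(size):.2f} us")
```

Under this model, small messages are dominated by latency and large messages by bandwidth, which is why both figures are quoted together.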

The cores in the current cluster are provided by:

  • AMD Genoa core partition nodes
  • AMD Genoa dedicated researcher expansion
  • AMD Rome Hawk core partition nodes
  • AMD Milan and Rome dedicated researcher expansion

Standard compute nodes (Core Partition)

  • CPU: AMD EPYC 9654 “Genoa” @ 2.4 GHz
  • Cores per node: 192
  • Memory: 768 GB per node (4 GB per core)
  • Local Storage: OS SSD Micron 5400 MAX 480GB, SATA, 2.5", 3D TLC, 5DWPD,7mm

High memory nodes

  • CPU: AMD EPYC 9654 “Genoa” @ 2.4 GHz
  • Cores per node: 192
  • Memory: 1,536 GB per node (8 GB per core)
  • Local Storage: OS SSD Micron 5400 MAX 480GB, SATA, 2.5", 3D TLC, 5DWPD,7mm
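The per-core memory figures for the standard and high memory nodes follow directly from the node totals, and the same arithmetic helps when deciding which node type suits a job. The sketch below is a minimal illustration using the installed-memory figures listed above; the dictionary keys and function names are hypothetical, and the memory a scheduler will actually grant is typically somewhat lower than the installed total because the operating system reserves a share.

```python
# Node figures copied from the specification lists above.
# Keys and helper names are illustrative, not official partition names.
NODE_TYPES = {
    "standard": {"cores": 192, "memory_gb": 768},
    "high_memory": {"cores": 192, "memory_gb": 1536},
}


def memory_per_core_gb(node_type: str) -> float:
    """Installed memory divided evenly across the cores of one node."""
    spec = NODE_TYPES[node_type]
    return spec["memory_gb"] / spec["cores"]


def max_tasks_per_node(node_type: str, gb_per_task: float) -> int:
    """Single-core tasks that fit on one node, limited by cores or by memory."""
    spec = NODE_TYPES[node_type]
    by_memory = int(spec["memory_gb"] // gb_per_task)
    return min(spec["cores"], by_memory)


if __name__ == "__main__":
    for name in NODE_TYPES:
        print(name, memory_per_core_gb(name), max_tasks_per_node(name, 6))
```

For example, tasks needing 6 GB each cannot use every core of a standard node, whereas a high memory node can run one such task on each of its 192 cores.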

Hawk compute nodes

  • CPU: AMD EPYC 7502 “Rome” @ 2.5 GHz
  • 32 cores/socket (2.5GHz,128M,180W) giving 64 cores per node
  • 4GB RAM per core (ECC DDR4 3200MT/s single rank RDIMMs)
  • 240GB SSD disk
  • Single port ConnectX-6 HDR100 QSFP56 InfiniBand Adapter.

GPU accelerator nodes

The accelerator nodes comprise:

Nvidia H200
  • 4‑GPU High‑density 2U system with NVIDIA® HGX™ H200
  • 5th Gen Intel® Xeon® Scalable processor support
  • 32 x 32GB DDR5-4800 2RX8 LP (16Gb) ECC RDIMM
  • OS SSD Micron 5400 MAX 480GB, SATA, 2.5", 3D TLC, 5DWPD,7mm
  • Data NVMe Micron 7450 PRO 3.8TB NVMe PCIe 4.0 3D TLC U.3 7mm,1DWPD
  • HDR NIC 2-ports 200Gb HDR 200GbE QSFP56 Mellanox CX-6 VPI, Gen4 x16 LP
  • Direct-To-Chip Liquid Cooling solution

Nvidia H100

Supermicro Dual Socket HGZ NVL SuperServer comprising:

  • 4-GPU High density 2U system with NVIDIA® HGX™ H100
  • 5th Gen Intel® Xeon® Scalable processor support
  • 32 x 32GB DDR5-4800 2RX8 LP (16Gb) ECC RDIMM
  • OS SSD Micron 5400 MAX 480GB, SATA, 2.5", 3D TLC, 5DWPD,7mm
  • Data NVMe Micron 7450 PRO 3.8TB NVMe PCIe 4.0 3D TLC U.3 7mm,1DWPD
  • HDR NIC 2-ports 200Gb HDR 200GbE QSFP56 Mellanox CX-6 VPI, Gen4 x16 LP
  • Direct-To-Chip Liquid Cooling solution

Nvidia L40S

Supermicro Dual Socket GPU SuperServer comprising:

  • 8-GPU NVIDIA Ada L40S 48GB GDDR6 PCIe Gen 4
  • SPR 6430 2P 32C 2.1G 270W 60MB BI(1000) E5 4677
  • 32 x 32GB DDR5-4800 2RX8 LP (16Gb) ECC RDIMM
  • OS SSD Micron 5400 MAX 480GB, SATA, 2.5", 3D TLC, 5DWPD,7mm
  • Data NVMe Micron 7450 PRO 3.8TB NVMe PCIe 4.0 3D TLC U.3 7mm,1DWPD
  • HDR NIC CX-6 VPI,HDR,200GbE,2p,QSFP56,PCIe4x16

Nvidia V100

Dell PowerEdge R740 server comprising:

  • Intel Xeon Gold 6248 2.5GHz processors
  • 20 cores/socket (2.5GHz, 10.4GT/s, Turbo+, 150W) giving 40 cores per node
  • 4.8GB RAM per core (192GB ECC DDR4 memory 2933MHz)
  • 240GB SSD disk
  • Single InfiniBand EDR/PCIe Gen3-x8 interface embedded in the motherboard.
  • Two NVIDIA Tesla V100 16GB PCIe GPU cards

The high speed, low latency, high performance interconnect is provided by a combined InfiniBand HDR/EDR network.

A spine-leaf topology with a 3:1 blocking ratio is implemented for the Falcon core to ensure efficient use of current resources. The core network is underpinned by an HDR InfiniBand fabric with connectivity to an EDR fabric used for access to legacy Hawk nodes.
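To make the 3:1 blocking ratio concrete, the sketch below computes the ratio of node-facing bandwidth to spine-facing bandwidth on a single leaf switch, together with the per-node bandwidth available if every node on that leaf sends traffic off the leaf at once. The port split used in the example (30 downlinks and 10 uplinks on a 40-port switch) is an assumption chosen only because it yields 3:1; the actual switch models and port allocation on Falcon are not stated here.

```python
# Blocking (oversubscription) ratio for one leaf switch in a spine-leaf fabric.
# The 30/10 port split below is a hypothetical example, not Falcon's
# documented configuration.

def blocking_ratio(downlinks: int, uplinks: int,
                   down_gbps: float = 200.0, up_gbps: float = 200.0) -> float:
    """Total node-facing bandwidth divided by total spine-facing bandwidth."""
    return (downlinks * down_gbps) / (uplinks * up_gbps)


def worst_case_per_node_gbps(downlinks: int, uplinks: int,
                             down_gbps: float = 200.0,
                             up_gbps: float = 200.0) -> float:
    """Per-node bandwidth when every node on the leaf sends off-leaf at once."""
    shared_uplink = uplinks * up_gbps / downlinks
    return min(down_gbps, shared_uplink)


if __name__ == "__main__":
    print(blocking_ratio(30, 10))                      # ratio for the example split
    print(round(worst_case_per_node_gbps(30, 10), 1))  # Gbps per node, worst case
```

Traffic between nodes attached to the same leaf is not affected by the ratio; it only limits traffic that must cross the spine.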

There are two main storage sub-systems: a fast 1.1 PB cluster file system based on the DDN EXAScaler parallel Lustre file system, and a redundant DDN IntelliFlash Network File System (NFS) store of ~457 TB.

Libraries

Various library packages are available, including:

  • Intel Math Kernel Library - Cluster Edition Medium Cluster License for Linux
  • FFTW
  • netCDF
  • GSL
  • CUDA for GPU applications
  • Software licenses for TrinityX HPC Software Stack for cluster management and monitoring.

The Slurm Workload Manager

Slurm is an open-source job scheduler for Linux and Unix-like kernels. It provides three key functions. First, it allocates exclusive and/or nonexclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending jobs.
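As an illustration of how a resource request is expressed to Slurm, the sketch below builds the text of a batch script for an MPI job that fills standard 192-core nodes. The `#SBATCH` options and `srun` are standard Slurm; the helper name, the job name, the executable `./my_mpi_app` and the 3 GB per core request are hypothetical placeholders. Partition names and memory limits on Falcon are not given here, so the partition line is omitted unless one is supplied.

```python
# Build the text of a Slurm batch script. Only standard sbatch options are
# used; all values passed in the example are placeholders, not site policy.
from typing import Optional


def build_batch_script(job_name: str, nodes: int, tasks_per_node: int,
                       mem_per_cpu_gb: int, walltime: str, command: str,
                       partition: Optional[str] = None) -> str:
    """Return a batch script requesting the given resources and running command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu_gb}G",
        f"#SBATCH --time={walltime}",
    ]
    if partition:
        # Partition names are site specific; none is assumed by default.
        lines.append(f"#SBATCH --partition={partition}")
    # srun launches one process per allocated task.
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two full standard nodes, requesting less than the installed 4 GB per
    # core to leave headroom for the operating system (an assumption).
    print(build_batch_script("demo", 2, 192, 3, "01:00:00", "./my_mpi_app"))
```

The generated text can be saved to a file and submitted with `sbatch`, after which Slurm queues the job until the requested nodes become available.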

Analysers, Profilers and Debuggers

  • Intel Parallel Studio XE Cluster Edition, which contains VTune Amplifier (advanced profiling capabilities with a single, friendly analysis interface, plus powerful tools to tune OpenCL and the GPU)
  • Intel Inspector, an easy-to-use memory and threading error debugger for parallel and distributed memory C, C++ and Fortran applications that run on Windows and Linux
  • Arm/Allinea DDT enterprise debugging software and Allinea’s Performance Report.