UK National HPC Service

Computer Services for Academic Research (CSAR)

Optimisation Support

Detailed below are examples of optimisation work carried out by the CSAR service on behalf of users. If you require assistance optimising your code, please contact the helpdesk in the first instance.

Dr. Glenn Carver, Senior Research Associate in the Centre for Atmospheric Science at Cambridge University.

Tomcat is a tropospheric chemistry code that models the atmosphere from ground level up to the tropopause (about 10 km altitude). The serial code was highly productive on the Fujitsu during its time at CSAR. After the removal of Fuji, the code was ported to Green and converted into an MPI code with assistance from CSAR staff. Further work followed, including optimisation of the halo exchanges in the advection routines; this work is described in a little more detail below.

The size of the halos required for a latitude circle depends on its proximity to the poles. This leads to a load imbalance which, because of the communication algorithm originally chosen, had severe effects on every processor and not just on those handling the polar regions.

Work was done to improve the way data was prepared for the halo exchange and to devise algorithms for the order in which halos were exchanged. Simplifying the algorithm was also found to measurably speed up data preparation. The new halo exchange methods exploited the known load imbalance: processors that were ready could continue into the rest of the code without significant delay, which led to a significant improvement in scalability.
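The effect of letting ready processors continue can be illustrated with a toy cost model in C. This is a sketch under stated assumptions, not the Tomcat implementation: the function names, the one-band-of-latitudes-per-processor layout and the "one time unit per halo point" cost are all hypothetical. It compares a scheme in which every processor waits for the largest halo with one in which a processor waits only for its own exchange and those of its immediate neighbours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model: processor i owns one band of latitudes and
 * exchanging its halo costs halo[i] time units (one unit per halo point). */

/* Lockstep scheme: nobody continues until the largest halo has been
 * exchanged, so every processor pays for the polar halos.
 * Returns the time spent in the exchange, summed over all processors. */
long total_time_lockstep(const int *halo, size_t n)
{
    assert(halo != NULL && n > 0);
    int slowest = halo[0];
    for (size_t i = 1; i < n; ++i)
        if (halo[i] > slowest)
            slowest = halo[i];
    return (long)slowest * (long)n;
}

/* "Continue when ready" scheme: processor i waits only for its own
 * exchange and for those of its immediate neighbours i-1 and i+1. */
long total_time_when_ready(const int *halo, size_t n)
{
    assert(halo != NULL && n > 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        int t = halo[i];
        if (i > 0 && halo[i - 1] > t)
            t = halo[i - 1];
        if (i + 1 < n && halo[i + 1] > t)
            t = halo[i + 1];
        total += t;
    }
    return total;
}
```

With the made-up halo widths {8, 4, 2, 1, 1, 2, 4, 8} (largest next to the poles), the lockstep model gives a total of 64 time units and the second model 44: in this model the processors away from the poles no longer pay the full polar cost.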

CONQUEST

The UKCP consortium's CONQUEST code is a simulation code for ab initio (quantum chemistry) calculations on very large systems containing thousands of atoms. Although CONQUEST was designed from the outset to scale to large numbers of processors on the Cray T3E, its single-node performance, as is so often the case, was a disappointing fraction of the theoretical peak.

CSAR staff were engaged to optimise the computational kernel, which multiplies every small (typically 4 by 4) matrix in one list by every small matrix in another list. On a set of representative test cases, in which the problem size is too large to be cache-resident, the original Fortran code achieved between 33 and 55 Mflops. Using techniques such as hand-unrolling of do loops, software pipelining, careful selection of compiler options, and some minor code restructuring, CSAR delivered a Fortran version that ran at 150 Mflops on the same test suite. CSAR also delivered an assembly language version, specific to the 4 by 4 case, which achieved 170 Mflops.
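To make the shape of such a kernel concrete, here is a minimal sketch in C (the original work was in Fortran and assembly, and this is not the CONQUEST code). The function names, the row-major storage, the accumulate-into-output convention and the layout of the result array are all assumptions made for illustration. The innermost loop over k is unrolled by hand, which is one of the techniques named above.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4, BLOCK = N * N };   /* 4 by 4 blocks, stored row-major */

/* c += a * b for one pair of blocks; the k loop is unrolled by hand. */
void block_mul_add(double *c, const double *a, const double *b)
{
    for (int i = 0; i < N; ++i) {
        /* Load row i of a once and reuse it for every column j. */
        const double a0 = a[i * N + 0], a1 = a[i * N + 1];
        const double a2 = a[i * N + 2], a3 = a[i * N + 3];
        for (int j = 0; j < N; ++j) {
            c[i * N + j] += a0 * b[0 * N + j] + a1 * b[1 * N + j]
                          + a2 * b[2 * N + j] + a3 * b[3 * N + j];
        }
    }
}

/* Multiply every block in list a (na blocks) by every block in list b
 * (nb blocks). The product of a[p] and b[q] is accumulated into the
 * block at index p * nb + q of c, which the caller must zero first
 * (assumed layout). */
void multiply_all_pairs(double *c, const double *a, size_t na,
                        const double *b, size_t nb)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t p = 0; p < na; ++p)
        for (size_t q = 0; q < nb; ++q)
            block_mul_add(c + (p * nb + q) * BLOCK,
                          a + p * BLOCK, b + q * BLOCK);
}
```

Each 4 by 4 product costs 64 multiplications and 64 additions when accumulated this way. Whether unrolling actually helps depends on the compiler and the machine, so any such change should be timed on representative cases, as was done for the figures quoted above.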

CSAR also provided valuable advice in the form of a strategy to reduce the memory requirements associated with communications under MPI 1.2. This strategy enabled CONQUEST to scale to very large problems without sacrificing portability.

From Prof Gillan:

... the work you did for us is bearing fruit. All the work we were doing with you is now almost in place in the Conquest code, and will certainly have a terrific effect.

CASTEP

The work of parallelisation and optimisation of the package CASTEP, carried out on behalf of UKCP, is neatly summed up by Dr Phil Lindan, co-ordinator of UKCP: 'A vast amount of new code had to be dealt with in parallelising version 4.2 [of CASTEP], it was almost like starting from scratch. CSAR's support has been, and continues to be, absolutely vital to the success of this project. We can now attack larger and more difficult problems than ever before, and we are sure that many great scientific stories will emerge thanks to these new capabilities.'

Cambridge UK, 12 June 2000. CASTEP, a leading quantum mechanics program from Molecular Simulations Inc. (MSI), can now run 50 to 100 times faster than before on designated computer architectures. This makes it possible to model larger molecular and solid-state systems, helping researchers to investigate heterogeneous catalysis, thin film growth, and chemical vapor deposition, as well as the electronic and structural properties of bulk materials.

This dramatic speed-up is due to the vastly improved scaling of the new MPI version of CASTEP, which is now very close to linear. This means that the addition of more computer processors leads to a corresponding increase in modeling power.
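"Close to linear" scaling is usually quantified by the speed-up relative to a single-processor run and by the parallel efficiency, which is the speed-up divided by the number of processors. The short C sketch below shows the arithmetic; the function names are hypothetical and no timings from the CASTEP work are implied.

```c
#include <assert.h>

/* Speed-up of a parallel run relative to a run on one processor. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency on p processors: 1.0 means perfectly linear
 * scaling, smaller values mean some processor time is being lost. */
double parallel_efficiency(double t_serial, double t_parallel, int p)
{
    assert(p > 0);
    return speedup(t_serial, t_parallel) / (double)p;
}
```

For instance, with made-up timings of 64 s on one processor and 1 s on 64 processors, the speed-up is 64 and the efficiency is 1.0; if the 64-processor run took 2 s instead, the efficiency would drop to 0.5.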

CASTEP is now capable of modeling systems such as large drug-related molecules, the interaction of large organic molecules with catalytic surfaces, and the physical properties of complex oxides. This new power will be particularly important to researchers in the electronics and petrochemicals industries studying chemical vapor deposition (CVD), thin film growth, and catalysis.

The CASTEP 4.2 release was achieved through a collaborative world-wide effort between Molecular Simulations Inc. (MSI), Fujitsu, SGI, the UKCP Consortium, Daresbury Laboratory, and CSAR, a high-performance computing service based at the University of Manchester.

This page last updated: Monday, 23-Aug-2004 11:38:59 BST