UK National HPC Service

Application Support

Detailed below are examples of application support work carried out by the CSAR services on behalf of users. If you have any problems using the applications available on the CSAR machines, please contact the helpdesk in the first instance.

Dr Keith Refson - CLRC Rutherford Appleton Lab

The Castep Development Group (CDG) are responsible for the development of Castep in terms of optimisation, new algorithms and porting to new platforms.

In testing the Altix port of Castep, Keith Refson found that the code appeared to hang in one particular section.

CSAR provided support in tracking down the cause of the hang using the TotalView debugger. The problem turned out not to be a hang at all, but a failure of a LAPACK routine to converge. The input values to the routine were found to contain a number of erroneous entries, and these were traced back to their source: an error in the compiler. The compiler had incorrectly optimised certain loops, so that one value never made it into an array; this error propagated through the code until it finally showed up as the lack of convergence.
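The report does not name the routine involved, but the general lesson carries over: a sanity check on the inputs and a check of the INFO value returned by LAPACK turn a silent convergence failure into a clear diagnostic rather than an apparent hang. The sketch below is illustrative only; it uses the LAPACKE C interface and an assumed symmetric eigensolver (dsyev), and is not the Castep code.

    /* Illustrative sketch (not the actual Castep code): validate the input
     * matrix and check LAPACK's INFO return, so that a non-converging
     * eigensolve is reported instead of looking like a hang.
     * The LAPACKE interface and the choice of dsyev are assumptions. */
    #include <stdio.h>
    #include <math.h>
    #include <lapacke.h>

    int diagonalise(double *a, double *w, lapack_int n)
    {
        /* Guard against corrupted input (e.g. a value that a compiler bug
         * failed to store): NaNs or infinities in the matrix. */
        for (lapack_int i = 0; i < n * n; i++) {
            if (!isfinite(a[i])) {
                fprintf(stderr, "bad matrix element at %ld: %g\n", (long)i, a[i]);
                return -1;
            }
        }

        /* Symmetric eigensolve; INFO > 0 means the algorithm failed to converge. */
        lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, a, n, w);
        if (info > 0)
            fprintf(stderr, "dsyev: %ld off-diagonal elements did not converge\n",
                    (long)info);
        else if (info < 0)
            fprintf(stderr, "dsyev: argument %ld had an illegal value\n", (long)-info);
        return (int)info;
    }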

Professor Hayhurst & Dr Vakili-Tahami - UMIST

Professor Hayhurst and Dr Vakili-Tahami had a parallel finite-element code running on a local Sun machine and were interested in evaluating the potential of running it on the CSAR machines. The code had large memory requirements. We provided the assistance required to port the code to Fermat and helped them to make efficient use of the available memory.

Professor Coveney (Dr Maziar Nekovee) - Queen Mary, University of London

Dr Maziar Nekovee has been studying Lattice-Boltzmann methods. He had been using a serial code which he wanted to parallelise in order to study large systems on massively parallel computers.

After initially spending some time looking at his code, he came to Manchester and spent two days working on the parallelisation with help from CSAR staff. He left with a nearly complete parallel code and a good deal of information about optimisation techniques and the use of profiling tools on the T3E. The parallel version he developed was based on decomposition of the physical domain into slices. Subsequently, a more general version of the parallel code was developed by the Queen Mary group.
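As an illustration of the slice-based approach, the sketch below shows how a one-dimensional slab decomposition with halo exchange might look in C with MPI. The grid sizes, array names and periodic boundaries are assumptions for the example, not details of Dr Nekovee's code.

    /* Minimal sketch of a 1-D slab decomposition with halo exchange,
     * as one might use for a lattice-Boltzmann grid split into slices.
     * Sizes and names are illustrative, not taken from the actual code. */
    #include <mpi.h>
    #include <stdlib.h>

    #define NX 256          /* global size in the decomposed direction */
    #define NY 256          /* size of each slice                      */

    int main(int argc, char **argv)
    {
        int rank, nproc;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int nx_local = NX / nproc;            /* assume NX divisible by nproc */
        int up   = (rank + 1) % nproc;        /* periodic neighbours          */
        int down = (rank - 1 + nproc) % nproc;

        /* local slab plus one halo row at each end */
        double *f = calloc((size_t)(nx_local + 2) * NY, sizeof *f);

        /* ... local collision/propagation on rows 1..nx_local ... */

        /* exchange halo rows with the neighbouring slabs */
        MPI_Sendrecv(&f[nx_local * NY], NY, MPI_DOUBLE, up,   0,
                     &f[0],             NY, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&f[1 * NY],              NY, MPI_DOUBLE, down, 1,
                     &f[(nx_local + 1) * NY], NY, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        free(f);
        MPI_Finalize();
        return 0;
    }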

Prof Coveney (Dr Nelido Gonzalez) - Queen Mary, University of London

CSAR agreed to parallelise an existing serial hydrodynamic lattice-gas code on behalf of Professor Coveney, Queen Mary, University of London. The code describes the dynamics of ternary amphiphilic fluids including long-range interactions of arbitrary range (in order to tease out effects due to interfacial stiffness as well as surface tension). Estimates of the time and ease of conversion were based upon a review of an earlier version of the code, 'lgas'. The code structure involves propagation of the fluid's microscopic local states (particle population, fluid species and surfactant orientation angle) around a two-dimensional triangular lattice, transformed into two square grids for memory storage purposes. Local states are encoded bitwise in a global integer array. After propagation they collide with other incoming states; the outgoing state is selected by Monte Carlo sampling among all possible mass- and momentum-conserving ones, using both appropriate look-up tables and a non-flat distribution which favours species separation.

A feature of the serial version was the incorporation of long-range interaction potentials, realised through a global Fourier transform on the grid. The motivation for the parallel implementation was the need to simulate grid sizes as large as possible, in order to avoid finite-size effects, and a large number of systems with different initial random seeds to average measurements over.
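As a rough illustration of the bitwise state encoding described above, the sketch below packs an assumed set of fields (occupation bits, species bits and a discretised orientation index) into a single integer per site. The field widths are invented for the example and are not the layout used in 'lgas'.

    /* Illustrative sketch of packing a lattice-gas site state bitwise into
     * one integer, in the spirit described above.  The field widths (7
     * occupation bits, 7 species bits, 3 bits for an orientation index)
     * are assumptions, not the layout of the actual code. */
    #include <stdint.h>

    #define OCC_BITS   7   /* one occupation bit per lattice direction */
    #define SPEC_BITS  7   /* one species bit per occupied direction   */
    #define ANG_BITS   3   /* discretised surfactant orientation       */

    static inline uint32_t pack_site(uint32_t occ, uint32_t spec, uint32_t ang)
    {
        return  (occ  & ((1u << OCC_BITS)  - 1u))
             | ((spec & ((1u << SPEC_BITS) - 1u)) << OCC_BITS)
             | ((ang  & ((1u << ANG_BITS)  - 1u)) << (OCC_BITS + SPEC_BITS));
    }

    static inline uint32_t site_occ(uint32_t s)  { return  s & ((1u << OCC_BITS) - 1u); }
    static inline uint32_t site_spec(uint32_t s) { return (s >> OCC_BITS) & ((1u << SPEC_BITS) - 1u); }
    static inline uint32_t site_ang(uint32_t s)  { return (s >> (OCC_BITS + SPEC_BITS)) & ((1u << ANG_BITS) - 1u); }

A packed integer of this kind can then serve directly as an index into the collision look-up tables.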

The code was parallelised with the intention of making minimal impact on the structure and layout of the original serial version, while making use of several pieces of supporting software: the Cray Message Passing Toolkit (including an MPI implementation), FFTW (the MPI-aware Fourier transform library), SPRNG (the scalable parallel random number generator from NCSA), and an in-house communication harness suite.
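The distributed Fourier transform is the part that most constrains the data layout, since FFTW dictates how grid rows are spread across processes. The sketch below shows a distributed 2-D transform using the current FFTW 3 MPI interface; the original T3E work would have used the older FFTW 2 MPI calls, and the grid size here is purely illustrative.

    /* Sketch of a distributed 2-D FFT with FFTW's MPI interface, of the
     * kind used to evaluate the long-range interaction term on the grid.
     * Uses the current FFTW 3 MPI API; grid sizes are illustrative. */
    #include <mpi.h>
    #include <fftw3-mpi.h>

    int main(int argc, char **argv)
    {
        const ptrdiff_t N0 = 512, N1 = 512;   /* global grid size (assumed) */
        ptrdiff_t local_n0, local_start;

        MPI_Init(&argc, &argv);
        fftw_mpi_init();

        /* how many rows of the grid live on this process */
        ptrdiff_t alloc = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                                 &local_n0, &local_start);
        fftw_complex *grid = fftw_alloc_complex(alloc);

        fftw_plan fwd = fftw_mpi_plan_dft_2d(N0, N1, grid, grid, MPI_COMM_WORLD,
                                             FFTW_FORWARD, FFTW_ESTIMATE);

        /* ... fill local rows [local_start, local_start + local_n0) ... */
        fftw_execute(fwd);
        /* ... multiply by the interaction kernel in Fourier space,
               then transform back with an FFTW_BACKWARD plan ... */

        fftw_destroy_plan(fwd);
        fftw_free(grid);
        fftw_mpi_cleanup();
        MPI_Finalize();
        return 0;
    }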

The resulting parallel code has thus far been most thoroughly tested on a 1x1 processor grid against the original serial version, confirming that the macroscopic state (ensemble average over microscopic states) of the system evolves in an identical manner, while exhibiting local microscopic differences due to the changes in the FFT and random number generator routines. Multiprocessor invocations of the code produce results which agree with single processor versions.

Prof Coveney - Queen Mary, University of London

Visualisation in Parallel - VIPAR. VIPAR was originally a two-year EPSRC research project carried out at the Manchester Visualisation Centre in 1996/1997 by Steven Larkin. The primary goals of this project were to provide:

  • An automatic parallel module generator.
  • A dataflow network editor for building parallel visualisation modules.
  • A development environment for (remote) computational steering.

As a result of a request from Professor Coveney, the VIPAR project has been re-opened specifically to address some of the requirements of his CSAR Consortium. This includes the development of parallel isosurface and FFT modules and, in the longer term, help in providing a route to computational steering.

This work is currently at an early stage, and substantial further work is planned to determine the viability of VIPAR for these purposes; the resulting system is intended to run on the group's 16-node SGI Onyx2.

Dr J. Williams - Queen Mary & Westfield College

Optimisation of the CGLES Large-Eddy Simulation Code

We were asked to profile the code and look at ways of improving its efficiency. The CGLES code is written in C with MPI, and was being run across 96 processors of the T3E.

This work was in two phases. The first involved optimising the code, the second looked at optimising the processor usage by reordering the allocation of grid points across the processors.

The most CPU-expensive parts of the code were the multigrid pressure solver, MGE_MatSolve(), and the block-searching subroutine, bid_next(). The latter was quickly fixed by changing the search algorithm. The multigrid solver used SOR as its bottom-level solver, and most of the work that we did on the code involved replacing this with a conjugate gradient solver.
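For illustration, a minimal unpreconditioned conjugate-gradient solve is sketched below. It is serial and uses invented names (matvec() stands in for the pressure-equation operator); a replacement inside CGLES would additionally need a global reduction in the dot products to run across MPI processes.

    /* Minimal serial sketch of an unpreconditioned conjugate-gradient
     * solve, of the kind that replaced the SOR bottom-level solver.
     * matvec() stands in for the pressure-equation operator; it and all
     * names here are illustrative, not the actual CGLES routines. */
    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    void matvec(const double *x, double *y, int n);   /* y = A*x, supplied elsewhere */

    static double dot(const double *a, const double *b, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += a[i] * b[i];
        return s;
    }

    int cg_solve(double *x, const double *b, int n, double tol, int maxit)
    {
        double *r = malloc(n * sizeof *r), *p = malloc(n * sizeof *p);
        double *q = malloc(n * sizeof *q);

        matvec(x, q, n);                         /* r = b - A*x            */
        for (int i = 0; i < n; i++) r[i] = b[i] - q[i];
        memcpy(p, r, n * sizeof *p);
        double rho = dot(r, r, n);

        int it;
        for (it = 0; it < maxit && sqrt(rho) > tol; it++) {
            matvec(p, q, n);
            double alpha = rho / dot(p, q, n);   /* step length            */
            for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
            double rho_new = dot(r, r, n);
            double beta = rho_new / rho;         /* new search direction   */
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
            rho = rho_new;
        }
        free(r); free(p); free(q);
        return it;                               /* iterations performed   */
    }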

Another part of the code that was particularly slow was the I/O of the restart files. The code kept all its files in a single directory: 2208 files per step. After talking to Cray, it was suggested that splitting the files across subdirectories would vastly reduce the time taken to create inodes, provided the number of entries per directory did not greatly exceed 1000-2000.
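A sketch of the kind of change involved is shown below: restart files are hashed into subdirectories by block number so that no single directory accumulates thousands of entries. The directory layout, file naming and the 1000-files-per-directory grouping are assumptions for the example, not the scheme actually adopted.

    /* Sketch of spreading restart files across subdirectories so that no
     * single directory holds thousands of entries. The layout and the
     * grouping of 1000 files per subdirectory are illustrative assumptions. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    FILE *open_restart(int step, int block_id)
    {
        char dir[256], path[512];

        /* e.g. restart/d002 for block ids 2000-2999 */
        snprintf(dir, sizeof dir, "restart/d%03d", block_id / 1000);
        mkdir("restart", 0755);          /* ignore EEXIST on later calls */
        mkdir(dir, 0755);

        snprintf(path, sizeof path, "%s/blk%04d_step%06d.dat", dir, block_id, step);
        return fopen(path, "w");
    }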

Other modifications included replacing the POSIX timers with the much lower-overhead Cray-specific ones.

The second phase looked at optimising the block ordering. The aim was both to load-balance the processors and to reduce inter-process communication. This was done by writing a converter that turns the .map file into a format readable by an implementation of the 'Greedy Algorithm' which Dr Kidger had written while a lecturer in Engineering. This code allocates 'elements' to processors such that the load per processor is balanced and the sub-domains are as compact (smallest surface area) as possible.
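A toy version of such a greedy allocation is sketched below: each processor's sub-domain is grown outwards from a seed element until that processor has its share of the elements, which balances the load and keeps the sub-domains compact. The data structures are invented for the example; this is not Dr Kidger's implementation or the .map converter.

    /* Toy sketch of greedy element-to-processor allocation.
     * adj[e] lists the neighbours of element e; nadj[e] is their count;
     * on return part[e] holds the processor assigned to element e. */
    #include <stdlib.h>

    void greedy_partition(int nelem, int nproc, int **adj, const int *nadj, int *part)
    {
        int *front = malloc((size_t)nelem * sizeof *front);
        for (int e = 0; e < nelem; e++) part[e] = -1;       /* -1 = unassigned */
        int assigned = 0;

        for (int p = 0; p < nproc; p++) {
            int target = (nelem - assigned) / (nproc - p);  /* balanced share  */
            int nfront = 0, taken = 0;

            while (taken < target) {
                if (nfront == 0) {                          /* (re)seed from any unassigned element */
                    int s = 0;
                    while (s < nelem && part[s] != -1) s++;
                    if (s == nelem) break;
                    part[s] = -2;                           /* -2 = queued     */
                    front[nfront++] = s;
                }
                int e = front[--nfront];
                part[e] = p;                                /* claim element   */
                taken++; assigned++;
                for (int k = 0; k < nadj[e]; k++) {         /* grow the sub-domain outwards */
                    int nb = adj[e][k];
                    if (part[nb] == -1) { part[nb] = -2; front[nfront++] = nb; }
                }
            }
            for (int k = 0; k < nfront; k++) part[front[k]] = -1;   /* release unused frontier */
        }
        free(front);
    }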

Several different optimised maps were produced, for 48 to 138 processors. All showed significant speed-ups over the original.
