I have been assisting at a remote workshop for the next generation of scientists in biology: the CCP4 Crystallographic School in South Africa, taught remotely from Diamond Light Source.

I work at the Electron Microscope Unit at the University of Cape Town. The unit uses electron microscopes, which are different instruments from the X-ray gear described below.

X-ray facilities require synchrotrons, which are expensive enough to be national facilities. We are using the UK national facility at Harwell. The site was formerly home to the Atomic Energy Research Establishment, which was created after the Second World War on the site of RAF Harwell.

The course is taught by some of today’s pioneers of the computer software used to process the captured data. The Speakers page tells you more.

Most of this software is free to educational users, with a licence fee charged for commercial use. However, large parts of the pipeline use free software, run on Linux, and use Python.

In fact, the extraordinary amount of processing done and its complexity means that any one software product is only a tiny piece of the chain, and they all rest on optimised matrix libraries written 50 years ago.

NumPy is a collection of library routines for basic mathematical operations, like matrix multiplication and Fourier transforms.
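To give a feel for those two operations, here is a minimal sketch. The rotation matrix and the sine wave are made-up toy inputs, not anything taken from the course material.

```python
import numpy as np

# Matrix multiplication: rotate a 2D point by 90 degrees.
theta = np.pi / 2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
point = np.array([1.0, 0.0])
rotated = rotation @ point  # lands (to rounding error) on the y axis

# Fourier transform: find the dominant frequency of a sampled sine wave.
n_samples = 64
t = np.arange(n_samples)
signal = np.sin(2 * np.pi * 5 * t / n_samples)  # 5 full cycles in the window
spectrum = np.fft.rfft(signal)
dominant_bin = int(np.argmax(np.abs(spectrum)))  # index of the strongest frequency
```

The `@` operator and `np.fft.rfft` hand the real work to compiled routines underneath, which is the point made above about everything resting on older optimised libraries.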

Accelerator platforms like CUDA offload a few selected array operations to compute kernels that run in parallel on the thousands of processors inside video card GPU chips, diverting their gaming capability to number crunching.

CCP4 is an orchestrator of a frightening number of underlying tools. These are usually command-line driven, with long parameter lists of files and switches, and may use GPU acceleration or even job submission onto supercomputers.

CCP4 throws a GUI and menus on top of these.

This makes it a large piece of software in its own right, and makes the underlying tools accessible to the researcher. It handles all the data dependencies, keeps track of the progress of long-running jobs, and allows the underlying programs to just do what they are good at.
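The general idea of an orchestrator can be sketched in a few lines. This is only an illustration of dependency-ordered job running, not how CCP4 is actually built: the step names are hypothetical, and the real command-line tools are replaced by the Python interpreter printing a message.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def fake_tool(message):
    """Stand-in for a real command-line program: just prints a message."""
    return [sys.executable, "-c", f"print({message!r})"]

# Hypothetical steps, each mapped to the command line that would run it.
commands = {
    "integrate": fake_tool("integrating spots"),
    "scale": fake_tool("scaling intensities"),
    "phase": fake_tool("estimating phases"),
    "refine": fake_tool("refining model"),
}

# Each step lists the steps whose output it needs first.
depends_on = {
    "integrate": set(),
    "scale": {"integrate"},
    "phase": {"scale"},
    "refine": {"phase"},
}

def run_pipeline(commands, depends_on):
    """Run every step after its dependencies; return (step, output) pairs."""
    log = []
    for step in TopologicalSorter(depends_on).static_order():
        result = subprocess.run(commands[step], capture_output=True,
                                text=True, check=True)
        log.append((step, result.stdout.strip()))
    return log

pipeline_log = run_pipeline(commands, depends_on)
```

The researcher only declares what depends on what; the runner works out the order and collects the output of each job.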

What is the science ?

In the biological sphere, these tools allow the boffins to observe the positions of atoms inside what is a complicated protein, and refine our modelling software to be able to predict the actions of drugs and viruses at the level of proteins inside the cell.

It allows the extraordinary accomplishments in medicine that have become so obvious in the past year.

How does it work ?

X-Ray crystallography uses the following :-

  • a synchrotron light source,
  • a precision sample holder, that can accurately change the position or angle of the sample,
  • a photon-counting pixel-based detector to pick up the diffracted X-rays.

Follow that link for the straight scoop :) - the following is the shorter version ..

The computers can do a great deal after this step, but the quality of the initial data is paramount - garbage in, garbage out.

What is the sample ?

It will be a crystal, as large as possible, of a protein, or a combination of proteins locked together at bond sites or hinge sites, that takes part in biological interactions.

The crystal structure is forced on the protein by growing a crystal under conditions well outside its operational range; the crystal is then cooled for data collection. A crystal is extremely helpful because it contains many identical copies of the molecule, and averaging over all of them gives better accuracy.
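Why averaging over identical copies helps can be shown with a toy statistical sketch. This is not a model of diffraction: the signal value, the noise level and the number of copies are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_value = 1.0    # signal from one idealised unit (arbitrary units)
noise_sigma = 0.5   # assumed measurement noise per unit

def averaged_error(n_copies, n_trials=2000):
    """Spread of the mean of n_copies noisy measurements, over many trials."""
    samples = true_value + noise_sigma * rng.standard_normal((n_trials, n_copies))
    return samples.mean(axis=1).std()

error_single = averaged_error(1)      # close to noise_sigma
error_hundred = averaged_error(100)   # roughly ten times smaller
```

The error of the average falls roughly as one over the square root of the number of copies, so a hundred copies buy about a tenfold improvement.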

Some of the lectures are dedicated to growing and caring for your crystal.

Where does the beam go ?

The X-Ray beam arrives to illuminate the whole crystal, in a coherent, planar wave. The rays bounce elastically off the electron clouds, to arrive somewhere on the detector.

What are the computer programs for ?

Let us start with the problem description. The main thing is that there has been no focusing step - so the work usually done by a focusing lens has to be done in software.

Let us take an example - looking through a bubble-glass bathroom window at a moving shape.

If the bubbles on the glass are regular, a computer can figure out which part of the scene the light through each bubble is coming from, and can post-process the result to change the focus and restore the image, even in 3D. It is actually better than plain glass, because you can look from different angles.

So, from a knowledge of the glass, we can re-create the image.

Crystallography is an only slightly different problem - we put a super-bright spot behind the glass, and we examine the glass itself.

We fix one parameter, the light source, as close as possible to an ideal single spot, and determine the crystal structure that the X-ray beam senses as it passes through.

The cryo-cooled crystal is rotated through 360 degrees and takes radiation damage in the process. The resulting data set is often 3600 images, one for each tenth of a degree of rotation over a full circle.

It is treated as a movie, which it is, and even if the sample moves during rotation we have got programs for that. We only see shadows at the detector, and some information is totally missing.

The dark spots are magnitude-only values of the Fourier transform of the sample. The phase information is missing and must be reconstructed. The detector image must be put through an inverse Fourier transform, and the phase determines positional accuracy at the sample.
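The missing phase is easy to demonstrate on a toy example. The sketch below uses a made-up 2D grid with two blobs standing in for electron density; real data is 3D and far messier, and this shows only why the phases matter, not how any real program recovers them.

```python
import numpy as np

# A toy 2D "electron density": two blobs of different weight on an empty grid.
density = np.zeros((32, 32))
density[8:12, 8:12] = 1.0
density[20:23, 18:24] = 2.0

structure_factors = np.fft.fft2(density)   # complex: magnitude and phase
magnitudes = np.abs(structure_factors)     # what the detector keeps
phases = np.angle(structure_factors)       # what the measurement loses

# With both magnitude and phase, the inverse transform restores the density.
with_phases = np.fft.ifft2(magnitudes * np.exp(1j * phases)).real

# With magnitude only (every phase set to zero), a different map comes out.
without_phases = np.fft.ifft2(magnitudes).real

error_with = np.abs(with_phases - density).max()
error_without = np.abs(without_phases - density).max()
```

Here `error_with` is down at rounding level, while `error_without` is large: the magnitudes alone put the blobs in the wrong places.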

Some of the heavy lifting is done by programs fitting known molecules together at precise angles and trying to fit them under the 3D cloak of electrons we see. No bits poking out, and no empty socks. We have a good idea what we are looking at, so we have to screen out contaminants, and countless other effects. The crystal repeat is vital for this.

Other programs are dedicated to single tasks, like matching known libraries of molecular data, or phase reconstruction, or false image detection, under the umbrella of CCP4, the pipeline manager.