The librarian of the petabytes

It is necessary to prepare now for the planned upgrade of the Swiss Light Source SLS. In order to do justice to future research, Alun Ashton is estimating the amount of data that future experiments will produce.
Alun Ashton, interim leader of the sub-project Controls & Science IT, in a server room at PSI. (Photo: Paul Scherrer Institute/Mahir Dzambegovic)

Alun Ashton is certain that the future of research will bring larger amounts of data. He once worked as a scientist himself but switched to data analysis and finally to science IT many years ago. At PSI he now heads the Science IT group; in addition, he is interim head of the Controls & Science IT sub-project within SLS 2.0.

This December, the Swiss Parliament will vote on the financing of PSI's SLS 2.0 upgrade project. If the decision is positive, the execution phase – the actual upgrade of the Swiss Light Source SLS – would then begin in 2021. However, the planning phase is already under way to ensure, among other things, that the parliament has a solid project plan and is well informed about what it is voting on.

The work of Alun Ashton and his group is part of these preparations. And he points out: "Everything we are currently doing is relevant both for SLS 2.0 and for SwissFEL, which was completed in 2016," referring to PSI's X-ray free-electron laser.

New components, new experiments and much more data

A good number of PSI employees are already investing some of their time in preparations for SLS 2.0. In the group of Markus Jörg, head of the SLS 2.0 sub-project Infrastructure and Logistics, 3D computer models of the new components are being put in their future place. Meanwhile, Alun Ashton is estimating the petabytes of data that the new experiments will be able to deliver: "Because SLS will have a much more intense beam after the upgrade, the experiments can deliver significantly more data per unit of time."

Philip Willmott, scientific coordinator and head of the Photonics Science Programme sub-project, has had the final concept report in his hands since summer 2020; it shows for each beamline how the upgrade will be implemented at the individual experiment stations. Meanwhile, Alun Ashton is planning how these future data sets can realistically be stored. After all: "Data storage is an important part of science. If you include this in the concept from the very beginning, you will benefit all along the line."

Reduce and compress

A key issue for Ashton is data reduction and compression. To understand data reduction, one can imagine a portrait photo of a person. From an artistic point of view, the background may also be interesting. But if we are only interested in the facial features of the person depicted, none of the bits and bytes used to represent the background need to be kept.

It's a similar situation for scientific experiments that yield very large data sets. The parts that are definitely not needed should not be stored, says Alun Ashton: "Data storage is a relevant cost factor. For example, if we manage to use only a tenth of the storage space, we save a lot of money." To ensure that data reduction is successful and makes sense, Ashton and his team are also investigating machine learning algorithms, that is, the use of artificial intelligence to help automate the process and improve the quality of the data that gets stored.
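
The portrait analogy can be made concrete with a small sketch. The example below is purely illustrative and is not PSI's method: the frame size, the pixel values and the helper names crop_region and stored_fraction are all assumptions chosen so that the kept region happens to be one tenth of the original frame.

```python
def crop_region(frame, top, left, height, width):
    """Keep only the rows and columns inside the region of interest."""
    return [row[left:left + width] for row in frame[top:top + height]]


def stored_fraction(original, reduced):
    """Return the share of the original pixel count that is still stored."""
    n_original = sum(len(row) for row in original)
    n_reduced = sum(len(row) for row in reduced)
    return n_reduced / n_original


# Hypothetical 100 x 100 pixel frame with made-up values (an assumption,
# standing in for a detector image or the portrait photo).
frame = [[(r * 100 + c) % 256 for c in range(100)] for r in range(100)]

# Keep a 20 x 50 pixel region of interest (the "face") and drop the rest.
roi = crop_region(frame, top=40, left=30, height=20, width=50)

fraction = stored_fraction(frame, roi)
print(f"Stored fraction: {fraction:.2f}")
```

In this toy case only 1,000 of the 10,000 pixels are kept. Deciding which region is safe to discard is the hard part in practice, which is where the automated approaches mentioned above would come in.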

The right bike for every situation

Ashton and his team also rely on data compression: "We develop special hardware that can perform this data compression. Among other things, we have a successful cooperation with IBM for this purpose," reports Ashton. The team is planning several solutions so that different data sets can be processed in the best possible way. Ashton offers a comparison to explain the situation: "I have three bicycles: a mountain bike for uneven terrain, a racing bike that lets me cover long distances quickly, and an e-bike that can save me some struggle. Similarly, experiments are also different, and it makes sense to have the appropriate hardware and algorithm ready in each case."

Even if this ultimately results in smaller amounts of data being stored, those amounts will still be large, and another issue that falls within Ashton's scope comes into play: open access for data. "In the PSI guidelines, we have committed ourselves to eventually making the data from our experiments publicly accessible. For example, other research groups can access it and discover something else in it." This will also be about "findability": as in a library, a data collection requires a kind of catalogue so that third parties can find what they are looking for.

Progress in collaborative exchange

On all of these topics, Ashton and his team collaborate with data centres in Switzerland: the Swiss National Supercomputing Centre (Centro Svizzero di Calcolo Scientifico, CSCS) and the Swiss Data Science Centre of ETH Zurich and EPFL. In addition, they are involved in international exchanges: "We are working with other large research facilities in Europe, especially the European Spallation Source ESS in southern Sweden and the Diamond Light Source in England," Ashton explains. He himself, a native of Wales, worked at the Diamond Light Source before joining PSI. "We are getting the maximum out of these collaborations; we want to learn as much as possible from each other's experience." Reciprocally, PSI also passes on its own findings to the collaboration partners.

In all this, what Ashton has always kept in mind is the special opportunity that is now open at SLS: "If you rebuild a large research facility in the way we are planning to upgrade SLS, this is an occasion that comes perhaps only once in 20 years." It is obviously much more difficult to change practices once the facility is operational. That is why Ashton and his team are now making fundamental decisions about how people will work and conduct research after the upgrade. "If we get it right now, PSI researchers will benefit greatly in the decades to come. We can pave the way for them to save a lot of time and effort."

Text: Paul Scherrer Institute/Laura Hennemann