Anyone using large research facilities such as the Swiss Light Source SLS and the X-ray free-electron laser SwissFEL to investigate molecular structures has to process vast amounts of data. The measurement of a single protein produces 250 terabytes. Stored on blank DVDs, this would make a stack as high as the Leaning Tower of Pisa.
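The DVD comparison can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the disc capacity (4.7 gigabytes, single layer) and disc thickness (1.2 millimetres, stacked without cases) are assumptions that do not come from the article, and the function name `dvd_stack` is made up for this example.

```python
import math

DATA_BYTES = 250e12         # 250 terabytes (decimal), the figure quoted in the text
DVD_CAPACITY_BYTES = 4.7e9  # assumption: single-layer DVD, 4.7 GB
DVD_THICKNESS_M = 1.2e-3    # assumption: 1.2 mm per disc, stacked without cases


def dvd_stack(data_bytes, capacity=DVD_CAPACITY_BYTES, thickness=DVD_THICKNESS_M):
    """Return (number of discs, stack height in metres) needed to hold data_bytes."""
    discs = math.ceil(data_bytes / capacity)
    return discs, discs * thickness


discs, height_m = dvd_stack(DATA_BYTES)
print(f"{discs} discs, stack height {height_m:.1f} m")
```

Under these assumptions the stack comes to 53,192 discs and about 64 metres, which is in the same range as the tower's height of roughly 56 metres.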
Research on the cytoskeleton involves measuring a whole series of proteins, protein complexes and other biomolecules, taking and comparing pictures of their structures and observing molecular interactions. To make progress and maintain an overview, modern forms of data analysis, including those using artificial intelligence, have become indispensable. For example, G.V. Shivashankar, head of the PSI Laboratory for Nanoscale Biology and professor of Mechano-Genomics at ETH Zurich, uses these methods in his research.
He is investigating, among other things, an important property of the cytoskeleton: its rigidity. As humans age, this multifunctional support structure of the cell becomes less flexible and dynamic, making it easier for pathogens to do their harmful work. In less dynamic cells, they are better able to intervene in the cell’s signal pathways and have an easier time multiplying. “This may be the reason why older people are more likely to become seriously ill from a Covid-19 infection,” the researcher says.
The cytoskeleton has a major influence on the shape of the cell nucleus and on how well the genetic material is packed into it. Stretched out and laid end to end, the molecular chains of the DNA would be more than a metre long, but they are so tightly and cleverly wound into a ball that they fit into the tiny, ten-micrometre cell nucleus. If the cytoskeleton becomes more rigid, this packaging no longer functions optimally and the individual genes can no longer be read as effectively to produce the proteins the body needs, for example for metabolism or signal transmission.
And this is where modern imaging could bring a breakthrough: “We already know a few hundred active agents that target the signalling pathways of the cell,” says Shivashankar. “It is just unclear which combination and dose are best to counteract the rigidity of the cells and the corresponding restricted signal transmission.” His team wants to find out by adding the active agents to cultures of infected cells in the Petri dish and then observing in high resolution what happens. “We need a screening of all known drug candidates. And PSI has the necessary infrastructure to carry out something like this – SLS in particular is very well suited for this task.”
The roots of many diseases
This research is especially important for one reason: an abnormal packaging of the genetic material in the cell nucleus is now assumed to play a major role in cancer as well as in neurodegenerative diseases such as Alzheimer’s. Shivashankar’s lab is working on a process that routinely makes images of cell nuclei to determine, by examining various characteristics, how the DNA is packed. This allows predictions to be made as to which genes cannot be read and which diseases may result. Such imaging would be much simpler and less expensive than sequencing the genes on an individual basis to achieve the same result.
The challenge here is that the characteristics which need to be analysed and compared are extremely diverse. Without powerful computers and algorithms that can compare hundreds of characteristics on thousands of images, this could not be accomplished. Artificial intelligence reliably detects subtle differences in the type of DNA packaging and recognises correlations with cell malfunctions. So Shivashankar’s team is cooperating with experts in machine learning – a group led by statistician Caroline Uhler, professor at the Massachusetts Institute of Technology in the USA. “The advantage of using machine learning is that it can help us identify novel features which may not be directly interpretable by humans but automatically give a strong indication of cell health or disease,” Uhler says.
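To give a rough idea of what comparing hundreds of characteristics on thousands of images can look like in practice, here is a minimal sketch. It is not the method used by Shivashankar’s or Uhler’s groups. It runs on purely synthetic numbers, uses a deliberately simple nearest-centroid classifier, and every name and value in it (2,000 images, 200 features, five informative features, the “healthy” and “diseased” labels) is an assumption made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real measurements: 2000 nucleus images, 200 features each.
n_images, n_features, n_informative = 2000, 200, 5
labels = rng.integers(0, 2, size=n_images)        # 0 = "healthy", 1 = "diseased" (made up)
features = rng.normal(size=(n_images, n_features))
features[labels == 1, :n_informative] += 1.0      # only the first five features carry signal

# Hold out the last 500 images to test on data the classifier has not seen.
train = np.arange(n_images) < 1500
test = ~train

# Standardise every feature using statistics from the training images only.
mean = features[train].mean(axis=0)
std = features[train].std(axis=0)
z = (features - mean) / std

# Nearest-centroid classifier: one average feature profile per class.
centroids = np.stack([z[train & (labels == k)].mean(axis=0) for k in (0, 1)])
dists = np.linalg.norm(z[test][:, None, :] - centroids[None, :, :], axis=2)
predicted = dists.argmin(axis=1)
accuracy = (predicted == labels[test]).mean()

# Rank features by how strongly the two class profiles differ.
effect = np.abs(centroids[1] - centroids[0])
top_features = np.argsort(effect)[::-1][:n_informative]

print(f"held-out accuracy: {accuracy:.2f}")
print("most discriminative features:", sorted(top_features.tolist()))
```

The ranking step mirrors the idea in the text: the computer singles out the few characteristics that separate the two groups, even when they are buried among many uninformative ones. Real image features would be far less clean than this toy data.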
Powerful computers are indispensable
Advances in machine learning are having an enormous impact in all areas where there is explosive growth in the volume of data. And one reason the amount of data is so large is that the researchers really would like to look at each cell individually to identify diseases. “Even cells of the same type can have very different structures and thus behave differently,” says G.V. Shivashankar. “It’s like trying to examine and understand each individual grain of sand on the beach.” Fed ever more examples, the computer learns over time which cell structure leads to which behaviour and recognises patterns.
Ultimately it might be possible to make statements – solely on the basis of high-resolution imaging of a cell nucleus as a kind of biomarker – about how well a cell is functioning, what diseases the person concerned might suffer from, and what type of therapy holds the most promise for success, making early, targeted interventions possible. In any case, the method would be an enormous boost for diagnostics. “To exploit the enormous potential of machine learning for biological discovery and medical diagnostics, however, it is crucially important to carefully evaluate the identified cytoskeletal and nuclear biomarkers in the clinical environment,” Uhler says.
With the Centre for Proton Therapy at PSI, Shivashankar’s group is also investigating whether high-resolution images of blood cells, the cytoskeleton and the cell nucleus could provide indications of therapeutic efficacy. “We compare images from before, during and after treatment of cancer patients and check for correlations between the changes we see and the progress of the therapy,” Shivashankar says. Here too, it is important to reliably and quickly recognise possible irregularities within an enormous amount of image data. “Anyone still working on such tasks nowadays without machine learning,” Shivashankar says, “is missing out.”