"We have to ask ourselves what to do with hacked data"

In recent years, hackers have breached several databases and published the stolen data on the internet. Are scientists allowed to use such data for their research? ETH Zurich bioethicists Marcello Ienca and Professor Effy Vayena have addressed this question in a paper published in the journal Nature Machine Intelligence. ETH News spoke with Ienca.
Bioethicist Marcello Ienca. (Photograph: private)

ETH News: From a purely legal point of view, are scientists allowed to analyse hacked data published on the internet?
Marcello Ienca: The short answer is yes, at least if the scientists themselves are not responsible for the hacking, because hacking itself is a felony. However, if anonymous hackers upload data to a public repository, it becomes public data. The answer is less clear when scientists aim to download this data onto their own computers. In some jurisdictions, this could be construed as possession of stolen property. However, I am not aware of any case where scientists have been charged for this.

And from an ethical perspective? Are scientists allowed to do everything that is legally permitted, i.e. also analyse data that was originally stolen from someone?
No, because science is not an activity like any other, but a cultural practice that is held to high standards of integrity. We scientists have a social responsibility that we must live up to. Therefore, the mere prospect of new knowledge is not enough to justify conducting a study. Science must be practised carefully and responsibly.

In your work, you give some examples of analyses of hacked data: detailed data from users of the dating platform Ashley Madison, and Afghanistan and Iraq war documents from the US military. Researchers have analysed this data in recent years. Was this ethical?
In some cases it may have been justifiable, in others less so. In our study, Effy Vayena and I make the point that scientific research with hacked data is always morally problematic. However, in very specific circumstances and under very specific conditions, such analysis might be ethically acceptable. In our study we mention six conditions:

  • The data source and the background of the data collection must be made transparent. This should also prevent other scientists from feeling impelled to steal data themselves.
  • Privacy and data protection must be respected. Scientists must anonymise the data, even if personal data is publicly accessible as part of a hack.
  • The research project must not put any data subject at greater risk than they may already be as a result of the hacking.
  • The research project must have a high scientific and social benefit.
  • The research goal can only be achieved by using the hacked data and not otherwise.
  • And finally: Every research project that uses hacked data should have to be approved by an ethics committee.

Are there actually limits to the use of data from morally reprehensible sources? Would it be ethical to reuse results of medical experiments on prisoners in Nazi concentration camps today in order to re-analyse them?
There is a clear consensus in science that no unethical studies should be done. And these experiments were unethical; there is no doubt about that. More difficult is the question of what, if anything, other scientists are allowed to do with results if the studies have already been done, because in this case the scientists are not involved in the unethical behaviour. In principle, the above criteria can also be applied to other cases of data obtained in a morally problematic or clearly reprehensible manner.

Back to publicly available hacked data. Today, research with publicly available data does not usually require the approval of an ethics committee. Would there need to be a change?
Yes. In our view, research with hacked data is not the same as other research that uses public data. It should not be the task of individual researchers to determine whether a research project with hacked data fulfils the first five conditions mentioned. This evaluation should be done by a professional ethics committee. One could include this point in the human research legislation.

What else remains to be done?
With our work we would like to spark a debate in the scientific community. We hope that this discussion will lead to a consensus. And we will try to formulate ethical guidelines and recommendations for dealing with hacked data together with international organisations and professional and academic societies. We assume that with the increase in cybercrime, the amount of publicly accessible hacked data will also increase. We as scientists have to ask ourselves what to do with such data.

More information

Marcello Ienca was until recently a Senior Researcher in the group of Effy Vayena, Professor of Bioethics at ETH Zurich, and he is now a group leader at EPFL. One focus of his research is the ethics and governance of biomedical data, artificial intelligence and emerging technologies at the human-machine interface.

Reference

Ienca M, Vayena E: Ethical requirements for responsible research with hacked data, Nature Machine Intelligence 2021, 3: 744, doi: 10.1038/s42256-021-00389-w