Other researchers say that restrictions at the largest SARS-CoV-2 genome platform encourage fast sharing while protecting data providers’ rights.
Hundreds of scientists are urging that SARS-CoV-2 genome data be shared more openly to help analyse how viral variants are spreading around the world.
Researchers have posted huge numbers of SARS-CoV-2 genome sequences online since January 2020. The most popular data-sharing platform, called GISAID, now hosts more than 450,000 viral genomes; Soumya Swaminathan, the chief scientist at the World Health Organization (WHO), has called it a “game changer” in the pandemic. But it doesn’t allow sequences to be reshared publicly, which is hampering efforts to understand the coronavirus and the rapid rise of new variants, argues Rolf Apweiler, co-director of the European Bioinformatics Institute (EBI) near Cambridge, UK, which hosts its own large genome database that includes SARS-CoV-2 sequences.
“The openness of SARS-CoV-2 sequence data is crucial for the rapid response against the biggest health threat to humankind in a very, very long time,” says Apweiler.
In a letter released on 29 January, Apweiler and others call for researchers to post their genome data in one of a triad of databases that don’t place any restrictions on data redistribution: the US GenBank, the EBI’s European Nucleotide Archive (ENA) and the DNA Data Bank of Japan, which are collectively known as the International Nucleotide Sequence Database Collaboration (INSDC).
Anyone can anonymously access the INSDC's data and use them as they want, but GISAID requires that users confirm their identity and agree not to republish the site's genomes without permission from the data provider. This means that studies building on GISAID data, such as those that create evolutionary trees analysing how SARS-CoV-2 variants are related, can't publish their full data sets, so others can't easily check the analyses or build on them. They must direct readers back to the GISAID site.
The letter says the scientific community should “remove barriers that restrain effective data sharing”, but doesn’t mention GISAID specifically. It is signed by more than 500 scientists, including the 2020 chemistry Nobel laureate Emmanuelle Charpentier, and the head of the COVID-19 Genomics UK Consortium, Sharon Peacock. Where scientists have already established submissions to other databases, the letter states, “these submissions should continue in parallel”.
Feature not flaw
Many researchers who work with GISAID say that its terms of access are a benefit, because they encourage hesitant researchers to share data online speedily, without fear that others will use the results without credit. "The reason so many labs have provided SARS-CoV-2 genomes to GISAID is precisely because of the data-access agreement that restricts public resharing," says Sebastian Maurer-Stroh, a bioinformatician at Singapore's Agency for Science, Technology and Research. GISAID has worked with many labs to help them share data, he says.
GISAID stands for the Global Initiative on Sharing Avian Influenza Data; an international consortium of researchers helped to set it up as a non-profit foundation in 2008, to address researchers’ reluctance to share data on influenza strains. Some nations, including Indonesia, a hotspot for avian flu, feared that pharmaceutical firms would create drugs and vaccines using the sequence data without crediting the original data providers or sharing the benefits of the work with them. But they were persuaded to share sequences rapidly on GISAID; in March 2013, for instance, China published sequences of H7N9 avian flu in the database on the same day it informed the WHO of three infections in people. “GISAID encourages and incentivizes real-time data sharing by parties who would otherwise be reluctant to share, by ensuring that they retain their rights in their data,” says a spokesperson for the initiative.
Read more at: https://www.nature.com/articles/d41586-021-00305-7