This ERC project aims to create a search engine for all the sequencing data produced by biology labs worldwide. The data already exists in a public repository called the Sequence Read Archive, hosted jointly by two large biological institutes in the USA and Europe: NCBI and EBI. However, this mass of data is relatively inaccessible, because it is too big to be downloaded by any single lab. It consists of tens of petabytes of sequences from millions of organisms, including many human sequences.
The project, led by Rayan Chikhi, aims to solve this accessibility problem using new data structures and compression schemes, so that global biological analyses can be performed over all of this data in a short time. Such a global analysis was recently carried out at great cost in time and computation, and it led to the discovery of new coronavirus species.
Congratulations!