News
New project will open the door to the secret world of bacteria
Published online: 12.10.2022
Bacteria play a crucial role in countless processes, from the production of food, medicine and biogas to their direct impact on the climate.
So far, the genetic material (genome) of approximately 50,000 species of bacteria has been mapped, but with an estimated 10 to 100 million species in existence, there is great potential in mapping many more. Only when we know a bacterium's genetic material can we make full use of it.
Researchers from the Department of Chemistry and Bioscience and the Department of Computer Science at Aalborg University have received DKK 15 million from the VILLUM FOUNDATION to continue a project that will boost the mapping of unknown bacteria. In the project, they will combine their expertise in biology, graph data and machine learning in the hope of revolutionizing the state of the art in this field.
The microbiology work is led by Professor Mads Albertsen, who has worked for a number of years on gene sequencing and on mapping bacteria in Danish nature. One of the biggest challenges in DNA sequencing is determining which bacterium a given piece of DNA originates from. In the project, Mads Albertsen will therefore introduce new measuring equipment that can identify special characteristics of the individual genomes. He explains:
- There is a huge untapped potential in nature, and our hypothesis is that with the new equipment we can separate DNA in new ways and thus make it easier to distinguish between species of bacteria. However, when we start using these methods, we generate so much data that we need advanced data science to extract all the value from it.
That is why Mads Albertsen has teamed up with Professor Katja Hose and Professor MSO Thomas Dyhre Nielsen from the Department of Computer Science. Both have extensive experience in handling massive amounts of data.
Thomas Dyhre Nielsen explains that machine learning is a prerequisite for identifying potential new species in the enormous amounts of biological data:
- We will use the information biologists provide about how different DNA fragments are related in order to create a machine learning model that can, among other things, group the genetic material into clusters. The novelty is that we will create even better and more nuanced groupings based on the new characteristics identified by Mads and his team.
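To give a rough idea of what grouping DNA fragments into clusters can look like, here is a minimal Python sketch. It is not the researchers' method: it assumes that each fragment can be described by its short-subsequence (k-mer) frequencies, that known relations between fragments are given as a simple list of links, and that a fixed distance threshold decides which fragments belong together. The fragment names, sequences, links and threshold are all made up for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA string."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Euclidean distance between two profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smooth(profiles, links):
    """Average each fragment's profile with those of its linked neighbours."""
    groups = {name: [name] for name in profiles}
    for a, b in links:
        groups[a].append(b)
        groups[b].append(a)
    return {
        name: [sum(col) / len(group)
               for col in zip(*(profiles[n] for n in group))]
        for name, group in groups.items()
    }


def cluster(profiles, threshold):
    """Single-linkage grouping: merge fragments closer than the threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = sorted(profiles)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)

    bins = {}
    for name in names:
        bins.setdefault(find(name), []).append(name)
    return sorted(bins.values())


# Toy data (hypothetical): two AT-rich and two GC-rich fragments.
fragments = {
    "f1": "ATATATTTAAATATAT",
    "f2": "TTATATAAATTTATAA",
    "f3": "GCGCGGCCGCGCCGGC",
    "f4": "CCGCGGGCGCCGCGCG",
}
# Hypothetical relations between fragments, e.g. from an assembly graph.
links = [("f1", "f2"), ("f3", "f4")]

profiles = {name: kmer_profile(seq) for name, seq in fragments.items()}
bins = cluster(smooth(profiles, links), threshold=0.5)
print(bins)  # [['f1', 'f2'], ['f3', 'f4']]
```

In this toy setup the links pull related fragments towards each other before grouping, which is one simple way of letting information about how fragments are related influence the clusters.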
In the hunt for more bacteria, one of the cornerstones of the project will be to combine knowledge about existing bacteria with massive amounts of external data.
In addition to the time and place of sampling, this can include information about the weather when a specific soil sample was taken, characteristics of the environment around the sampling site, or information from external knowledge bases, databases, ontologies and so on.
Here, the researchers will use knowledge graphs and so-called data lakes, which make it possible to connect heterogeneous data and find new connections. This is Katja Hose's specialty.
- If we have a map of Denmark and know where specific bacteria with special characteristics have been found, we can use that data to predict where we can expect to make other interesting discoveries. In other words, we will develop methods to explore "the dark spots".
In the long run, the researchers hope that their new methods will form the basis of a complete database containing one genome per species. An important element will be to ensure that not only experts can make use of the generated data.
- We must be able to explain how and why a system comes up with certain answers - which data has been used, where it comes from and how it has been handled. If users are to trust a system, simply providing a black box is usually not enough. In addition, we must not forget that data and knowledge develop over time - and a system must also take this into account, says Katja Hose.
The project will run for the next five years. Since the researchers have already collected over 10,000 samples from all over Denmark under the auspices of the project MicroFlora Danica, the groundwork has been laid, according to Mads Albertsen:
- Now, we have to boost the development of new methods that will bring us much closer to a complete genome database, which is the basis for almost all research involving bacteria.
The VILLUM Synergy project "Illuminating microbial dark matter through data science (DarkScience)" is financed by the VILLUM FOUNDATION with DKK 15.5 million.
It builds on the projects "Data Science meets Microbial Dark Matter" and MicroFlora Danica.
Read the scientific article: A. Lamurias, M. Sereika, M. Albertsen, K. Hose, T. D. Nielsen, "Metagenomic Binning with Assembly Graph Embeddings", Bioinformatics, 2022.
Follow the project here: darkmatter.aau.dk
Professor Mads Albertsen
Department of Chemistry and Bioscience
Aalborg University
Phone: 2293 2191
Email: ma@bio.aau.dk
Professor Katja Hose
Department of Computer Science
Aalborg University
Phone: 9940 8886
Email: khose@cs.aau.dk
Professor MSO Thomas Dyhre Nielsen
Department of Computer Science
Aalborg University
Phone: 2980 9026
Email: tdn@cs.aau.dk
PRESS CONTACT
Nina Hermansen
Email: ninah@cs.aau.dk
Phone: 2294 0459
Niels Krogh Søndergaard
Email: nks@bio.aau.dk
Phone: 3166 0080