Research: Bioinformatics


The discovery of DNA has been one of the biggest catalysts in genomic research. Sequencing has enabled us to access the wealth of information encoded in DNA and has provided the basis for ground-breaking achievements such as the first complete human genome sequence. Furthermore, it has tremendously advanced our understanding of life-threatening genetic disorders and of bacterial and viral infections. With the recent advent of next-generation sequencing (NGS) technologies, sequencing has become accessible to the majority of researchers, and metagenomic sequencing has become widely available. However, to realise the true potential of NGS, sophisticated and tailor-made bioinformatic programs are essential to translate the collected data into meaningful information. My research interests include:


Medical and Population Genetics

A major focus of my current research is the human microbiome and its role in health and disease. Bacterial cells in and on our body are often estimated to outnumber human cells by a ratio of 10:1. These communities of bacteria and other microbes play an important role in the functioning of the digestive tract, the immune system, the skin, and other body systems, and they have a major impact on our well-being. Low-cost sequencing technologies have allowed us to collect vast amounts of information on these microbial communities, but our ability to translate this data into meaningful information has lagged behind. Sophisticated mathematical models and bioinformatic tools are therefore required. A better understanding of the human microbiome will facilitate the identification of factors associated with health and disease, as well as the prediction of relapses and of patient susceptibility to different treatment strategies.


Metagenomics and Metatranscriptomics

The study of microbial biodiversity is still in its infancy, and metagenomics and metatranscriptomics will reveal new insights into the microbial world. In the past, cultivable bacteria were studied with the help of PCR, but it is estimated that 99% of all bacteria are not cultivable. The vast majority is therefore still unexplored, since classical methods cannot be applied. Metagenomics provides a new approach to this problem. Recent advances in sequencing technologies have produced huge metagenomic data sets that can reveal insights into the genetic potential of microbial organisms. Furthermore, metatranscriptomics will allow us to study the functional activity of a population and will help us to understand how communities react to environmental changes.


Sequencing Error Profiles

In recent years, Illumina has emerged as the global market leader in DNA sequencing. However, the biases and errors associated with this high-throughput sequencing technology are still poorly understood, which has precluded the development of effective noise removal algorithms. In addition, many programs were not designed for Illumina data or for metagenomic sequencing. A better understanding of the idiosyncrasies encountered in Illumina data is therefore essential, and programs must be tested and benchmarked on realistic and reliable in silico data sets to reveal not only their true capacities but also their limitations. I conducted the largest in vivo study to date of Illumina error profiles in combination with state-of-the-art library preparation methods. For the first time, a direct connection between experimental design factors and systematic errors was established, providing detailed insight into the nature of Illumina errors. Further, I tested various error removal techniques, enabling researchers to choose optimal processing strategies for their particular data sets. In addition, I devised several simulation tools that accurately reflect artificial and natural fine-scale variation. These include a flexible and efficient read simulation program, which is the only program that can directly reflect the impact of experimental design factors.


Viral Haplotype Reconstruction

Viral haplotype reconstruction from a set of observed reads is one of the most challenging problems in bioinformatics today. Next-generation sequencing (NGS) technologies enable us to detect single nucleotide polymorphisms (SNPs) of haplotypes, even if the haplotypes appear at low frequencies. However, there are two major problems. First, we need to distinguish real SNPs from sequencing errors. Second, we need to determine which SNPs occur on the same haplotype, which cannot be inferred from the reads if the distance between SNPs on a haplotype exceeds the read length. We conducted the first independent benchmarking study that directly compares the currently available viral haplotype reconstruction programs. Furthermore, we are exploring the potential of third-generation sequencing technologies for viral haplotype reconstruction.
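
The second problem can be made concrete with a small sketch. The Python function below is purely illustrative and is not taken from any of the benchmarked programs: the name `phase_blocks`, the example SNP positions, and the read length are hypothetical. It assumes single-end reads of a fixed length and ignores paired-end information and sequencing errors. Under those assumptions, it groups SNP positions into blocks in which each pair of neighbouring SNPs could be covered by a single read; SNPs in different blocks cannot be linked directly from the reads.

```python
from typing import Iterable, List


def phase_blocks(snp_positions: Iterable[int], read_length: int) -> List[List[int]]:
    """Group SNP positions into blocks that single reads could link.

    Assumption (illustrative only): reads are single-end with a fixed
    length, so a read can cover two positions only if the span between
    them, counted inclusively, fits within read_length.
    """
    blocks: List[List[int]] = []
    for pos in sorted(set(snp_positions)):
        # Inclusive span from the previous SNP in the current block to this one.
        if blocks and pos - blocks[-1][-1] + 1 <= read_length:
            blocks[-1].append(pos)
        else:
            # Too far from the previous SNP: no single read covers both,
            # so this SNP starts a new block.
            blocks.append([pos])
    return blocks


# Hypothetical SNP positions and a read length of 150 bases.
print(phase_blocks([100, 180, 250, 900, 950], read_length=150))
```

With these example values, the SNPs at positions 250 and 900 fall into separate blocks, because no 150-base read can span both. Longer reads enlarge the blocks, which is one reason longer-read technologies are of interest for this problem.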