The new science of metagenomics and the challenges of its use in both developed and developing countries

Edi Prifti¹ & Jean-Daniel Zucker²,³,⁴

¹ Institut National de la Recherche Agronomique, US 1367 MGP, 78350 Jouy-en-Josas, France
² Institut de Recherche pour le Développement, UMI 209, Unité de modélisation mathématique et informatique des Systèmes Complexes, 93143, Bondy, France
³ Institut National de la Santé et de la Recherche Médicale, U 872, Nutriomique, Équipe 7, Centre de Recherches des Cordeliers, 75006, Paris, France
⁴ Vietnam National University, Equipe MSI, Institut de la Francophonie pour l'Informatique, Hanoi, Vietnam

To cite this version: Edi Prifti, Jean-Daniel Zucker. The new science of metagenomics and the challenges of its use in both developed and developing countries. 2013. <hal-00821359>
HAL Id: hal-00821359
https://hal.inria.fr/hal-00821359
Submitted on 9 May 2013

Abstract

Our view of the microbial world and its impact on human health is changing radically with the ability to sequence uncultured or unculturable microbes sampled directly from their habitats, an ability made possible by fast and cheap next-generation sequencing technologies.
Such recent developments represent a paradigmatic shift in the analysis of habitat biodiversity, be it the human, soil or ocean microbiome. We review here some research examples and results that indicate the importance of the microbiome in our lives and then discuss some of the challenges faced by metagenomic experiments and the subsequent analysis of the generated data. We then analyze the economic and social impact on genomic medicine and research in both developing and developed countries. We support the idea that there are significant benefits in building capacities for developing high-level scientific research in metagenomics in developing countries. Indeed, the notion that developing countries should wait for developed countries to make advances in science and technology that they later import at great cost has recently been challenged.

Introduction

Our view of the microbial world and its impact on our lives is rapidly changing. Until recently we have considered ourselves as largely independent from the microbial ecosystem we live in (Blaser 2006; Ley, Lozupone et al. 2008; Davies 2009). The mainstream thinking that we would be healthier when staying away from microbes is now debated (Bloomfield, Stanwell-Smith et al. 2006); excessive hygiene and antibiotic use during childhood may be associated with allergies and asthma during adulthood (Hanski, von Hertzen et al. 2012; Kawamoto, Tran et al. 2012; Russell, Gold et al. 2012).

Bacteria, among the first organisms on Earth (present for more than 3 billion years), have been evolving and adapting to all sorts of environments ever since, elaborating a large genetic pool that codes for many biological pathways that perform a plethora of functions (Canganella and Wiegel 2011). In the interconnected web of life where microbes are most abundant (Whitman, Coleman et al. 1998), humans, like other multicellular organisms, have evolved to live in equilibrium and in symbiosis with them.
Indeed, our genome does not code for all the biological functions needed for our survival, or to take full advantage of the environment we live in. We interact considerably with our microbiota and as a consequence the health of this ecosystem is tightly linked to ours. Most abundant in the intestine, the gut microbiota is now considered to be an organ reaching approximately 2kg in mass (Baquero and Nombela 2012). With 150 times more genes than our own genome, the collective genome of our microbiome (also called “our other genome” or metagenome) codes for many different functions that are not undertaken by our cells. For instance, gut bacteria can protect us by producing anti-inflammatory factors, antioxidants and vitamins, but also harm us by producing toxins that mutate DNA, or affecting the nervous and immune systems. The outcome of microbiome deregulation may take the form of various chronic diseases, including obesity, diabetes and even cancers (Zhao 2010). The very nature of human identity is now being questioned and an increasing number of scientists believe that we are indeed a super-organism with a microbial majority: 10 times more microbes than human cells, that should be taken into consideration as part of us (Blaser 2006; Gill, Pop et al. 2006; Davies 2009). Gene therapy was developed at the beginning of the twenty-first century and came with the promise of revolutionizing medicine, but its implementation was more challenging than anticipated. Such difficulties were partially due to the multifactorial nature of most diseases but also to the complex implementation and success rate of such therapy. The gut microbiota opens new means of intervention in curing complex diseases linked to it, such as the use of probiotics, fecal transplantation or other microbiome targeted approaches (Borody and Khoruts 2012; Lemon, Armitage et al. 2012; Shanahan 2012). 
Such interventions are thought to be simpler than any human gene therapy and are of great economic potential for both the private and public health sectors.

Prokaryotes are some of the most diverse organisms on the planet, bearing many known and unknown functions that affect nearly all aspects of life on Earth. For instance, the bacteria that populate the ocean affect key chemical balances in the atmosphere and ensure the very habitability of the Earth. Soil microorganisms are likewise fundamental for terrestrial processes, as they play an important role in various biogeochemical cycles by contributing to plant nutrition and soil health (Mocali and Benedetti 2010). As such, microorganisms hold great promise for scientific research and potential biotechnological applications.

This increasing interest in understanding the role of the microbiome in planet ecology, health and disease, as well as in other biotechnological applications, promises very important economic and societal benefits for those countries that are involved in such research. The holistic study of the human microbiome is a fairly new approach that necessitates very expensive cutting-edge technologies and multi-disciplinary teams. Only big research institutions with large funding programs, usually from developed countries, are currently able to undertake such projects, leaving developing countries behind in this field. This chapter first introduces the new science of metagenomics and its many challenges while reviewing some of the major discoveries to date. We then discuss the benefits that developing countries might reap if they were to build the needed infrastructure and become involved in microbiome research.

The new science of Metagenomics

Now that we have established the importance of the microorganisms that live inside and around us, let us focus on the available methods and tools used to study them.
An estimated 99% of the prokaryotes are difficult to study in isolation for several reasons (Streit and Schmitz 2004; Schloss and Handelsman 2005): they (i) depend on other organisms for critical processes, (ii) fail to grow in vitro or (iii) are even extinct and preserved only in fossil records (Tringe and Rubin 2005). These obstacles can be bypassed by focusing on DNA, a very stable molecule that can be isolated directly from either living or dead cells. Usually the DNA extracted from a given sample belongs to different microbial genomes, which together constitute what is termed a metagenome. The study of metagenomes is an emerging field referred to as metagenomics (NRCC 2007). The progressive reduction in the cost of high-throughput sequencing has made it possible to sequence large quantities of DNA from mixtures of organisms (Shendure, Mitra et al. 2004; Metzker 2010), offering a very detailed insight into entire ecosystems previously thought to be inaccessible.

Quantitative metagenomics focuses on quantifying DNA molecules in a given sample, as opposed to functional metagenomics, which focuses on clone expression (Lakhdari, Cultrone et al. 2010). Quantitative metagenomics, on which we will mostly focus in this chapter, can be approached through different strategies (Gabor, Liebeton et al. 2007). The sequencing of the 16S ribosomal RNA gene is one of the most accessible and thus most frequently used approaches in quantitative metagenomics. Prior studies of bacterial evolution and phylogenetics provided the foundation for subsequent applications of sequencing based on 16S rRNA genes for microbial identification (Winker and Woese 1991). Indeed, the 16S rRNA genes consist of highly conserved regions that alternate with regions of variable nucleotide sequence, which are used for taxonomic classification. The 16S rRNA gene is thus a good marker to explore the phylogenetic composition of a given sample and to identify new species or even unknown phylogenetic groups.
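The alternation of conserved and variable regions described above can be illustrated with a toy computation. The sketch below is only an illustration under stated assumptions: the three aligned fragments are invented for the example (they are not real 16S rRNA data), and the function name `column_conservation` is hypothetical. For each column of the alignment it reports the fraction of sequences that share the most common base; columns scoring below 1.0 are the ones that would carry taxonomic signal.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that carry the most common base at that position."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(aligned_seqs))
    return scores


# Invented toy fragments, already aligned (assumption: NOT real 16S data).
toy_alignment = [
    "ACGTACGGTTAC",
    "ACGTTCGATTAC",
    "ACGTGCGCTTAC",
]

scores = column_conservation(toy_alignment)
# Columns where the sequences disagree play the role of "variable" positions.
variable_columns = [i for i, s in enumerate(scores) if s < 1.0]
print(variable_columns)
```

In practice the conserved stretches are what allow a single primer pair to amplify the gene across many taxa, while the variable stretches are what distinguish them; real analyses rely on dedicated alignment and classification tools rather than a per-column count such as this one.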
In quantitative metagenomics, variable regions of bacterial 16S rRNA genes are usually amplified by PCR and then subjected to library construction followed by sequencing using next-generation technologies. The pool of sequenced reads is then clustered, may be mapped onto a database of previously characterized sequences, and is used for further analyses in the studied context. Microbial 16S rDNA sequencing is considered the gold standard for characterizing microbial communities, but this approach fails to capture information about the functions of the different organisms, since organisms with identical 16S sequences may perform very different functions. A good example is the difference between various strains of Escherichia coli (enterohaemorrhagic, EHEC; enterotoxigenic, ETEC; enteroaggregative, EAEC) and related organisms such as Shigella sonnei, which have different clinical manifestations and different treatment modalities, yet are indistinguishable by their 16S rRNA sequences (Harris and Hartley 2003).

To overcome the limitations of the 16S rRNA profiling approach, the sequencing of entire microbial genomes (made possible by next-generation sequencing technologies) constitutes a very attractive strategy for comprehensive metagenomics studies. The whole-genome approach (also called WMS, for whole metagenomic sequencing) is increasingly used and has already produced many interesting results that we discuss in the next two sections.

Figure 1 illustrates an overview of a WMS pipeline from the collection of samples to the generation of hypotheses and the testing of prediction models. Sample collection from a given environment is a crucial process, since the microbial communities may be quite different between two very close locations (as is the case, for example, for soil environments), and the sampling strategy should be determined according to the project needs. The DNA extraction protocols are also deciding factors and depend on the microbial composition of the sample.
For example, Gram-positive bacteria, which are hard to lyse, might be underrepresented or overrepresented in environmental DNA preparations depending on the extraction protocol. Sequencing, followed by mapping onto a selected reference gene catalog and bioinformatics pre-treatment analyses, constitutes another important part of this pipeline, one that ensures that the biological signal is isolated while reducing the noise caused by technical variability throughout the study. Finally, the use of the right statistical tools and datasets is crucial in hypothesis generation and testing. We discuss later in this chapter the different issues and challenges faced at each stage of this process (cf. Figure 1).

[Figure 1, image not reproduced. Labels recovered from the figure include: sample collection; total DNA; library preparation; next-generation sequencing; short sequences; reference gene catalogue construction from existing genomes; sequence mapping; gene profiling; bioinformatics preprocessing; normalization; dimension reduction; structuration; statistical analyses; relation with clinical data; identification of clinically relevant groups; building and testing of prediction models.]

Figure 1: Overview of a whole-metagenome-sequencing project from sample collection to hypotheses generation (after N. Pons & E. Le Chatelier).

Another important and increasingly used application of WMS is the study of gene expression. The sequencing of cDNA, which corresponds to the whole RNA in a given sample, has brought many new application possibilities to scientists. With cDNA microarrays, a gene expression measuring technology, it is possible to focus only on those transcripts that have a corresponding probe on the chip and which are usually linked to coding sequences. RNA-Seq technology makes it possible to bypass this limitation and gives a truly holistic view of the transcriptome (Shendure 2008).
Largely used in single-genome transcriptomics, it is now starting to be applied in metatranscriptomics settings. The RNA-Seq approach offers an unprecedented resolution on both the activity of a given bacterium and the functional dynamics of the genes. On the other hand, it comes at a price: the analytical challenge posed by the very large number of variables in the data. Other “meta-omics” approaches such as metaproteomics or meta-metabolomics are still in their infancy but just as promising.

The precise bio-characterization of samples from different environments of interest is increasingly becoming routine with the help of metagenomics and other meta-omics technologies, and this new science is advancing very quickly. Many discoveries are being made in relation to human health and the environment, as we discuss hereafter.

Metagenomics in health and disease

Many projects have been funded in recent years with the aim of characterizing the human microbiome and uncovering its impact on human health and disease. One of the first internationally coordinated efforts was the European-funded MetaHIT¹ project (Ehrlich and MetaHIT 2010), which started in early 2008. Its main objective was to establish associations between the genes of the human intestinal microbiota and health and disease, focusing on two main disorders of increasing importance in Europe: Inflammatory Bowel Disease (IBD) and obesity. One of the first important achievements was the establishment of an extensive reference catalog of microbial genes present in the human intestine. Indeed, more than 85% of gut bacteria are unknown, and more than 80% of them are considered today unculturable (Eckburg, Bik et al. 2005; Qin, Li et al. 2010). This study offered the first high-resolution picture of the immense diversity and complexity of the gut microbiota.
The size of our intestinal metagenome is 150 times larger than that of our own genome and is constituted of more than three million non-redundant microbial genes, which are largely shared among the individuals of the studied cohort. Over 99% of them are bacterial genes, indicating that the entire cohort harbors more than 1,000 prevalent bacterial species and each individual at least 160 such species (Qin, Li et al. 2010).

The HMP² is another major project, funded by the NIH. Its main goals are to characterize the microbial communities found at several different sites on the human body, including nasal passages, oral cavities, skin, the gastrointestinal and urogenital tracts, and to analyze the role of these microbes in human health and disease (Group, Peterson et al. 2009). The HMP found that, in a cohort of healthy people, oral and stool communities were especially diverse in terms of community membership, while vaginal sites harbored particularly simple communities. Additionally, even though the diversity and abundance signature of each body site were found to vary among individuals, niche specialization as well as the metagenomic carriage of metabolic pathways were observed to be stable among the subjects (Human Microbiome Project 2012). Overall, in healthy humans, microbiota tend to occupy a range of configurations distinct from many of the disease-related perturbations studied to date (Sokol, Pigneur et al. 2008; Qin, Li et al. 2010).

As a consequence of the complexity of metagenomics data and the relative youth of this field, significant effort was needed in building the analytical framework as well as the associated bioinformatics pipelines and tools. Both of the aforementioned projects, among other more isolated initiatives, helped develop such technologies, which many current projects now use to discover associations between the gut microbiome and clinical phenotypes and diseases (Ehrlich and MetaHIT 2010; Qin, Li et al.
2010; Human Microbiome Project 2012; Morgan and Huttenhower 2012).

¹ Acronym for “Metagenomics of the Human Intestinal Tract”. URL: http://www.metahit.eu/
² Acronym for “Human Microbiome Project”. URL: http://www.hmpdacc.org/

One of the properties of a sampled ecosystem is species diversity, which, when high, is usually linked with good health. Indeed, it was discovered that low diversity in gut microbiota is associated with several human diseases such as obesity and inflammatory bowel disease (Turnbaugh, Hamady et al. 2009; Qin, Li et al. 2010). In some other cases high diversity may be associated with disease, as in bacterial vaginosis for example (Srinivasan, Hoffman et al. 2012).

A recent study showed the involvement of intestinal flora in type-2-diabetes in a Chinese cohort. Approximately 60,000 microbial genes were found to be differentially abundant among type-2-diabetic patients, who were also characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens. The authors also demonstrated that these gut microbial markers might be useful for classifying type-2-diabetic patients based only on their fecal samples (Qin, Li et al. 2012).

Intestinal flora was also related to the inflammatory status of the host in symptomatic atherosclerosis patients, who were found to be enriched in the genus Collinsella, as opposed to the controls, who were enriched in Eubacterium and Roseburia (Karlsson, Fak et al. 2012). Even though this study cannot provide evidence for direct causal effects, these findings indicate that the gut metagenome may play a role in the development of symptomatic atherosclerosis, knowing that inflammation is an important contributor to the pathogenesis of atherosclerosis (Hansson 2005).
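Species diversity, introduced earlier in this section as a property of a sampled ecosystem, can be quantified in several ways. The sketch below uses the Shannon index, a common choice in ecology; this choice is an assumption made for illustration, since the studies cited here are not stated to rely on it, and the abundance counts are invented. The index is H = -Σ p_i ln(p_i), where p_i is the relative abundance of species i.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with
    non-zero counts, where p_i is the relative abundance of species i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Invented abundance profiles (assumption: not taken from any cited study).
even_community = [25, 25, 25, 25]   # four equally abundant species
skewed_community = [97, 1, 1, 1]    # one species dominates

print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

Both toy communities contain four species, yet the evenly distributed one scores higher: the index rewards evenness as well as richness, which is one reason a single diversity number should be interpreted alongside the underlying composition.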
Accumulating evidence now indicates that the gut microbiota also communicates with the central nervous system, possibly through neural, endocrine and immune pathways, and thereby influences brain function and behavior (Grenham, Clarke et al. 2011; Cryan and Dinan 2012). Studies in germ-free animals and in animals exposed to pathogenic bacterial infections, probiotic bacteria or antibiotic drugs suggest a role for the gut microbiota in the regulation of anxiety, mood, cognition and pain. Factors including infection, disease and antibiotics may alter the stability of the natural composition of the gut microbiota and thereby have a deleterious effect on the well-being of the host (Forsythe, Sudo et al. 2010).

Another study demonstrated the key role of the gut microbiota in immuno-modulatory mechanisms underlying multiple sclerosis. Mice genetically predisposed to spontaneously develop EAE (Experimental Autoimmune Encephalomyelitis, an animal model for multiple sclerosis) were housed under germ-free conditions and, as a result, remained fully protected from EAE throughout their life until this protection dissipated upon colonization with conventional microbiota in adulthood (Berer, Mues et al. 2011). Several small studies have reported altered intestinal microbiota in children with autism as compared with controls (Finegold, Dowd et al. 2010; Adams, Johansen et al. 2011; Finegold, Downes et al. 2012). These relations may, however, be explained by other factors such as diet, and larger controlled clinical studies are needed for more evidence.

These are but a few studies among many others that have taken the first steps in demonstrating the existence of associations between intestinal flora and different human diseases. Scientists have not yet had enough time to gather evidence on causality, but this is the next step.
Meanwhile, there have already been some results indicating how we can use metagenomics and the microbiome to improve our health (Shanahan 2012).

A first application area is personalized medicine. Information on the human microbial ecosystems may help stratify individuals and reduce the variability of the cohort, so that a given treatment can be made more effective by adapting it to each of the different sub-phenotypes. For instance, an unexpected discovery was the identification of enterotypes, three robust clusters that remain consistent among different countries and cohorts and are stable over time (Arumugam, Raes et al. 2011). Even though the discrete nature of enterotypes is debated (Jeffery, Claesson et al. 2012), they are found to be associated with long-term diet, particularly protein and animal fat (Bacteroides) versus carbohydrates (Prevotella) (Wu, Chen et al. 2011).

The potential of the human microbiome as an early detection biomarker for diagnostic and prognostic purposes is a very active area of research. Oral or fecal microbial samples can be obtained very easily and used immediately as diagnostic tools. As an example, microbial genes associated with type-2-diabetes (Qin, Li et al. 2012) were used to construct prediction models that could correctly classify a sample with an accuracy of greater than 80%. This is considerably better than models based on the human genes (66%) linked with type-2-diabetes by genome-wide association studies (van Hoek, Dehghan et al. 2008). Most clinical indicators, such as the OGCT³ for diabetes or BMI for obesity, are not optimal (Romero-Corral, Somers et al. 2008). Newer, more biologically relevant indicators are needed. Microbial biomarkers can help in this quest and could even be used to predict future occurrences of a disease.

The human microbiome, and especially the gut flora, offers a yet-to-be-appreciated potential for interventional medicine and opens unprecedented possibilities for curing human diseases.
One area of intervention consists in modulating the disrupted microbial ecosystem and bringing it close to a normal state. This can be achieved in different ways, such as by using prebiotics and probiotics (Sharp, Achkar et al. 2009; Gareau, Sherman et al. 2010). The use of probiotics has already been shown to be successful in animal studies, where Lactobacilli- and Bifidobacteria-based probiotics can alleviate visceral pain induced by stress (Verdu, Bercik et al. 2006); the role of probiotics in treating diseases such as irritable bowel syndrome (IBS) has also been shown (Aragon, Graham et al. 2010). Another way of achieving ecosystem modulation is through fecal transplant, also known as fecal flora reconstitution (Baquero and Nombela 2012; Borody and Khoruts 2012). Different studies have already shown the success of this approach in treating extreme cases of Clostridium difficile infections that were resistant to antibiotics (van Nood, Vrieze et al. 2013). Despite the very high success rate (>90%, far better than many drugs), there are still ethical issues that need to be addressed, and more clinical studies should be performed for this intervention approach to be more widely accepted and used in the medical field. Finally, therapeutic drugs can be designed to interact directly with the microbiome and modulate its different functions, changing its state and thus transforming the diseased phenotype into a healthy one (Jia, Li et al. 2008; Haiser and Turnbaugh 2012).

³ Acronym for “Oral Glucose Challenge Test”, a score on which diabetes classification and diagnostics is based.

Environmental metagenomics

Microorganisms represent the largest reservoir of genetic diversity on Earth, outnumbering all other organisms (NRCC 2007). As an example, bacteria are responsible for about half of the photosynthesis on Earth.
In spite of their crucial role, prokaryotic diversity still suffers from one of the greatest knowledge gaps in the biological sciences and remains largely unexplored and unexploited (Rodriguez-Valera 2004). There is no universally agreed estimate of their real total number, their real diversity or the principles that govern their origin and change. Some researchers estimate the total number of prokaryotic cells on Earth at 5 × 10³⁰, including 10⁶ to 10⁸ individual genomes belonging to different species (Sleator, Shortall et al. 2008). For the soil, there are estimates ranging between 3,000 and 11,000 microbial genomes per gram of soil (Schmeisser, Steele et al. 2007), which makes it clear that current technologies could not support complete sequencing of such highly diverse environments (Kowalchuk, Speksnijder et al. 2007).

Beyond the interspecies diversity, there is also an intraspecies diversity that has been overlooked but that has important consequences. For example, an easy-to-cultivate species such as Escherichia coli may hold a vast gene pool that is not accessible by studying one single strain. Indeed, the diversity of the genes within a bacterial species is another important facet of prokaryotic diversity (Boucher, Nesbo et al. 2001). Metagenomics, as a culture-independent genomic analysis, can also help discover more about the microbial diversity of natural environments such as soil, water and sediments (Lopez-Garcia and Moreira 2008) and has applications in agriculture, sustainability, engineering and the environment.
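The soil estimates quoted in this section can be turned into a rough order of magnitude. The back-of-envelope sketch below is purely illustrative and rests on three assumptions that do not come from the text: a mean prokaryotic genome size of 4 million base pairs, a 10-fold sequencing coverage per genome, and perfectly even abundances across genomes. The last assumption is unrealistic, since rare genomes in a real sample would require far deeper sequencing, so the result should be read as an optimistic lower bound.

```python
def bases_required(n_genomes, mean_genome_size_bp, coverage):
    """Total bases to sequence so that each genome reaches the given
    coverage, assuming all genomes are equally abundant."""
    return n_genomes * mean_genome_size_bp * coverage


# Illustrative assumptions, NOT values from the text.
ASSUMED_GENOME_SIZE_BP = 4_000_000
ASSUMED_COVERAGE = 10

# Range of distinct genomes per gram of soil quoted in the text.
low = bases_required(3_000, ASSUMED_GENOME_SIZE_BP, ASSUMED_COVERAGE)
high = bases_required(11_000, ASSUMED_GENOME_SIZE_BP, ASSUMED_COVERAGE)

print(f"{low:.1e} to {high:.1e} bases per gram of soil")
```

Even under these favourable assumptions the requirement runs to hundreds of billions of bases for a single gram of soil, and uneven abundances push the real figure well beyond that.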