
Defining the human microbiome

Luke K Ursell , Jessica L Metcalf , Laura Wegener Parfrey , Rob Knight
DOI: http://dx.doi.org/10.1111/j.1753-4887.2012.00493.x S38-S44 First published online: 1 August 2012


Rapidly developing sequencing methods and analytical techniques are enhancing our ability to understand the human microbiome, and, indeed, how the microbiome and its constituents are defined. This review highlights recent research that expands our ability to understand the human microbiome on different spatial and temporal scales, including daily time series datasets spanning months. Furthermore, emerging concepts related to defining operational taxonomic units, diversity indices, core versus transient microbiomes, and the possibility of enterotypes are discussed. Additional advances in sequencing technology and in our understanding of the microbiome will provide exciting prospects for exploiting the microbiota for personalized medicine.

  • human microbiome
  • microbial diversity
  • 16S rRNA sequencing


The human microbiota consists of the 10–100 trillion symbiotic microbial cells harbored by each person, primarily bacteria in the gut; the human microbiome, however, consists of the genes these cells harbor.1 Microbiome projects worldwide have been launched with the goal of understanding the roles that these symbionts play and their impacts on human health.2,3 Just as the question “What is it to be human?” has troubled humans from the beginning of recorded history, the question “What is the human microbiome?” has troubled researchers since the term was coined by Joshua Lederberg in 2001.4 Specifying the definition of the human microbiome has been complicated by confusion about terminology; for example, “microbiota” (the microbial taxa associated with humans) and “microbiome” (the catalog of these microbes and their genes) are often used interchangeably. In addition, the term “metagenomics” originally referred to shotgun characterization of total DNA, although now it is increasingly being applied to studies of marker genes such as the 16S rRNA gene. More fundamentally, however, new findings are leading us to question the concepts that are central to establishing the definition of the human microbiome, such as the stability of an individual's microbiome, the definition of the operational taxonomic units (OTUs) that make up the microbiota, and whether a person has one microbiome or many. This review covers the progress made towards defining the human microbiome in these different respects.

Studies of the diversity of the human microbiome started with Antonie van Leeuwenhoek, who, as early as the 1680s, compared his oral and fecal microbiota. He noted the striking differences in microbes between these two habitats and also between samples from individuals in states of health and disease in both of these sites.5,6 Thus, studies of the profound differences in microbes at different body sites, and between health and disease, are as old as microbiology itself. What is new today is not the ability to observe these obvious differences, but rather the ability to use powerful molecular techniques to gain insight into why these differences exist and to understand how we can affect transformations from one state to another.

Culture-independent methods for characterizing the microbiota, together with a molecular phylogenetic approach to organizing life's diversity, provided a fundamental breakthrough in allowing researchers to compare microbial communities across environments within a unified phylogenetic context (reviewed by Pace7). Although host-associated microbes are presumably acquired from the environment, the composition of the mammalian microbiota, especially in the gut, is surprisingly different from free-living microbial communities.8 In fact, an analysis of bacterial diversity from free-living communities in terrestrial, marine, and freshwater environments as well as from communities associated with animals suggests that the vertebrate gut is an extreme environment.8 In contrast, bacterial communities from environments typically considered extreme, such as acidic hot springs and hydrothermal vents, are similar to communities in many other environments.9 This suggests that coevolution of vertebrates and their microbial consortia over hundreds of millions of years has selected for a specialized community of microbes that thrive in the gut's warm, eutrophic, and stable environment.8 In the human gut and across human-associated habitats, bacteria comprise the bulk of the biomass and diversity, though archaea, eukaryotes, and viruses are also present in smaller numbers and should not be neglected.10,11

Interestingly, estimates of the human gene catalog and the diversity of the human genome pale in comparison with estimates of the diversity of the microbiome. For example, the MetaHIT Consortium reported a gene catalog of 3.3 million nonredundant genes in the human gut microbiome alone,3 as compared with the approximately 22,000 genes present in the entire human genome.12 Similarly, the diversity among the microbiome of individuals is immense compared with genomic variation: individuals are about 99.9% identical to one another in terms of their human genome,13 but can be 80–90% different from one another in terms of the microbiome of their hand14 or gut.15 These findings suggest that employing the variation contained within the microbiome will be much more fruitful in personalized medicine, i.e., the use of an individual patient's genetic data to inform healthcare decisions, than approaches that target the relatively constant host genome.
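The scale of the gene-count comparison above can be made explicit with a line of arithmetic. The following is a minimal sketch that uses only the two figures quoted in the text; the function name is illustrative.

```python
# Figures quoted in the text above.
MICROBIAL_GENES = 3_300_000  # MetaHIT nonredundant gut gene catalog
HUMAN_GENES = 22_000         # approximate number of human genes


def fold_difference(larger, smaller):
    """Return how many times larger the first count is than the second."""
    return larger / smaller


ratio = fold_difference(MICROBIAL_GENES, HUMAN_GENES)
print(f"Gut microbial genes outnumber human genes about {ratio:.0f}-fold")
```

On these figures, the gut gene catalog is roughly 150 times the size of the human gene complement.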

Many fundamental questions about the human microbiome were difficult or impossible to address until recently. Some questions, such as the perennially popular “How many species live in a given body site?” are still hard to answer because of problems with definitions of bacterial species and with the rate of sequencing error. Other questions, such as “How does the diversity within a person over time compare with the diversity between people?” or “How does the diversity between sites on the same person's body compare with the diversity between different people at the same site?” or “Is there a core set of microbial species that we all share?” can now be answered conclusively. In the next section, some of the tools that have allowed these long-standing questions to be answered are discussed.


The drastic reduction in sequencing costs experienced over the past few years has made it possible to identify specific microbial taxa found within the human gut that are difficult or impossible to culture. Researchers are now able to generate millions of sequences per sample in order to assess differences in microbial communities among body sites and individuals. This increased sequencing power has required the development of equally powerful computational tools to handle the burgeoning amount of sequence data produced by modern technologies.16 There are several pipelines for analyzing microbial community data, such as mothur,17 WATERS (Workflow for the Alignment, Taxonomy, and Ecology of the Microbial Environment),18 the RDP (Ribosomal Database Project) pyrosequencing tools,19 and QIIME (Quantitative Insights Into Microbial Ecology, pronounced “chime”).20 QIIME is a platform for the analysis of high-throughput sequencing data that enables users to import raw sequence data and readily produce measures of inter- and intra-sample diversity. Consistency in the identification of OTUs and the establishment of agreed-upon measures of diversity within and between samples are crucial for the comparison of results across studies, although the concept of OTU is becoming increasingly problematic as sequence data accumulate and as explicitly phylogenetic approaches gain in popularity.

Beta diversity refers to the measurement of the degree of difference in community membership or structure between two samples. A recent review of taxon-based measurements of beta diversity found that some metrics, including Canberra and Gower distances, have increased power for discriminating clusters, while other metrics, such as chi-squared and Pearson correlation distances, are more appropriate for elucidating the effects of environmental gradients on communities.21 A robust method for comparing the differences between microbial communities is UniFrac, which measures the proportion of shared branch lengths on a phylogenetic tree between samples.22 Highly similar microbial communities result in UniFrac scores near 0, while two completely independent communities that do not share any branch length (i.e., they have a different evolutionary history) would result in a UniFrac score of 1. Principal coordinates analysis (PCoA) can then visualize the UniFrac distances between samples in two-dimensional or three-dimensional space, allowing the clustering of similar communities or the separation of distinct communities to be easily distinguished visually.
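The UniFrac calculation described above can be illustrated with a small sketch of its unweighted form: the distance is the fraction of branch length that leads to taxa found in only one of the two samples. The four-tip tree, its branch lengths, and the function name below are all hypothetical and chosen for illustration; this is not the UniFrac or QIIME implementation.

```python
# Toy rooted tree, encoded as (branch_length, tips_below_branch).
# Tip names and branch lengths are made up for illustration.
TOY_BRANCHES = [
    (1.0, {"t1"}),
    (1.0, {"t2"}),
    (1.0, {"t3"}),
    (1.0, {"t4"}),
    (0.5, {"t1", "t2"}),  # internal branch above t1 and t2
    (0.5, {"t3", "t4"}),  # internal branch above t3 and t4
]


def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that is unique to one sample.

    A branch counts as observed if any tip below it occurs in either
    sample, and as unique if its tips occur in exactly one sample.
    """
    unique = 0.0
    observed = 0.0
    for length, tips in branches:
        in_a = bool(tips & sample_a)
        in_b = bool(tips & sample_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed if observed else 0.0


identical = unweighted_unifrac(TOY_BRANCHES, {"t1", "t2"}, {"t1", "t2"})
disjoint = unweighted_unifrac(TOY_BRANCHES, {"t1", "t2"}, {"t3", "t4"})
partial = unweighted_unifrac(TOY_BRANCHES, {"t1", "t2"}, {"t2", "t3"})
print(identical, disjoint, partial)  # 0.0 1.0 0.625
```

Identical samples score 0, samples that share no branch length score 1, and partially overlapping samples fall in between. A matrix of such pairwise distances is the kind of input that PCoA takes.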

UniFrac, coupled with PCoA, can distinguish differences between communities using as few as 10 sequences per sample.23 It is important to recognize that increased sequencing depth is not always necessary to recover biologically meaningful results when those results are obvious. Thus, by choosing diversity measurements that are appropriate for a study design, researchers utilizing modern sequencing methods are able to characterize differences among samples at relatively low sequence coverage. This enables researchers to assess fine-grained spatial and temporal patterns by characterizing hundreds to thousands of samples, such as time series across multiple patients or environments. A multitude of diversity measurements, including UniFrac, are available in QIIME and can be readily compared.

In general, pipelines for analyzing 16S rRNA and shotgun metagenomic data have separate workflows. Some initial steps, such as demultiplexing (removing barcodes from and separating pooled samples) and quality filtering, are common to both pipelines. However, for 16S rRNA data, sequences must be grouped into OTUs, chimeric sequences generated by incomplete template extension must be removed, and phylogenetic trees must be constructed. In contrast, in the metagenomic pipeline, sequences must be assigned to functions as well as to taxonomy (either as whole reads or after assembly). Once taxon or gene function tables are constructed, the pipelines begin to converge, at least conceptually. The interest is then 1) in the composition of each sample, 2) in finding the taxa or functions that discriminate between groups of samples (e.g., according to clinical parameters), and 3) in asking whether the samples cluster according to any measured clinical states (or according to time). One exciting emerging direction is comparing metagenomic and 16S rRNA clustering directly using a technique called Procrustes analysis that allows the PCoA plots to be combined.24 Another powerful tool is the use of machine learning and statistical techniques to build predictive models of taxa25 or functions26 that discriminate between groups of samples.
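The demultiplexing step mentioned above (separating pooled samples by barcode and removing the barcodes) might be sketched as follows. This is a simplified illustration that assumes fixed-length barcodes at the start of each read and exact matching; the barcodes, sample names, and reads are hypothetical, and real pipelines typically also handle barcode errors and quality scores.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample and strip the leading barcode.

    Assumes every barcode has the same length and sits at the 5' end of
    the read. Reads whose barcode is not recognized are discarded.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads: 4-base barcode followed by the sequence.
barcodes = {"ACGT": "gut", "TTAG": "palm"}
pooled = ["ACGTTTGACC", "TTAGGGCATA", "ACGTAAAA", "GGGGCCCC"]
print(demultiplex(pooled, barcodes))
# {'gut': ['TTGACC', 'AAAA'], 'palm': ['GGCATA']}
```

The last read carries an unknown barcode and is dropped, which is one simple way of handling unassignable sequences.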

A unique advantage of QIIME relative to other pipelines is its ability to exploit “sample metadata,” e.g., clinical information about subjects, to produce visualizations that make the main patterns in the data immediately apparent. Of particular interest, QIIME supports the MIMARKS (Minimum Information about a MARKer Sequence) standard27 developed by the Genomic Standards Consortium,28 as do other tools for microbial and community analysis such as MG-RAST (metagenomics Rapid Annotation using Subsystem Technology).29 MIMARKS is becoming increasingly popular and has been adopted by the INSDC (International Nucleotide Sequence Database Consortium, which includes GenBank, EBI, and DNA Databank of Japan) as the standard for metadata.

With these tools in hand, the identification of basic patterns of similarities and differences in the microbiota is now routine. The key challenge moving forward is to extend analyses to include longitudinal studies and to understand the role of specific host and environmental factors in the development and maintenance of the microbiome.


The gastrointestinal tract of a human infant provides a brand new environment for microbial colonization.30 Indeed, the microbiota that an infant begins to acquire depends strongly on the mode of delivery.31 Twenty minutes after birth, the microbiota of vaginally delivered infants resembles the microbiota of their mother's vagina, while infants delivered via Cesarean section harbor microbial communities typically found on human skin.32 The acquisition of microbiota continues over the first few years of life, with an infant's gastrointestinal tract microbiome beginning to resemble that of an adult as early as 1 year of life.33 In one case study in which an infant's microbiota was monitored over the first 2.5 years of life, phylogenetic diversity increased significantly and linearly with time.34 Additionally, significant changes in the composition of the gut microbiota were apparent at five time points: introduction of a diet of breast milk, development of fever at day 92, introduction of rice cereal at day 134, introduction of formula and table foods at day 161, and introduction of antibiotic treatment and adult diet at day 371.34 Interestingly, each dietary change was accompanied by changes in the gut microbiota and the enrichment of corresponding genes. For example, when the adult diet was introduced to the infant, genes in the microbiome associated with vitamin biosynthesis and polysaccharide digestion became enriched.34

The interaction between the human microbiota and the environment is dynamic, with human microbes flowing freely onto touched surfaces every day. Fierer et al.35 showed that human fingertips can transfer signature communities of microbes onto keyboards, and that these communities strongly differentiate individuals.35 PCoA plots showed that it was possible to determine which fingers were typing on which keys, and which individuals were using which keyboards: it was even possible to link a person's hand to the computer mouse they use with up to 95% accuracy when compared with a database of other hands. Overall, this study showed that microbial communities are constantly being transferred between surfaces, and that a dynamic interaction exists between environmental microbiota and different human body sites.


Another interesting question still being explored is how stable the microbiome within an individual is over time. By defining what constitutes normal temporal variation in an individual over time, the changes in microbial communities that result from dietary and pharmaceutical interventions can be better quantified and understood. In the longest time series study to date, Caporaso et al.36 sampled two individuals' microbial communities in the gut, oral cavity, and left and right palms over 396 time points spanning 15 months. Communities at different body sites were readily distinguishable from one another using 3-D PCoA plots over a 1-year time span, even though the community structure within a given site was highly variable.36 The level of diversity among body sites is also different, with the mouth and gut harboring the most diverse communities.37 Taken together, these studies show that an individual's microbiota represents a highly variable and compartmentalized ecosystem.

Overall, it has yet to be conclusively proven that individuals, or even body sites, harbor a “core” set of specific bacterial taxa. For example, the MetaHIT Consortium defined a “core” set of lineages as those that were present in half of the subjects studied, although essentially no genes were present in all subjects studied.3 Of course, it is important to recognize that sampling depth may be critical for distinguishing taxa that are absent from those that are merely very rare; the dynamic range of microbial abundance is also quite large, and even within the MetaHIT “core” genes, 2000-fold ranges of abundance were not uncommon. Proving that a taxon is completely absent in the gut is not possible with these types of studies, so core calculations should always carry with them a caveat about sequencing depth. Another factor to consider when defining diversity and a core is that methodological artifacts can greatly increase the apparent numbers of OTUs in a sample (and hence reduce the apparent fraction that is shared). Both sequencing error38,39 and issues related to alignment, especially multiple sequence alignment,40–43 can inflate the number of OTUs immensely.
It is important to ensure that the same methodological procedures are used when performing estimates of the core in terms of the fraction of individuals in whom the core must be represented, the minimum abundance, and the procedure for deciding which sequences count as “the same.” Finally, there is a key question about whether variation around a core is structured so that humans harbor only a few general types of microbiota profiles in a given body site; this is well established for the vagina44 but more controversial in the gut.45 In general, extreme caution must be applied when performing clustering procedures, as many researchers will break up a continuous variation into clusters where none exist.21 Robust model selection procedures that incorporate the possibility that only continuous variation, not discrete clusters, exist remain to be developed within the context of microbial community analysis.
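How strongly a "core" depends on the chosen thresholds can be seen in a small sketch. The function below treats a taxon as core if it reaches a minimum count in at least a minimum fraction of subjects; the function name, parameter names, taxon labels, and counts are all hypothetical and serve only to illustrate the two criteria discussed above.

```python
def core_taxa(counts_by_subject, min_prevalence=0.5, min_count=1):
    """Return taxa present (count >= min_count) in at least
    min_prevalence of the subjects."""
    n_subjects = len(counts_by_subject)
    if n_subjects == 0:
        return set()
    all_taxa = set().union(*counts_by_subject.values())
    core = set()
    for taxon in all_taxa:
        n_present = sum(
            1
            for counts in counts_by_subject.values()
            if counts.get(taxon, 0) >= min_count
        )
        if n_present / n_subjects >= min_prevalence:
            core.add(taxon)
    return core


# Made-up sequence counts per subject, for illustration only.
table = {
    "subject_1": {"taxon_A": 120, "taxon_B": 30},
    "subject_2": {"taxon_A": 80, "taxon_C": 15},
    "subject_3": {"taxon_A": 5, "taxon_B": 2},
    "subject_4": {"taxon_B": 1, "taxon_C": 40},
}

print(sorted(core_taxa(table, min_prevalence=0.5)))
# ['taxon_A', 'taxon_B', 'taxon_C']
print(sorted(core_taxa(table, min_prevalence=1.0)))
# []
print(sorted(core_taxa(table, min_prevalence=0.5, min_count=10)))
# ['taxon_A', 'taxon_C']
```

With the same table, requiring presence in half of the subjects yields three core taxa, requiring presence in all subjects yields none, and raising the minimum abundance changes the membership again, which is why the thresholds need to be reported alongside any core estimate.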

There is increasing evidence that individuals actually share a “core microbiome” rather than a “core microbiota.” In a study of monozygotic and dizygotic twin pairs concordant for obesity or leanness, a subset of identifiable microbial genes, but not species, were shared among all individuals.15 Remarkably, vastly different sets of microbial species yielded very similar functional KEGG pathways. However, deviations from this core microbiome were apparent in obese subjects, suggesting it will be important to utilize metagenomic data in addition to determining microbial community composition with 16S marker gene studies when assessing differences among disease states. Understanding whether this principle holds true for other body sites will be fascinating; cross-biome metagenomic comparisons have been exceedingly rare to date.46,47


Evidence is mounting for the inextricable linking of a host's microbiota, digestion, and metabolism. In an analysis of humans and 59 additional mammalian species, 16S rRNA sequences from carnivores, omnivores, and herbivores clustered by diet in principal coordinates space, showing that community structures differ depending on diet.48 Dietary changes in mice can also lead to significant changes in bacterial metabolism, especially of short-chain fatty acids and amino acids, in as little as 1 week,49 and can lead to large changes after only 1 day.50 Importantly, the genetic diversity found within the human gut microbiota allows the digestion of compounds via metabolic pathways not explicitly coded for in the mammalian genome, greatly increasing the ability of humans to extract energy from diverse diets.51,52

The gut microbiota also seems to play an important role in obesity. Germ-free mice that receive a transplant of gut microbiota from conventional mice show an increase in adiposity without increasing food intake; this is attributed to increased energy extraction from the diet and increased energy deposition into host adipocytes.53 The two major microbial divisions, Firmicutes and Bacteroidetes, show different abundances depending on phenotype. Genetically obese mice (ob/ob) have decreased Bacteroidetes and increased Firmicutes when compared with their lean counterparts,54 and the obesity phenotype can even be transferred to a germ-free but genetically wild-type mouse by way of the microbiota. The phenotype is due to energy balance: bomb calorimetry of the fecal pellets reveals that ob/ob mice extract more energy from their diet and leave less behind in the feces.51 Fascinatingly, the same effects hold true for another mouse model, TLR5 knockout mice, which also become obese in some mouse facilities (but develop colitis in others, presumably due to differences in the background microbiota). The TLR5 knockout mice also produce a transmissible obesity phenotype, although no difference in the efficiency of energy harvest is involved. Instead, the altered microbiota somehow makes the mice hungrier, and their microbe-induced obesity can be cured by treatment with antibiotics or by restricting the amount of food in their cages to that consumed by wild-type mice.55 The correlation between microbes and obesity is perhaps best illustrated through weight loss. 
As different groups of human subjects were placed on either a fat-restricted or a carbohydrate-restricted diet, the abundance of Bacteroidetes increased as their body weight decreased, transitioning from the signature “obese” microbial community to a “lean” community.56 Thus, the modulation of a patient's microbiota might be a therapeutic option for promoting weight loss in obese patients or for promoting weight gain in underweight children.

Surprisingly, the microbes ingested with food might be providing the individual human microbiome with new genes to digest new foods. Hehemann et al.57 found that a new class of glycoside hydrolases used to digest porphyran, a polysaccharide common in red algae, was also found in human stool samples as a gene in Bacteroides plebeius. A closer examination of the stool metadata revealed that the porphyran-digesting gene was present only in stool samples from Japanese individuals; the gene was not found in the gut microbiome of individuals from the United States. Why would a marine gene be found in the human gut? The authors concluded that the seaweed common in the Japanese diet, but not present in the American diet, contained the microorganism that transferred the genes to the gut microbiome. Thus, microbes have the ability to greatly increase the number of metabolic tools of the human gut, allowing us to digest an array of substrates.


One key question is whether the human gut microbiota is sufficiently plastic to allow well-defined interventions to improve health. As described above, the gut microbiota, once established, is fairly stable over time, at least compared with the differences among individuals. However, a number of studies demonstrate that external forces can alter the community of microbes located in the gastrointestinal tract, with antibiotics being an important example.

Antibiotics are mainly used to combat pathogenic bacterial species that reside within or have invaded a host. The current generation of antibiotics, however, are broad spectrum and target broad swaths of the normal microbiota as well. Thus, antibiotics significantly affect the host's innate gut microbiota. Three to four days after treatment with the broad-spectrum antibiotic ciprofloxacin, the gut microbiota experiences a decrease in taxonomic richness, diversity, and evenness.58,59 The large magnitude of changes in the gut microbiota demonstrated significant interpersonal variability. While the gut microbiota began to resemble its pretreatment state a week after treatment, differences among individuals were observed with regard to how closely the post-treatment community resembled the pretreatment community, with some taxa failing to return to the community.59,60 Indeed, the re-establishment of some species can be affected for up to 4 years following antibiotic treatment.61 Yet, the overall recovery of the gut microbiota following antibiotic treatment suggests there are factors within the community, whether biotic or abiotic, that promote community resilience, although these have yet to be elucidated.

Other antibiotics also tend to produce results that differ substantially among subjects61,62 and even body sites.63 Because larger populations have not yet been studied, in part due to ethical issues related to the administration of antibiotics to healthy human subjects, the basis for these underlying differences has not yet been elucidated. Understanding the factors that determine the ability of a microbiota to resist and recover from perturbation, as well as understanding the factors that determine its current state, will be key to developing tools to assist in microbiome manipulation. For example, counterintuitively, the administration of antibiotics prior to cecal transplant in rats actually reduces the chance that new microbes will establish.64

One fascinating hint that the microbiota may be more plastic than imagined is the recent treatment of persistent Clostridium difficile infections via stool transplant, which has been successful in a number of studies65–71; in general, the depauperate gut community produced during the C. difficile infection is replaced by the donor community.66,72 The success of this technique is remarkable, especially considering how little is known about the best community to supply. For example, is it better to receive the fecal community of a close relative or of a cohabiting individual, or perhaps to bank one's own stool before beginning antibiotic treatment so that it can be restored later? Is the same stool good for everyone, or do the vast differences in the microbiota imply that each person's microbes are specifically adapted relative to those they might receive from a donor? As with blood types, are there “universal donors” and “universal recipients”? These and many other questions remain to be answered.


As in every year since the initial sequencing of DNA, this year has resulted in unprecedented growth in the amount of sequence data gained, collected at an unprecedentedly low cost. Increasingly powerful tools used to extract meaningful patterns from this wealth of data have been developed or updated as well. Emerging technologies such as stool transplantation, 16S rRNA and whole-genome sequencing on the Illumina platform, the ability to transplant human microbial communities into mice with high efficiency even from frozen samples,50 and the creation of personalized culture collections73 raise the prospect of a future in which therapies for individual humans are piloted in a battery of mice that are subjected to different treatments, and where leave-one-out experiments that reveal the effects of the deletion of individual species,73 or of individual genes from within a species,74 allow insight into mechanism. Although the tools currently available are still imperfect (for example, the limited read length of today's high-throughput sequencing technologies limits the ability to detect bacterial species and strains, and analysis of viruses and eukaryotes is still very much an emerging frontier), the prospects for developing a mechanistic understanding of the factors that underlie the plasticity of the microbiome and then for manipulating the microbiome to improve health seem increasingly bright.


Funding.  Work cited in this review from the author's lab was supported in part by grants from the National Institutes of Health (HG4872 [RK]), the Crohn's and Colitis Foundation of America, and the Howard Hughes Medical Institute (RK).

Declaration of interest.  The authors have no relevant interests to declare.

