The default removal of low-abundance (rare) taxa from microbial community analyses may give an incomplete picture of the taxonomic and functional microbial potential within the human habitat. We reanalysed publicly available shotgun metagenomics data from healthy children and children with cystic fibrosis (CF) to study the development of the rare species biosphere, defined here by the 15th, 25th or 35th species abundance percentile. Healthy children harboured an age-independent network of abundant (core) and rare species, and both were essential for maintaining the network structure. Protein sequence usage differed between the core and rare species biosphere for more than 100 bacterial metabolic pathways. In children with CF, the background structure was underdeveloped, and random forest bootstrapping based on all constituents of the early airway metagenome and host-associated factors indicated that rare taxa were the most important variables for classifying a child as healthy or as having the life-limiting disease CF. In computer-based model simulations, incorporating an increasing number of bacterial taxa from the healthy network into the CF network failed to make the age-independent CF network as robust as the healthy one. However, the transfer of a key combination of taxa from the healthy to the CF network structure with high species diversity and low species dominance correlated with a more robust CF network and a topological approximation of the CF and healthy graph structures. Rothia mucilaginosa, Streptococci and rare species were essential in improving the underdeveloped CF network.
Keywords
- Graph kernels
- Human airway metagenome
- Microbiome development
- Random forest
- Rare species
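
The percentile-based definition of the rare species biosphere mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a species counts as rare when its abundance lies at or below the chosen percentile of the species abundance distribution, and it uses the nearest-rank percentile method. Both choices are assumptions made for illustration; the abstract does not state how the percentile was computed, whether abundances were pooled or taken per sample, or how they were normalised. The function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def nearest_rank_percentile(values: Iterable[float], p: int) -> float:
    """Return the p-th percentile (integer p in 1..100) by the nearest-rank method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one abundance value is required")
    # Integer ceiling of p * n / 100 avoids floating-point rounding issues.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def split_rare_core(abundance: Dict[str, float], p: int) -> Tuple[Set[str], Set[str]]:
    """Split species into (rare, core) at the p-th abundance percentile.

    Assumption: species at or below the threshold are rare, the rest are core.
    """
    threshold = nearest_rank_percentile(abundance.values(), p)
    rare = {species for species, value in abundance.items() if value <= threshold}
    core = set(abundance) - rare
    return rare, core


# Hypothetical toy data: 20 species with distinct abundances 1..20.
toy = {f"sp{i}": float(i) for i in range(1, 21)}

# Size of the rare biosphere under each of the three cut-offs named in the abstract.
rare_sizes = {p: len(split_rare_core(toy, p)[0]) for p in (15, 25, 35)}
print(rare_sizes)
```

Raising the cut-off from the 15th to the 35th percentile moves more species from the core set into the rare set, so any downstream comparison of core and rare taxa depends on which threshold is chosen.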