3.3 Sensitivity of the taxonomic assignments depends on the availability of representative genomes
We assessed the sensitivity of the taxonomic assignments against a recently updated checklist of the fishes recorded in the Chilika Lagoon over the last century (Suresh et al., 2018). Only 44.26% of the 61 bony fish families detected in our samples matched the checklist (Supplementary Table 2). We suspected that this low concordance arose from poor representation of taxa from our ecosystem in the reference database. We therefore checked which fish families had a proteome from an annotated reference genome in the database and how this affected taxonomic assignment. About 71.74% of the 92 fish families in the checklist lacked a representative proteome in the reference database. In contrast, 86.88% of the 61 fish families detected in this study were represented in the database by complete proteomes (Supplementary Table 2). We therefore recalculated the sensitivity considering only the 26 checklist families that were represented in the database. After accounting for the incompleteness of the reference database, the sensitivity of the taxonomic assignment rose sharply to 88.46%, almost double the initial estimate. This underscores the importance of representative genomes in reference databases for the sensitive detection of taxa from PCR-free sequencing data.
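The ratios in this paragraph can be made explicit with a few set operations. As we read the text, the initial 44.26% is the share of detected families that also appear in the checklist, whereas the recalculated 88.46% is the share of database-represented checklist families that were detected. The minimal Python sketch below encodes that reading; the function names and the placeholder family labels are hypothetical and are not taken from the study's pipeline.

```python
"""Sketch: checklist concordance and database-restricted sensitivity.

All family labels used here are hypothetical placeholders, not study data.
"""


def checklist_concordance(detected: set[str], checklist: set[str]) -> float:
    """Percentage of detected families that also appear in the checklist."""
    if not detected:
        raise ValueError("no detected families")
    return 100.0 * len(detected & checklist) / len(detected)


def unrepresented_fraction(families: set[str], in_database: set[str]) -> float:
    """Percentage of families that lack a representative proteome in the database."""
    if not families:
        raise ValueError("empty family set")
    return 100.0 * len(families - in_database) / len(families)


def restricted_sensitivity(detected: set[str], checklist: set[str],
                           in_database: set[str]) -> float:
    """Percentage of database-represented checklist families that were detected."""
    detectable = checklist & in_database  # checklist families the classifier could, in principle, assign
    if not detectable:
        raise ValueError("no checklist family is represented in the database")
    return 100.0 * len(detected & detectable) / len(detectable)


if __name__ == "__main__":
    # Hypothetical toy sets for illustration only.
    checklist = {"fam_A", "fam_B", "fam_C", "fam_D", "fam_E"}
    detected = {"fam_A", "fam_B", "fam_X", "fam_Y"}
    in_database = {"fam_A", "fam_B", "fam_D", "fam_X", "fam_Y"}

    print(f"concordance: {checklist_concordance(detected, checklist):.2f}%")
    print(f"checklist lacking proteome: {unrepresented_fraction(checklist, in_database):.2f}%")
    print(f"restricted sensitivity: {restricted_sensitivity(detected, checklist, in_database):.2f}%")
```

With these toy sets, half of the detected families match the checklist, two of the five checklist families lack a proteome, and two of the three represented checklist families are detected; restricting the denominator to detectable families is what raises the estimate.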
3.4 Extrapolation of taxon accumulation curves derived from extracellular eDNA reliably estimates the total taxonomic richness of an ecosystem
Because the observed richness of taxa in the samples is often limited by sequencing depth, we also inferred the total taxonomic richness of the ecosystem, accounting for low-abundance taxa that could be detected with deeper sequencing. By statistical extrapolation of the richness accumulation curve derived from the incidence frequencies of taxa, the asymptotic family richness of the Chilika Lagoon across the tree of life was estimated to be 1071.49 (SEM 20.82) (Fig. 4A). Comparing the observed family richness with this asymptotic estimate, we had detected 93.42% (SEM 0.01%) of the families across the tree of life in our dataset. Eukaryota accounted for most (96.79%) of the undetected diversity, as the family richness accumulation curves of Archaea, Bacteria, and Viruses were nearly saturated (Supplementary Fig. 7). We further estimated the asymptotic taxonomic richness of all domains and of the different eukaryotic kingdoms, phyla, and classes (Fig. 4B). The tree of life comprised 799 families (SEM 20.02) of Eukaryota, 230 families (SEM 0.0) of Bacteria, 27 families (SEM 0.14) of Archaea, and 13 families (SEM 0.69) of DNA viruses. Metazoa was the richest eukaryotic kingdom, with about 452 families (SEM 21.67), followed by Fungi and Viridiplantae with about 148 (SEM 0.49) and 111 (SEM 14.49) families, respectively. Metazoa comprised 265 families (SEM 26.95) of invertebrates and 196 families (SEM 7.37) of vertebrates. Actinopteri was the richest class in the phylum Chordata, with 65 families (SEM 5.21), followed by Aves, Mammalia, Reptiles, and Amphibia (Fig. 4B). The estimated asymptotic richness of 65 bony fish families (SEM 5.21) is well within the theoretical maximum of 92 fish families recorded in Chilika over the last century (Suresh et al., 2018).
This shows that the total taxonomic richness of an ecosystem can be reliably estimated by extrapolation of taxon accumulation curves derived from extracellular eDNA sequences.
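The text does not name the estimator behind the extrapolation in Fig. 4. A common choice for incidence (presence/absence per sample) data is the bias-corrected Chao2 estimator, combined with incidence-based rarefaction and extrapolation of the accumulation curve, as implemented for example in the R package iNEXT. The Python sketch below assumes that approach; it is illustrative only, omits the standard-error calculation, and all function names and sample data are hypothetical.

```python
"""Sketch: incidence-based richness accumulation and Chao2 asymptotic richness.

Assumes the Chao2 estimator; not necessarily the method used for Fig. 4.
Sample data below are hypothetical placeholders.
"""
from collections import Counter
from math import comb


def incidence_frequencies(samples):
    """Count, for each taxon, the number of sampling units in which it occurs (Y_i)."""
    counts = Counter()
    for taxa in samples:
        counts.update(set(taxa))
    return counts


def chao2_unseen(y, T):
    """Bias-corrected Chao2 estimate of the number of undetected taxa (Q0)."""
    q1 = sum(1 for v in y.values() if v == 1)  # taxa seen in exactly one unit
    q2 = sum(1 for v in y.values() if v == 2)  # taxa seen in exactly two units
    factor = (T - 1) / T
    if q2 > 0:
        return factor * q1 * q1 / (2 * q2)
    return factor * q1 * (q1 - 1) / 2


def expected_richness(y, T, t):
    """Expected richness in t sampling units: rarefaction if t <= T, extrapolation if t > T."""
    s_obs = len(y)
    if t <= T:
        # Probability that taxon i is missed in t units drawn without replacement.
        return sum(1 - comb(T - yi, t) / comb(T, t) for yi in y.values())
    q0 = chao2_unseen(y, T)
    if q0 == 0:
        return float(s_obs)
    q1 = sum(1 for v in y.values() if v == 1)
    t_star = t - T
    # Approaches s_obs + q0 (the Chao2 asymptote) as t_star grows.
    return s_obs + q0 * (1 - (1 - q1 / (T * q0 + q1)) ** t_star)


def asymptotic_summary(samples):
    """Return observed richness, Chao2 asymptotic richness, and percentage detected."""
    y = incidence_frequencies(samples)
    T = len(samples)
    s_obs = len(y)
    s_est = s_obs + chao2_unseen(y, T)
    return s_obs, s_est, 100.0 * s_obs / s_est


if __name__ == "__main__":
    # Hypothetical presence/absence data: families detected in each of 5 samples.
    samples = [
        {"fam_A", "fam_B", "fam_C"},
        {"fam_A", "fam_B", "fam_D"},
        {"fam_A", "fam_C", "fam_E"},
        {"fam_A", "fam_B", "fam_F"},
        {"fam_A", "fam_B"},
    ]
    s_obs, s_est, pct = asymptotic_summary(samples)
    print(f"observed: {s_obs}, Chao2: {s_est:.2f}, detected: {pct:.2f}%")

    y = incidence_frequencies(samples)
    for t in (1, 3, 5, 10, 50):
        print(t, round(expected_richness(y, len(samples), t), 2))
```

With the toy data, six families are observed, Chao2 predicts about 9.6, and the observed families therefore make up about 62.5% of the estimate, the same kind of ratio as the 93.42% reported above. The curve rises from 2.8 expected families in one sample to 6 at all five samples and then levels off toward 9.6 as hypothetical additional samples are added.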