2.3 Methods
First, DNA was extracted from 35 C. hainanense leaf samples, and its
quality and concentration were checked before the samples were sent to
Hangzhou Lianchuan Biotechnology Co. for sequencing. After sequencing
was completed, SNPs were mined from the C. hainanense genome. Based on
these SNPs, a phylogenetic tree of C. hainanense was constructed using
the neighbor-joining algorithm in MEGA. Principal component analysis
(PCA) was then performed on the C. hainanense populations using the
same SNPs. In addition, the population structure of all samples was
analyzed with ADMIXTURE to obtain the distribution of genetic
components across the different populations of C. hainanense. Finally,
genetic distances among all samples were calculated from the SNPs. The
detailed method is described by Chen et al. (2022).
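The text above does not state which genetic distance measure was used. As an illustration only, the sketch below computes a simple allele-sharing distance between samples from biallelic SNP genotypes. The 0/1/2 genotype coding, the use of None for missing calls, and the function names are assumptions made for this example, not details taken from Chen et al. (2022).

```python
def allele_sharing_distance(g1, g2):
    """Mean allele difference between two diploid samples, scaled to [0, 1].

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a
    missing call. Sites missing in either sample are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        return float("nan")  # no shared sites, so the distance is undefined
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))


def distance_matrix(genotypes):
    """Pairwise distance matrix for a dict mapping sample name -> genotypes."""
    names = sorted(genotypes)
    matrix = [
        [allele_sharing_distance(genotypes[a], genotypes[b]) for b in names]
        for a in names
    ]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical genotypes for three samples at four SNPs.
    example = {
        "s1": [0, 1, 2, None],
        "s2": [0, 1, 2, 1],
        "s3": [2, 1, 0, 1],
    }
    names, matrix = distance_matrix(example)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

A matrix of this kind is the usual input to neighbor-joining tree construction, although MEGA computes its own distances from the aligned data.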
2.3.1 Enzyme digestion protocol design
Following previously published methods, we designed the simplified
(reduced-representation) genome digestion scheme as follows: the
restriction enzyme combination HaeIII + Hpy166II was used, and the
insert size was set to 550-600 bp (Xia et al., 2019).
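To make the scheme concrete, the following sketch performs an in-silico double digest of a sequence and keeps fragments in the 550-600 bp window. It is an illustration rather than the authors' design procedure. The recognition sites used here (HaeIII: GG^CC; Hpy166II: GTN^NAC, both blunt cutters) are the commonly listed ones and should be checked against the enzyme supplier's documentation. The function names are hypothetical.

```python
import re

# Recognition pattern and cut offset (bases from the site start) per enzyme.
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "HaeIII": ("GGCC", 2),              # GG^CC
    "Hpy166II": ("GT[ACGT]{2}AC", 3),   # GTN^NAC, N = any base
}


def cut_positions(seq, enzymes=ENZYMES):
    """Sorted cut coordinates in seq for all enzymes combined."""
    seq = seq.upper()
    cuts = set()
    for pattern, offset in enzymes.values():
        # A lookahead lets overlapping recognition sites all be reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, enzymes=ENZYMES):
    """Split seq at every cut position and return the fragments in order."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, lo=550, hi=600):
    """Keep fragments whose length falls within [lo, hi] bp."""
    return [f for f in fragments if lo <= len(f) <= hi]


if __name__ == "__main__":
    toy = "AAAGGCCTTTGTAAACAAA"  # one HaeIII site and one Hpy166II site
    print(digest(toy))
```

Running such a digest on a related reference genome is one way to estimate how many loci a given enzyme pair and size window would yield before sequencing.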
2.3.2 Sequencing quality control
The raw data (the reads in the original off-machine sequencing output)
were pre-processed by quality filtering to obtain clean data. The
processing steps were as follows: (1) remove adapter sequences; (2)
remove reads in which N (a base that could not be determined) accounts
for more than 5% of the bases; (3) remove low-quality reads (reads in
which bases with quality value Q <= 10 account for more than 20% of
the read); (4) calculate the raw sequencing volume, the effective
sequencing volume, Q20 (the proportion of bases with quality values
greater than or equal to 20, i.e. a sequencing error rate of no more
than 0.01), Q30 (the proportion of bases with quality values greater
than or equal to 30, i.e. a sequencing error rate of no more than
0.001), and GC content (the proportion of guanine (G) and cytosine (C)
bases), and perform a comprehensive evaluation.
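The thresholds in steps (2) to (4) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the sequencing provider's pipeline: reads are given as (sequence, list of Phred quality values) pairs, adapter removal from step (1) is assumed to have been done already, and all function names are hypothetical. The quality-to-error conversion uses the standard Phred definition, Q = -10 log10(p).

```python
def phred_to_error(q):
    """Error probability for a Phred quality value (Q = -10 * log10(p))."""
    return 10 ** (-q / 10)


def keep_read(seq, quals, max_n_frac=0.05, low_q=10, max_low_frac=0.20):
    """True if a read passes the N-content and low-quality filters."""
    if not seq:
        return False
    # Step (2): discard reads with more than 5% undetermined bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # Step (3): discard reads where bases with Q <= 10 exceed 20% of the read.
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return low_frac <= max_low_frac


def summarize(reads):
    """Step (4): Q20, Q30 and GC content as fractions of all bases."""
    total = sum(len(seq) for seq, _ in reads)
    if total == 0:
        return {"bases": 0, "Q20": 0.0, "Q30": 0.0, "GC": 0.0}
    q20 = sum(q >= 20 for _, quals in reads for q in quals)
    q30 = sum(q >= 30 for _, quals in reads for q in quals)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq, _ in reads)
    return {"bases": total, "Q20": q20 / total, "Q30": q30 / total, "GC": gc / total}


if __name__ == "__main__":
    # Hypothetical reads: the second fails on N content, the third on quality.
    raw = [
        ("ACGT" * 5, [30] * 20),
        ("NNGT" + "ACGT" * 4, [30] * 20),
        ("ACGT" * 5, [5] * 5 + [30] * 15),
    ]
    clean = [(s, q) for s, q in raw if keep_read(s, q)]
    print(len(raw), "raw reads,", len(clean), "clean reads")
    print(summarize(clean))
```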
2.3.3 Alignment to consensus sequences
We used the Burrows-Wheeler Aligner (BWA) to align the sequencing
reads to the consensus sequences obtained from read clustering. Because
the reference consists of consensus sequences derived from read
clustering, the mapping rate varies somewhat among samples.
2.3.4 Variant detection and SNP statistics
After aligning the data to the consensus sequences, we used the Genome
Analysis Toolkit (GATK) and SAMtools for variant detection, retaining
the SNPs that were called consistently by both tools as reliable loci.
We further filtered the SNP data by requiring MAF > 0.05 and data
integrity (the proportion of samples with a genotype call) > 0.8, and
retained the SNPs that were polymorphic among the samples. The final
filtered SNPs were used as input for the subsequent evolutionary
analysis. Based on these SNP data, we analyzed the genetic evolution
and structure of the population using the differences in genetic
information among the samples of C. hainanense, including the
phylogenetic relationships among the samples, population structure,
principal component analysis (PCA), and relatedness among the samples.
Part of this analysis involves grouping the samples; for this purpose,
the 35 samples were divided into six groups according to species.
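The SNP filtering described in this section (agreement between the two callers, then the MAF and data integrity thresholds) can be sketched as follows. This is a minimal illustration, not the authors' pipeline, and it makes several assumptions: each site is identified by a (contig, position, ref, alt) tuple, samples are diploid, genotypes are coded as alternate-allele counts (0, 1 or 2) with None for a missing call, and all names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF at a biallelic site from diploid alternate-allele counts."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def completeness(genotypes):
    """Proportion of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def filter_snps(gatk_sites, samtools_sites, genotypes_by_site,
                min_maf=0.05, min_complete=0.8):
    """Keep sites called by both tools that pass the MAF and integrity cutoffs."""
    shared = set(gatk_sites) & set(samtools_sites)
    kept = []
    for site in sorted(shared):
        genotypes = genotypes_by_site[site]
        # A monomorphic site has MAF 0, so the MAF cutoff also removes it.
        if (minor_allele_frequency(genotypes) > min_maf
                and completeness(genotypes) > min_complete):
            kept.append(site)
    return kept


if __name__ == "__main__":
    # Hypothetical call sets from the two tools for ten samples.
    gatk = {("c1", 10, "A", "G"), ("c1", 20, "C", "T"), ("c2", 5, "G", "A")}
    samtools = {("c1", 10, "A", "G"), ("c2", 5, "G", "A"), ("c3", 7, "T", "C")}
    genotypes = {
        ("c1", 10, "A", "G"): [0, 1, 2, 1, 0, 1, 0, 1, 1, 0],
        ("c2", 5, "G", "A"): [0, 0, None, None, None, 1, 0, 0, 0, 0],
    }
    print(filter_snps(gatk, samtools, genotypes))
```

In this toy input only the first site survives: the second shared site is called in 7 of 10 samples, which is below the 0.8 integrity threshold.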