2.7. Identification and functional annotation of SNP outliers
To detect footprints of genomic diversification and potential selection, outlier loci were identified using two independent approaches: PCAdapt v4.3.5 (Luu et al., 2017) and BayeScan v2.1 (Foll and Gaggiotti, 2008). The PCAdapt R package detects outlier loci from a principal component analysis (PCA), assuming that markers excessively related to population structure are candidates for potential adaptation. The number of principal components (PCs) to retain was chosen after inspecting score plots for population structure, with the minor allele frequency threshold set to maf = 0.01 and the distance statistic set to 'mahalanobis'. The distribution of loadings (the contribution of each SNP to each PC) was uniform, indicating no relevant effect of linkage disequilibrium (LD). P-values were corrected for the false discovery rate (FDR), and outliers were retained at q < 0.01.
BayeScan detects loci potentially under selection by analyzing differences in allele frequencies among predefined groups with a multinomial-Dirichlet model. The prior odds (PO) for neutrality express the expected ratio of selected to neutral sites (e.g., 1:1000) and thus set how strongly the neutral model is favored a priori over the model with selection (Lotterhos and Whitlock, 2014). The sensitivity of the analysis to the PO was evaluated using alternative values (1:100, 1:10000). The final MCMC chain was run with 20 short pilot runs, 5000 iterations, a burn-in of 50000, a thinning interval of 10 and PO set to 100. Loci were retained at q < 0.01.
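The two q < 0.01 filters can be illustrated with a small, dependency-free Python sketch. It assumes that the PCAdapt p-values were adjusted with the Benjamini-Hochberg procedure (the correction method is not named above), and it reimplements the q-value that BayeScan reports under its usual definition: the q-value of a locus is the mean posterior error probability (PEP, 1 minus the posterior probability of selection) of all loci at least as strongly supported. Function names and inputs are hypothetical.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (same result as R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    qvals = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def bayescan_qvalues(post_probs):
    """q-values from posterior probabilities of selection (PEP-based definition)."""
    peps = [1.0 - p for p in post_probs]  # posterior error probabilities
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    qvals = [0.0] * len(peps)
    cumulative, k, n = 0.0, 0, len(order)
    while k < n:
        # Tied PEPs form one group so that they receive the same q-value.
        j = k
        while j < n and peps[order[j]] == peps[order[k]]:
            cumulative += peps[order[j]]
            j += 1
        for idx in order[k:j]:
            qvals[idx] = cumulative / j  # mean PEP of the j most supported loci
        k = j
    return qvals


def select_outliers(qvals, cutoff=0.01):
    """Indices of loci whose q-value falls below the cut-off used in the text."""
    return [i for i, q in enumerate(qvals) if q < cutoff]
```

Both functions return values in the input order, so their outputs can be paired with SNP identifiers from either tool before intersection.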
PCAdapt and BayeScan results were then intersected, and only shared outliers with FST > 0.8 were retained, yielding a conservative set of candidate loci. This analysis was performed separately for each dataset (GBS, RADseq and COMBINED).
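The intersection step can be sketched as follows, assuming that per-SNP PCAdapt q-values, BayeScan q-values and FST estimates (hypothetically taken from BayeScan's per-locus output) are available as dictionaries keyed by SNP ID. The thresholds mirror those stated in the text; the function name is hypothetical.

```python
def consensus_outliers(pcadapt_q, bayescan_q, fst, q_cut=0.01, fst_cut=0.8):
    """SNP IDs flagged by both tools at q < q_cut and with FST > fst_cut.

    All three arguments are dicts mapping SNP ID -> value.
    """
    pcadapt_hits = {snp for snp, q in pcadapt_q.items() if q < q_cut}
    bayescan_hits = {snp for snp, q in bayescan_q.items() if q < q_cut}
    shared = pcadapt_hits & bayescan_hits  # outliers detected by both methods
    # SNPs without an FST estimate are treated as failing the filter.
    return sorted(snp for snp in shared if fst.get(snp, 0.0) > fst_cut)
```

Calling the function once per dataset (GBS, RADseq and COMBINED) reproduces the per-dataset design; working with sets keeps the result independent of the order in which each tool lists its loci.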
Candidate SNP outliers were annotated by cross-referencing their positions against the GFF file of the rPodLil1.1 genome assembly (Gomez-Garrido et al., 2023) to associate them with Gene IDs. For outliers falling in protein-coding regions, Gene Ontology (GO) annotation was performed, followed by a functional enrichment analysis with g:GOSt in g:Profiler [https://biit.cs.ut.ee/gprofiler/gost] on the individual datasets (GBS and RADseq). The Podarcis muralis genome was used as the reference to identify significantly enriched (FDR < 0.05) functional categories: biological processes (BP), molecular functions (MF) and cellular components (CC). For the subset of outliers falling within coding sequences (CDS), we annotated the codon position and assessed whether the alternative allele (ALT) produced a synonymous or nonsynonymous substitution relative to the reference allele (REF) in the genome.
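The positional annotation and the synonymous/nonsynonymous call can be sketched in Python as below. This is an illustration under stated assumptions, not the pipeline used here: it assumes a GFF3 file whose gene features carry an ID attribute, a spliced CDS sequence already oriented 5'→3' on the coding strand (so REF/ALT bases of minus-strand genes must be complemented first), and a 0-based offset of the SNP within that CDS. Mapping a genomic coordinate to that offset, which requires the CDS exon coordinates, is not shown. The standard genetic code is used, and all helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=gene1;Name=abc' into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def genes_at(gff_lines, seqid, pos, feature_type="gene"):
    """IDs of `feature_type` features overlapping a 1-based position on `seqid`."""
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] != feature_type:
            continue
        start, end = int(cols[3]), int(cols[4])  # GFF coordinates: 1-based, inclusive
        if start <= pos <= end:
            hits.append(parse_attributes(cols[8]).get("ID", "NA"))
    return hits


def classify_cds_snp(cds_seq, cds_offset, alt):
    """Classify a SNP at 0-based `cds_offset` of a spliced CDS (coding strand)."""
    cds_seq = cds_seq.upper()
    if not 0 <= cds_offset < len(cds_seq) - len(cds_seq) % 3:
        raise ValueError("SNP offset falls outside a complete codon")
    codon_pos = cds_offset % 3 + 1        # 1, 2 or 3
    start = cds_offset - (codon_pos - 1)  # first base of the affected codon
    ref_codon = cds_seq[start:start + 3]
    alt_codon = ref_codon[:codon_pos - 1] + alt.upper() + ref_codon[codon_pos:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return codon_pos, ref_codon, alt_codon, ref_aa, alt_aa, effect
```

For example, in a CDS beginning ATGGCT, a T→C change at offset 5 turns GCT into GCC (both alanine) and is therefore synonymous at the third codon position.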