We present a flowchart of our data analysis to investigate these three topics in Fig. S1. Our hypothesis is that areas with high human density, urbanization, and atherogenic infrastructure (e.g., main roads and built-up areas) will impede the movement of P. bengalensis and coincide with population boundaries. On the other hand, natural habitats and undeveloped land covers like forests and grasslands will act as corridors, facilitating gene flow even in mountainous regions with complex topography.
Materials and methods
Sample collection
We sampled the known current range of leopard cats in Taiwan with two distinct sampling strategies for collecting tissues and blood. The first strategy involved fresh tissues from live leopard cats captured in Miaoli and Taichung counties from 2017-2020. The Institutional Animal Care and Use Committee (IACUC) of the National Pingtung University of Science and Technology granted permission for the capture project (Permit numbers: 106003, 106014, 107041). The second sampling scheme involved tissues collected from road-killed and rescued leopard cats from 2002-2021. We obtained 184 samples for our analyses. The methods for sample preservation and DNA purification are described in detail in the Supplementary Materials & Methods (SMM).
Genotyping and genetic diversity analyses
We genotyped leopard cats using a combination of mitochondrial cytb sequences and nuclear DNA microsatellites. The final analyses utilized 12 previously published microsatellites (Menotti-Raymond et al., 1999). Primer information and the methods for genotyping are described in detail in the SMM. Diversity indices for microsatellites, including polymorphism information content (PIC), probability of identity (PID), expected heterozygosity (He), observed heterozygosity (Ho), and FIS were calculated using Cervus v3.0.7 (Kalinowski, Taper, & Marshall, 2007). Allelic richness was calculated in FSTAT v2.9.4 (Goudet, 1995). We checked for the presence and frequencies of null alleles using Brookfield's method implemented in Micro-Checker v2.2.3 (Van Oosterhout et al., 2004). All microsatellites and cytb data genotyped in this study were deposited on Figshare (DOI: 10.6084/m9.figshare.24188403).
For cytb, a unique haplotype FCN3 from Tamada et al. (2008) was added to our dataset for subsequent analyses. Tajimas' D and Fu and Li's F and D were calculated to detect selection or demographic changes. All neutrality tests and diversity indices of haplotype diversity (Hd) and nucleotide diversity (π) were computed using DnaSP v6 (Rozas et al., 2017). The haplotype network was constructed using the median-joining method implemented in PopArt (Leigh & Bryant, 2015).
Population structure and spatial genetic patterns revealed by microsatellites
We calculated pairwise FST for each population pair using Arlequin (Excoffier & Lischer, 2010) to measure population diversity and differentiation. To assess the correlation between geographic areas and genetic clusters among individuals, we first applied the Bayesian algorithm in STRUCTURE v2.3.4 (Evanno, Regnaut, & Goudet, 2005). STRUCTURE analysis was conducted with admixture models and independent allele frequencies. Number of groupings K were simulated from 1 to 7 with 10 replicates for each K. A total of 100,000 iterations were simulated, and 25,000 iterations were set as burn-in. Geographic populations (i.e., Northern, Central, Southern) were designated priors. We ran the STRUCTURE analysis twice to check for convergence and compared the results. All results were uploaded to the CLUMPAK server (Kopelman et al., 2015) to generate consensus plots and evaluate the best K according to the highest ΔK value and highest ln(Pr(X|K) (Evanno, Regnaut, & Goudet, 2005).
We also performed Discriminant Analysis of Principal Components (DAPC) for its ability to detect subtle genetic structures without any assumption on sample source (Jombart, Devillard, & Balloux, 2010). The DAPC algorithm combines PCA and DA, partitioning variance into within-group and between-group components to maximize discrimination. We searched for the optimized number of clustering in the analysis of DAPC using the value of BIC. After determining optimized number of clustering on
K = 2 and
K = 5 from STRUCTURE and DAPC, we performed DAPC on 2, 3 and 5 clusters using
the R package adegenet (Jombart, 2008).
We conducted a spatial Principal Component Analysis (sPCA) to determine the spatial distribution of genetic variation using the R package adegenet (Jombart, 2008). We constructed the connection network by linking the nearest 10 samples at each site. We further tested for autocorrelation of allelic frequencies using Moran's I and global and local permutation tests, with 999 replicates each. The eigenvalue of each PC axis and an eigenvalue decomposition plot were illustrated to identify spatially meaningful PC axes.
Evaluation of current gene flow and resistance surfaces
Contemporary gene flow was estimated using BayesAss (Wilson & Rannala, 2003), which estimates directional migration rates over past generations using multilocus genotypes and a Bayesian approach. Mixing parameters were altered as
m = 0.3,
a = 0.4, and
f = 0.5 to tune
acceptance rates between 20% - 40%. Aside from estimating gene flow between populations, we adopted an estimated effective migration surfaces (EEMS) (Petkova, Novembre, & Stephens, 2016) algorithm to visualize geographic regions deviating from isolation by distance (IBD), i.e., to identify areas where genetic similarity decays faster or slower than expected for a given geographic distance (known as “barriers” and “corridors”, respectively). Additional parameters for executing BayesAss and EEMS are presented in the SMM.
Inference of demographic history
DIYABC v2.1.0 (Cornuet et al., 2014) was implemented to investigate demographic history based on Approximated Bayesian Computation (ABC) using nuclear SSR markers. The analysis employed the three geographic groups previously identified. We assessed five potential scenarios of demographic history among the groups that could have produced the current genetic variation (Fig. S2): s1, southward migration; s2, northward migration; s3-4, central origin; and s5, hybridization origin. Repeat number mutation was modelled using a generalized stepwise mutation (GSM) model, and single nucleotide indels (SNI) were allowed. One million simulations were conducted for each scenario. Logistic regression was used to compare posterior probabilities for these scenarios by examining the top 1% of the simulated datasets, based on the similarity of summary statistics to observed data. Priors for all parameters were provided in Table S1. Detailed instructions on DIYABC and model evaluation can be found in the SMM.
We further used VarEff (Nikolic & Chevalet, 2014) to test for changes in effective population sizes over the past 500 years, encompassing the increase in anthropogenic activities in Taiwan (i.e., when a large number of Han Chinese immigrated to Taiwan). We set a generic mutation rate prior to 4.8x10
-5 and adopted a
GSM model with
p = 0.4, as estimated by DIYABC (see Results). Additional parameters for each population are provided in Table S2. After 10,000 burn-in iterations, we ran 10,000 steps with 100 steps per batch and 100 steps between sampled steps to avoid autocorrelation, and a total of 10,000,000 MCMC iterations were employed.
Landscape genetics
We conducted a series of regression-based analyses to test several landscape hypotheses for discovering the association between genetic distance and landscape features (Shirk, Landguth, & Cushman, 2018). We applied two methods to calculate individual-based genetic distance, i.e., the Nei distance (D
Nei) and the Euclidean distance of the first ten axes of a principle component analysis (D
PCA). D
Nei and D
PCA distances were calculated using the R package adegenet. Detailed method can be found in the SMM. We calculated the D
PCA using the R package vegan (Oksanen et al., 2013).
To determine the underlying landscape features shaping the genetic variation of leopard cats in Taiwan, we constructed five resistance layers (
Fig. S3), the R
elevation, R
roughness, R
human density, R
land_cover, and R
road (SMM), which are potential resistance factors for felid species or other carnivores. For example, elevation and topography roughness are barriers to mongoose (Barros et al., 2017); anthropogenic factors such as human density and human-dominant landcover may hinder the movement of wildcats (Hartmann et al., 2013); and a variety of roads are significant barriers to wildcats in Europe (Westekemper et al., 2021).
Developing resistance surfaces is challenging due to the difficulty in assigning specific values to landscape features (Peterman et al., 2019; Zeller, McGarigal, & Whiteley, 2012). To overcome this, genetic data can be used to optimize resistance layers (Peterman et al., 2019). We applied the genetic algorithm (GA) implemented in the R package ResistanceGA (Peterman, 2018) for developing resistance layers (SMM). We optimized each resistance layer using a single surface optimization procedure with the function SS_optim() in the R package ResistanceGA. All continuous and categorical layers were optimized with
commute-time distance calculated in the R package gdistance (van Etten, 2017), equivalent to circuit-theory-based distance. The maximum resistance of each layer was set to 500, resistance values were optimized utilizing Akaike information criterion values (AIC) as criteria, and other default parameters were adopted. Each resistance layer was optimized with two independent runs using D
Nei or D
PCA as a dependent variable to check for convergence. In addition to optimizing a single surface separately, we employed multiple surface optimization using ResistanceGA's wrapper function all_comb() to optimize all possible combinations automatically.
After optimizing single and all possible multiple resistance models, we conducted model comparison using two independent approaches:
the Maximum Likelihood Population Effects mixed effects model (MLPE) (Clarke, Rothery, & Raybould, 2002) and Reciprocal Causal Modeling (RCM) (Cushman et al., 2013), both of which offer high accuracy for selecting proper landscape features affecting genetic differentiation under different conditions (Peterman et al., 2019). For MLPE, model performance was assessed with corrected
Akaike information criterion values (AICc), and models with ΔAICc < 2 are considered to perform equally well. RCM compares the Mantel R of all possible model combinations of all resistance layers to find the best model for describing genetic differentiation. The model of geographic distance was also added as a null model to the computation of the RCM algorithm. Detailed methods for performing model comparison are described in the SMM. We also applied the resist_boot() function in ResistanceGA to conduct a bootstrap analysis on (1) all possible combinations of resistance models and (2) single-surface resistance models only.
Species distribution modelling (sdm) and potential diversity loss to future climate change
To incorporate the influence of climate change and assess the potential habitats under different climate scenarios in the future, we employed a species distribution modelling (sdm) approach to construct niche models and project models to current and future (in 2070) climate conditions. Given the lack of reasonable predictions for road and landcover alterations in the future, we only used 19 bioclimatic variables related to temperature and precipitation, which are suitable for predicting potential distribution areas of wild mammals (Holzmann et al., 2015). The 19 bioclimatic variables were downloaded from WorldClim2 (Fick & Hijmans, 2017), with a resolution of 30 arc-sec (about 1 km2 per cell). The average of three global climate models, i.e., CCSM4, MIROC-ESM, and MIROC5, in 2070 was implemented under two contrasting representative concentration pathways (RCPs), a low-emission model (RCP2.6) and a high-emission model (RCP8.5) from CMIP5 (Taylor, Stouffer, & Meehl, 2012) for prediction. We conducted an ensemble approach that makes predictions based on combining six methods, including maxent, glm, svm, gam, mda, and mlp, for model construction using the R package sdm (Naimi & Araújo, 2016). Details are described in the SMM. Area under the receiver-operator curve (AUC) and true skill statistic (TSS) were used to evaluate model performance.
To simulate potential diversity loss associated with climate change, we calculated genetic diversity indices, including He, allelic richness (AR), and FIS, under current habitat and climate in 2070 for individual samples overlapping with habitat suitability > 0.5 and > 0.75 on the prediction maps, respectively. A Kruskal Wallis test (Hollander, Wolfe, & Chicken, 2013) was performed to determine significant differences in diversity among groups using the R package stats (R Core Team, 2013).
Results
Genetic diversity of microsatellite and mitochondrial markers
We genotyped 128 individuals for microsatellites and sequenced 113 individuals for mitochondrial cytb, including 58 samples at both microsatellite genotyping and cytb sequencing in the analyses (Table S3). All 12 microsatellite loci were polymorphic (mean PIC = 0.51, ranging from 0.11-0.75; Table S4) and demonstrated a low frequency of null alleles (mean = 0.03, ranging from 0-0.09). All genetic statistics were provided in Table S4.
The overall genetic diversity for
cytb was low (Hd = 0.393; π = 0.00036). We detected four
cytb haplotypes, with only 1-2 basepair (bp) differences among haplotypes (
Fig. 2d), including one haplotype (FCN3) that was not detected among our tissue samples. No clear geographic pattern to haplotype occurrences was found, with the two predominant haplotypes (PbTaiC1 and PbTaiC2) displaying comparable proportions across all geographic regions (
Fig. 2d). All neutrality tests, including
Tajimas' D, as well as Fu and Li's F and
D, were non-significant (
p > 0.1), indicating no selection or abrupt changes in population size.