Analysis
The electropherograms were analysed with Genemapper V4.0 (Applied Biosystems) with the same peak calling strategy as described previously (Jennison et al., 2015; Schultz et al., 2010). To avoid artefacts, precautions were taken to ensure allele calling was consistent (Jennison et al., 2015), and carefully reconstructing dominant haplotypes as per previously described methods (Anderson, Su, Bockarie, Lagog, & Day, 1999; Jennison et al., 2015; Schultz et al., 2010). Briefly, this involved setting the minimum fluorescence to 500 Random Fluorescence Units (RFU) for all colours except the size standard. Stutter window was set to 3.5 for 3bp repeats and 4.5 for 4bp repeats. The stutter ratio was set to 0.4 for all markers. The stutter detection was only applied to shorter alleles, with longer alleles within the stutter window subject to the standard 30% cut-off threshold. Samples with low fluorescence were manually reanalysed with a minimum fluorescence of 100 RFU. For the Madang 2005 and Wosera 2006 P. falciparum data, previously published cleaned and rounded microsatellite allele repeat numbers for P. falciparum single clone infections (Schultz et al., 2010) were converted back to allele sizes using the known number of nucleotides/repeat, whereas for P. vivax the raw data (allele calls) was available (Jennison et al., 2015). These data were combined with the newly generated MS data from the other studies before binning the alleles using the TANDEM software (Matschiner & Salzburger, 2009). Allele frequencies of the entire dataset (incl. previously genotyped datasets) were investigated and outlying alleles (most likely caused by PCR artefacts) were removed. Samples with missing data at six (60%) or more MS loci were excluded from further analysis. We attempted to calibrate the P. falciparum data from pre-LLIN Madang 2006 and Wosera (ESP1 2005) by converting rounded repeat numbers back to allele sizes, binning together with the newly generated data and removing outliers. However, there was strong population structure when compared to the new dataset, indicating experimental differences despite the use of the same protocols. Thus, we excluded direct comparisons between old and new datasets for P. falciparum .
To conduct the population genetic analyses, allele frequencies and input files for the various population genetics programs were created using CONVERT version 1.31 (Glaubitz) and PGD Spider version 2.1.0.1 (Lischer & Excoffier, 2012). Genetic diversity was measured by calculating the number of alleles (A ), Nei’s gene diversity (expected heterozygosity (He ) (Nei, 1987)) and allelic richness (Rs ) (Hurlbert, 1971) that corrects for sample size, using FSTAT version 2.9.3.2 (Goudet, 1995). Pairwise genetic differentiation was measured by calculating pairwise Jost’s D (Jost, 2008) and Weir and Cockerhams F ST (Weir & Cockerham, 1984) values and 95% confidence intervals were estimated with 1000 bootstraps using the diveRsity package in R (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013). In contrast to some earlier studies (Schultz et al., 2010), where haploid genotypes were coded as diploid genotypes, but homozygote at each locus, the data in this study was analysed using haploid datasets (as in Jennison et al . (Jennison et al., 2015)). As a measure of inbreeding in each population, multilocus linkage disequilibrium (mLD) was calculated using LIAN version 3.7, applying a Monte Carlo test with 100,000 re-sampling steps (Haubold & Hudson, 2000). In the LIAN analysis only samples with complete haplotypes were included. For both species the dominant and single haplotypes were compared within catchments to identify any significant linkage disequilibrium (mLD) using LIAN (Haubold & Hudson, 2000) andFST using FSTAT version 2.9.3.2 (Goudet, 1995).
Bottleneck (Piry, Luikart, & Cornuet) was used to test for an excess of heterozygosity brought about by the loss of rare alleles following a population bottleneck. A two-phase mutational (TPM) model of 70% stepwise and 30% non-stepwise mutations and run 1000 iterations was used. In addition, the allele frequency distribution for all loci was examined for a ‘mode shift’ in the distribution (Luikart, Allendorf, Cornuet, & Sherwin, 1998). To investigate parasite population genetic structure, the Bayesian clustering software, STRUCTURE version 2.3.4 (Pritchard, Stephens, & Donnelly, 2000) was used to investigate whether haplotypes for each species clustered according to geographical origin and/or within time periods. The analysis was run 20 times for K = 1 to 15 for 100,000 Monte Carlo Markov Chain (MCMC) iterations after a burn-in period of 10,000 using the admixture model and correlated allele frequencies. The second order rate of change of LnP[D], ΔK was calculated according to the method of Evanno et al. (Evanno, Regnaut, & Goudet, 2005) to determine the most likely K (most likely number of populations). CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) was used to facilitate the interpretation of population genetic results using files generated with STRUCTURE HARVESTER Web v0.6.94 (Earl & vonHoldt, 2011) and Distruct 1.1 (Rosenberg, 2004) was used to visualize the structure plots with the data generated with CLUMPP. Statistical analysis of molecular epidemiological and population genetic parameters was done using non-parametric methods as indicated in the results-section using STATA v12.1 (StataCorp, USA). QGIS 2.18.24 (OpenSource Geospatial Foundation) was used to map the villages and households and maps were constructed with spatial layers from DIVA-GIS (”Diva GIS country level data,”).