Analysis
The electropherograms were analysed with Genemapper V4.0 (Applied
Biosystems) with the same peak calling strategy as described previously
(Jennison et al., 2015; Schultz et al., 2010). To avoid artefacts,
precautions were taken to ensure allele calling was consistent (Jennison
et al., 2015), and carefully reconstructing dominant haplotypes as per
previously described methods (Anderson, Su, Bockarie, Lagog, & Day,
1999; Jennison et al., 2015; Schultz et al., 2010). Briefly, this
involved setting the minimum fluorescence to 500 Random Fluorescence
Units (RFU) for all colours except the size standard. Stutter window was
set to 3.5 for 3bp repeats and 4.5 for 4bp repeats. The stutter ratio
was set to 0.4 for all markers. The stutter detection was only applied
to shorter alleles, with longer alleles within the stutter window
subject to the standard 30% cut-off threshold. Samples with low
fluorescence were manually reanalysed with a minimum fluorescence of 100
RFU. For the Madang 2005 and Wosera 2006 P. falciparum data,
previously published cleaned and rounded microsatellite allele repeat
numbers for P. falciparum single clone infections (Schultz et
al., 2010) were converted back to allele sizes using the known number of
nucleotides/repeat, whereas for P. vivax the raw data (allele
calls) was available (Jennison et al., 2015). These data were combined
with the newly generated MS data from the other studies before binning
the alleles using the TANDEM software (Matschiner & Salzburger, 2009).
Allele frequencies of the entire dataset (incl. previously genotyped
datasets) were investigated and outlying alleles (most likely caused by
PCR artefacts) were removed. Samples with missing data at six (60%) or
more MS loci were excluded from further analysis. We attempted to
calibrate the P. falciparum data from pre-LLIN Madang 2006 and
Wosera (ESP1 2005) by converting rounded repeat numbers back to allele
sizes, binning together with the newly generated data and removing
outliers. However, there was strong population structure when compared
to the new dataset, indicating experimental differences despite the use
of the same protocols. Thus, we excluded direct comparisons between old
and new datasets for P. falciparum .
To conduct the population genetic analyses, allele frequencies and input
files for the various population genetics programs were created using
CONVERT version 1.31 (Glaubitz) and PGD Spider version 2.1.0.1 (Lischer
& Excoffier, 2012). Genetic diversity was measured by calculating the
number of alleles (A ), Nei’s gene diversity (expected
heterozygosity (He ) (Nei, 1987)) and allelic richness (Rs )
(Hurlbert, 1971) that corrects for sample size, using FSTAT version
2.9.3.2 (Goudet, 1995). Pairwise genetic differentiation was measured by
calculating pairwise Jost’s D (Jost, 2008) and Weir and
Cockerhams F ST (Weir & Cockerham, 1984) values
and 95% confidence intervals were estimated with 1000 bootstraps using
the diveRsity package in R (Keenan, McGinnity, Cross, Crozier, &
Prodöhl, 2013). In contrast to some earlier studies (Schultz et al.,
2010), where haploid genotypes were coded as diploid genotypes, but
homozygote at each locus, the data in this study was analysed using
haploid datasets (as in Jennison et al . (Jennison et al., 2015)).
As a measure of inbreeding in each population, multilocus linkage
disequilibrium (mLD) was calculated using LIAN version 3.7, applying a
Monte Carlo test with 100,000 re-sampling steps (Haubold & Hudson,
2000). In the LIAN analysis only samples with complete haplotypes were
included. For both species the dominant and single haplotypes were
compared within catchments to identify any significant linkage
disequilibrium (mLD) using LIAN (Haubold & Hudson, 2000) andFST using FSTAT version 2.9.3.2 (Goudet, 1995).
Bottleneck (Piry, Luikart, & Cornuet) was used to test for an excess of
heterozygosity brought about by the loss of rare alleles following a
population bottleneck. A two-phase mutational (TPM) model of 70%
stepwise and 30% non-stepwise mutations and run 1000 iterations was
used. In addition, the allele frequency distribution for all loci was
examined for a ‘mode shift’ in the distribution (Luikart, Allendorf,
Cornuet, & Sherwin, 1998). To investigate parasite population genetic
structure, the Bayesian clustering software, STRUCTURE version 2.3.4
(Pritchard, Stephens, & Donnelly, 2000) was used to investigate whether
haplotypes for each species clustered according to geographical origin
and/or within time periods. The analysis was run 20 times for K = 1 to
15 for 100,000 Monte Carlo Markov Chain (MCMC) iterations after a
burn-in period of 10,000 using the admixture model and correlated allele
frequencies. The second order rate of change of LnP[D], ΔK was
calculated according to the method of Evanno et al. (Evanno, Regnaut, &
Goudet, 2005) to determine the most likely K (most likely number of
populations). CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) was
used to facilitate the interpretation of population genetic results
using files generated with STRUCTURE HARVESTER Web v0.6.94 (Earl &
vonHoldt, 2011) and Distruct 1.1 (Rosenberg, 2004) was used to
visualize the structure plots with the data generated with CLUMPP.
Statistical analysis of molecular epidemiological and population genetic
parameters was done using non-parametric methods as indicated in the
results-section using STATA v12.1 (StataCorp, USA). QGIS 2.18.24
(OpenSource Geospatial Foundation) was used to map the villages and
households and maps were constructed with spatial layers from DIVA-GIS
(”Diva GIS country level data,”).