P. vivax genomic data summary
Based on a literature search including manuscripts published before October 2022, we identified 1311 high-quality publicly shared P. vivax genomes. Raw sequencing data were downloaded and combined with in-house sequenced P. vivax genomes (n=163; samples originating from Peru, Brazil, Vietnam, and imported cases in Belgium from travellers and migrants).
A total of 1474 high-quality P. vivax genomes (Supplementary Table 1) from 36 countries in Asia (n=878), the Americas (n=399), and Africa (n=197), collected between 2000 and 2019, were retained after removing samples with less than 50% of the genome covered at least 5-fold (Figure 1). The median sequencing coverage over the PvP01 reference genome among retained isolates was 26-fold (range 1-763). After alignment and variant calling, a total of 2,435,842 high-quality genetic variants were identified (1,983,976 SNPs and 451,866 indels), of which 1,836,935 were located in the core genome region (1,477,945 SNPs and 358,990 indels).
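The retention criterion described above (at least 50% of the genome covered at 5-fold or more) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names and the input format (a list of per-position read depths for one sample, as could be produced by a tool such as samtools depth) are assumptions made for illustration only.

```python
from typing import Sequence


def breadth_at_depth(depths: Sequence[int], min_depth: int = 5) -> float:
    """Return the fraction of positions covered at least `min_depth`-fold.

    `depths` is a hypothetical per-position depth vector for one sample.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def passes_coverage_filter(
    depths: Sequence[int], min_depth: int = 5, min_breadth: float = 0.5
) -> bool:
    """Keep a sample unless less than `min_breadth` of positions reach `min_depth`."""
    return breadth_at_depth(depths, min_depth) >= min_breadth


# Toy example: 2 of 4 positions reach 5-fold, so the sample is retained.
print(passes_coverage_filter([5, 12, 0, 3]))
```

A sample sitting exactly at the 50% threshold is retained here, because the text removes only samples with less than 50% of the genome covered.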
To facilitate the analysis, genomes were grouped into regional populations (following classifications from ): Africa (AFR; isolates from all countries in sub-Saharan Africa and from returning travellers with a history of travel to these countries); Eastern South East Asia (ESEA; isolates from Cambodia, Laos, Thailand, Vietnam, and the China-Myanmar border); Latin America (LAM; isolates from Mexico, Central America, and South America); Middle South East Asia (MSEA; isolates from Malaysia and the Philippines); Oceania (OCE; isolates from the island of New Guinea, i.e. Papua New Guinea and part of Indonesia); and Western Asia (WAS; isolates from Afghanistan, Bangladesh, India, Iran, Pakistan, and Sri Lanka).
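As an illustration of the grouping above, the regional assignment can be sketched as a simple country lookup. The dictionary and function names are hypothetical, and only countries named explicitly in the text are included. Peru and Brazil are placed in LAM because they are South American. Cases the text defines by geography rather than by country (sub-Saharan Africa, other Latin American countries, the China-Myanmar border, the Indonesian part of New Guinea) are deliberately left unmapped and would need sample-level metadata.

```python
from typing import Optional

# Hypothetical lookup limited to countries named explicitly in the text.
REGION_BY_COUNTRY = {
    "Cambodia": "ESEA", "Laos": "ESEA", "Thailand": "ESEA", "Vietnam": "ESEA",
    "Mexico": "LAM", "Peru": "LAM", "Brazil": "LAM",
    "Malaysia": "MSEA", "Philippines": "MSEA",
    "Papua New Guinea": "OCE",
    "Afghanistan": "WAS", "Bangladesh": "WAS", "India": "WAS",
    "Iran": "WAS", "Pakistan": "WAS", "Sri Lanka": "WAS",
}


def assign_region(country: str) -> Optional[str]:
    """Return the regional population code, or None if manual curation is needed."""
    return REGION_BY_COUNTRY.get(country.strip())


for name in ["Vietnam", "Sri Lanka", "Indonesia"]:
    print(name, assign_region(name))
```

Returning None for unmapped countries, rather than guessing a region, keeps ambiguous cases such as Indonesia visible for manual review.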