Data extraction
We collected data on author, year, study design, sample size, method of
assessing LGA, intervention (induction, spontaneous, caesarean delivery
or not specified), age at follow-up, cognitive or academic outcome, and
confounders. The primary data extraction was performed by one reviewer
(X.Z.) and checked for accuracy by a second reviewer (M.S.). Any
disagreements were fully discussed until a consensus was reached.
Corresponding authors of included papers were contacted by email to
provide further details if data were insufficient or missing.
To perform meta-analyses, for continuous variables (cognitive assessment
scores), we extracted the mean, standard deviation (SD), and total
sample size (N), or the mean difference with its lower and upper limits
and total N, for the exposed and control groups. For dichotomous
variables, we extracted the 2×2 table or the odds ratio (OR) with its
lower and upper limits.
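As an illustration of how these extracted quantities relate to one another, the following minimal Python sketch derives an odds ratio with limits from a 2×2 table and recovers a standard error from the reported limits of a mean difference. It assumes that the limits are those of a 95% confidence interval and uses the standard log-based (Woolf) interval; the function names are hypothetical and this is not necessarily the procedure used in this review.

```python
import math

# Assumption: reported lower/upper limits are 95% confidence limits (z = 1.96).
Z_95 = 1.96


def odds_ratio_from_table(a, b, c, d, z=Z_95):
    """Odds ratio and log-based (Woolf) limits from a 2x2 table.

    a = exposed with outcome,  b = exposed without outcome,
    c = control with outcome,  d = control without outcome.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def se_from_limits(lower, upper, z=Z_95):
    """Standard error of a mean difference from its symmetric limits."""
    return (upper - lower) / (2 * z)


if __name__ == "__main__":
    print(odds_ratio_from_table(20, 80, 10, 90))
    print(se_from_limits(-3.96, -0.04))
```

Tables containing a zero cell would need a continuity correction before this calculation, which the sketch does not attempt.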
There were two types of reference groups for comparison of early-term
infants. One type of study compared early-term infants (37-38 weeks)
with full-term infants (39-41 weeks), in which case we used full-term
infants (39-41 weeks) as the reference group. The second type of study
reported results for 37, 38, 39, 40, and 41 weeks GA separately, in which
case we used 40 weeks as the reference group to examine 37w vs 40w and
38w vs 40w GA.
Any measure of cognitive function was considered for inclusion. When
results were reported as both an overall test score (e.g. Intelligence
Quotient; IQ) and a domain-specific score (e.g. receptive vocabulary
delay), we used the overall score in data synthesis. When results were
only reported as domain-specific scores within the same study
population, we calculated the mean score across domain-specific tests.
Where multiple cognitive or academic outcomes were reported, we selected
the one that provided the most reliable information for analysis (e.g.
IQ test vs. school grade). Studies with follow-up of at least 6 months
were eligible. When the outcomes were measured more than once at
different ages for the same study population, we selected the oldest age
group with the most reliable cognitive assessment. If multiple
multivariable models were reported, we extracted data from the model
adjusted for the most confounders (e.g. adjusted for education and
sex vs. adjusted for sex only).
We extracted data according to three primary outcomes as follows.
Cognitive outcomes were based on cognitive scores (e.g., Bayley Scale of
Infant and Toddler Development Mental Developmental
Index,23-27 and Wechsler Abbreviated Scale of
Intelligence28) or cognitive impairment (e.g.,
Wechsler Intelligence Scale for Children full-scale IQ below average,
defined as a score below 85 or one standard deviation below the
mean29). Academic outcomes were based on low academic
performance (e.g. special education needs, defined as children recorded
in the 2005 Scottish school census as requiring special education
provision, which comprises both children with learning disabilities,
such as dyslexia and dyspraxia, and children with physical disabilities
that affect learning30). See Appendix S2 for full details of
outcome definitions.
To allow comparability of primary outcomes, the extracted data were
harmonized as follows: (a) if a study reported a cognitive test T
score, percentile, or Z score, we converted it to the intelligence
quotient scale (IQ; mean 100, SD 15); (b) if the direction of a study’s
outcome was inconsistent with the others (e.g., receiving a longer
rather than a shorter education), we converted it to a same-direction
outcome; (c) if an LGA-related study defined LGA not in terms of
percentiles but in terms of SD or absolute values, we converted it to
percentiles using the World Health Organization foetal growth calculator
(with foetal sex set to unknown).31
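Step (a) can be illustrated with a minimal Python sketch of the rescaling to the IQ metric (mean 100, SD 15). It assumes that T scores follow the conventional scale with mean 50 and SD 10 and that percentiles refer to a normally distributed score; the function names are hypothetical, and the sketch is an illustration of the arithmetic rather than the exact procedure applied in this review.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100.0, 15.0


def z_to_iq(z):
    """Convert a Z score (mean 0, SD 1) to the IQ scale."""
    return IQ_MEAN + IQ_SD * z


def t_to_iq(t, t_mean=50.0, t_sd=10.0):
    """Convert a T score to the IQ scale.

    Assumption: the T score uses the conventional mean 50, SD 10.
    """
    return z_to_iq((t - t_mean) / t_sd)


def percentile_to_iq(percentile):
    """Convert a percentile (strictly between 0 and 100) to the IQ scale.

    Assumption: the underlying score is normally distributed.
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return z_to_iq(NormalDist().inv_cdf(percentile / 100.0))


def rescale_sd(sd, source_sd):
    """Rescale a reported SD from a scale with SD `source_sd` to the IQ scale."""
    return sd * IQ_SD / source_sd


if __name__ == "__main__":
    print(t_to_iq(60))           # one SD above the mean on the T scale
    print(z_to_iq(-1))           # one SD below the mean
    print(percentile_to_iq(50))  # the median
```

Because the conversion from T and Z scores is linear, mean differences and their limits can be rescaled by the same factor as the SD; percentiles are not linear in the score, so they are converted value by value.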