The brain is under a constant barrage of sensory signals. To most
reliably and efficiently interpret these myriad inputs, it must make use
of different types of congruence across modalities to tie relevant
signals together in a process known as multisensory integration (MSI).
MSI can facilitate faster detection of stimuli (Van der Burg et al.,
2008), enhancement of perception (Sumby & Pollack, 1954; Vroomen & de
Gelder, 2000), and resolution of ambiguity (Green & Angelaki, 2010;
Parise & Ernst, 2017; van Ee et al., 2009), as well as produce potent
illusions (Mcgurk & Macdonald, 1976; Shams et al., 2000; Shipley,
1964). The types of relevant congruence range from basic stimulus
features such as spatiotemporal alignment (Slutsky & Recanzone, 2001)
to more complex features such as semantic (Iordanescu et al., 2008) and
emotional overlap (Jertberg et al., 2019). However, temporal coincidence
holds a special place in multisensory perception research, as it has
been shown to produce powerful effects of MSI on its own (Van der Burg
et al., 2008, 2011; Vroomen & de Gelder, 2004), and many other types of
multisensory interactions depend on temporal proximity (Costantini et
al., 2016; Munhall et al., 1996; Shams et al., 2000).
Of all the areas in which MSI affects our daily lives, speech
perception may be the most obvious. The integration of visual signals
with their auditory
counterparts can greatly enhance our understanding of speech (Erber,
1969; Irwin & DiBlasi, 2017; Sumby & Pollack, 1954; Woodhouse et al.,
2008) and even produce powerful multisensory illusions (Mcgurk &
Macdonald, 1976) when the stimuli are presented sufficiently close in
time (Munhall et al., 1996). One can experience this influence by simply
attempting to understand a speaker across a noisy room first with one’s
eyes open and then with them closed. It is in such boisterous
environments, in which the
reliability of relevant auditory signals is compromised by competing
inputs, that the most benefit is afforded by the integration of visual
information (Erber, 1969; MacLeod & Summerfield, 1987; Sumby &
Pollack, 1954). It is notable, then, that it is precisely in these
circumstances that autistic individuals struggle most with speech
perception (Alcántara et al., 2004; Fadeev et al., 2023; Mamashli et
al., 2017; Ruiz Callejo et al., n.d.). Autism is of particular interest
to our understanding of this intersection of MSI, speech perception, and
temporal processing because it appears to involve differences on some
level in all three categories (Feldman et al., 2018; Kwok et al., 2015;
Rapin & Dunn, 2003; Sperdin & Schaer, 2016; Zhou et al., 2018). Our
understanding of these issues is, conversely, of particular significance
to those with autism because of the manner in which they may contribute
to broader social and communication differences.
Many individuals with autism demonstrate impairments in speech
processing (Kwok et al., 2015; Rapin & Dunn, 2003; Sperdin & Schaer,
2016) as well as attenuated multisensory effects, particularly when
young (Feldman et al., 2018). In light of the crucial role temporal
dynamics have been shown to play in MSI, and MSI in turn in speech
perception, differences in temporal processing may be underlying factors
in both of these disparities. It is worth noting that even though
auditory and visual information may originate from the same source,
these signals never arrive at the brain perfectly simultaneously. Light
travels faster than sound, but auditory stimuli have a lower signal
transduction latency (Jain et al., 2015; Kemp, 1973), so the brain must
be both tolerant and adaptable to varying degrees of asynchrony between
sensory streams to allow integration of relevant stimuli. Tolerance to
asynchrony is seen in the window of perceived synchrony (WPS), which is
the range of stimulus onset asynchronies (SOAs) over which participants
are still likely to perceive multisensory signals as simultaneous.
Narrowing of this window, which can be seen as a refinement of temporal
processing acuity, occurs during typical development (Hillock et al.,
2011; Hillock-Dunn & Wallace, 2012; Lewkowicz & Flom, 2014), but is
both delayed and diminished among those with autism (de Boer-Schellekens
et al., 2013; Foss-Feig et al., 2010; Stevenson et al., 2014). However,
recent research challenges the degree to which this applies to autistic
adults (Weiland et al., 2022; Zhou et al., 2022).
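
To make the definition of the WPS concrete, the following is a minimal sketch of one way a window width could be estimated from simultaneity judgments. It assumes a Gaussian shape for the proportion of "synchronous" responses across SOAs, a coarse grid search in place of a proper optimizer, and the full width at half maximum as the width measure; conventions differ across studies, and none of these choices is taken from the work cited above. The function names, the grid ranges, and the sign convention (negative SOA meaning the auditory stimulus leads) are all illustrative assumptions.

```python
import math


def fit_synchrony_curve(soas_ms, p_sync):
    """Least-squares fit of amplitude * exp(-(soa - center)^2 / (2 * sigma^2)).

    soas_ms: SOAs in ms (assumed convention: negative = auditory leads).
    p_sync:  proportion of "synchronous" responses at each SOA.
    Returns (amplitude, center_ms, sigma_ms) from a coarse grid search.
    """
    best = None
    for center in range(-200, 201, 5):      # candidate curve centers (ms)
        for sigma in range(20, 401, 5):     # candidate curve widths (ms)
            shape = [math.exp(-((s - center) ** 2) / (2 * sigma ** 2))
                     for s in soas_ms]
            denom = sum(g * g for g in shape)
            if denom == 0.0:
                continue  # curve is numerically flat at every tested SOA
            # For a fixed shape, the best amplitude has a closed form.
            amplitude = sum(g * p for g, p in zip(shape, p_sync)) / denom
            error = sum((amplitude * g - p) ** 2
                        for g, p in zip(shape, p_sync))
            if best is None or error < best[0]:
                best = (error, amplitude, center, sigma)
    return best[1], best[2], best[3]


def window_width_ms(sigma_ms):
    """Full width at half maximum of the fitted Gaussian (one convention)."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma_ms


# Hypothetical data for a single participant (values are made up).
soas = [-400, -300, -200, -100, 0, 100, 200, 300, 400]
proportions = [0.05, 0.15, 0.45, 0.80, 0.95, 0.90, 0.60, 0.25, 0.10]
amplitude, center, sigma = fit_synchrony_curve(soas, proportions)
print(f"center = {center} ms, window width = {window_width_ms(sigma):.0f} ms")
```

Under this sketch, a larger fitted sigma corresponds to a wider window, that is, to greater tolerance of asynchrony.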
Adaptability to asynchrony is seen in temporal recalibration (Fujisaki
et al., 2004; Vroomen et al., 2004), an effect in which the point of
subjective simultaneity (PSS), where participants are most likely to
perceive audiovisual inputs as synchronized, shifts according to prior
experience. For example, after hearing an auditory stimulus such as a
beep leading a visual stimulus such as a flash, a participant will be
more likely to perceive a similarly leading beep as simultaneous with a
flash (Van der Burg et al., 2013). This effect also extends to more
complex speech stimuli (Van der Burg & Goodbourn, 2015). Some studies
have found that this rapid temporal recalibration effect is also
diminished in those with autism (J. Noel et al., 2017; Turi et al.,
2016), although the study with the largest adult sample did not (Weiland et
al., 2022), again raising questions about the persistence of temporal
processing differences.
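
The logic of rapid temporal recalibration can be sketched as a comparison of PSS estimates conditioned on the preceding trial. The example below is a simplified illustration, not the analysis used in the studies cited above: it estimates the PSS as the centroid of the per-SOA proportion of "synchronous" responses rather than by curve fitting, and it assumes that negative SOAs denote auditory-leading trials. The function names and the trial format are hypothetical.

```python
from collections import defaultdict


def centroid_pss(trials):
    """Crude PSS estimate: centroid of the proportion 'synchronous' per SOA.

    trials: iterable of (soa_ms, judged_sync) pairs.
    """
    counts = defaultdict(lambda: [0, 0])    # soa -> [n_sync, n_total]
    for soa, judged_sync in trials:
        counts[soa][0] += int(judged_sync)
        counts[soa][1] += 1
    proportions = {soa: n_sync / n_total
                   for soa, (n_sync, n_total) in counts.items()}
    total = sum(proportions.values())
    if total == 0:
        raise ValueError("no 'synchronous' responses in this subset")
    return sum(soa * p for soa, p in proportions.items()) / total


def rapid_recalibration(trials):
    """PSS after visual-leading trials minus PSS after auditory-leading trials.

    trials: list of (soa_ms, judged_sync) in presentation order.
    Assumed convention: negative SOA = auditory leads. Trials preceded by
    a physically synchronous trial (SOA of 0) are left out of both subsets.
    """
    after_audio_lead, after_visual_lead = [], []
    for previous, current in zip(trials, trials[1:]):
        if previous[0] < 0:
            after_audio_lead.append(current)
        elif previous[0] > 0:
            after_visual_lead.append(current)
    return centroid_pss(after_visual_lead) - centroid_pss(after_audio_lead)
```

With this sign convention, a positive return value indicates that the PSS shifted in the direction of the asynchrony experienced on the preceding trial. The centroid is sensitive to which SOAs happen to be sampled in each subset, so a fitted curve would be preferable with real data.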
Together, these differences in MSI and temporal processing have given
rise to theories that posit that basic sensory factors may contribute to
the higher-level social differences seen in autism via their influence
on language and communication (Baum et al., 2015; Stevenson et al.,
2018). Stevenson et al. (2018) found that autistic children’s WPS width
correlates negatively with the degree of audiovisual integration they
experience, which in turn correlates positively with recognition of
speech in noise. They took this as evidence that MSI mediates an
influence of temporal processing acuity on speech perception in autism.
Given the evidence of these cascading effects, understanding the
relationship between temporal processing differences, MSI, and speech
perception is crucial to illuminating the broader autistic behavioral
profile.
Most prominent among the paradigms used to investigate MSI with speech
stimuli is the McGurk/MacDonald effect (Mcgurk & Macdonald, 1976). This
effect occurs when participants are presented with conflicting phonemes
(the smallest auditory components of speech) and visemes (their visual
counterparts), leading to an illusion in which what is heard is
influenced by what is seen. For example, the presentation of a /ba/
phoneme with a /ga/ viseme tends to lead participants to report hearing
the phoneme /da/. This phenomenon, dubbed a fusion, is highly dependent
upon temporal alignment (Munhall et al., 1996), and has been shown to
correlate negatively with the width of the WPS (Stevenson et al., 2012),
which is, again, wider on average among autistic individuals. As such,
it is unsurprising that autistic individuals have shown attenuated
susceptibility to the McGurk/MacDonald illusion, at least as children.
A meta-analysis focusing on the McGurk/MacDonald illusion (Zhang et
al., 2019) found not only that autistic individuals show less
susceptibility to the effect, but also that the magnitude of this
between-group difference increases with age. This led the authors to
conclude that while non-autistic people continue to develop in their
ability to integrate audiovisual speech stimuli, autistic individuals’
progress may be hampered by heightened attention to local details and
reduced orientation to social information. However, it bears noting that
eight of the nine studies included in their meta-analysis had child
samples, and that
the only adult study found no difference between groups in the strength
of the McGurk/MacDonald effect (Saalasti et al., 2012). Additionally,
two studies not included in the meta-analysis (Keane et al., 2010;
Stevenson et al., 2018) did not find a difference between groups in
susceptibility to the illusion. Notably, this includes the study with
the largest previous sample size (Stevenson et al., 2018) and one of the
few with an adult sample (Keane et al., 2010). Such inconsistencies
raise questions about the degree to which MSI findings with autistic
children extend to adults. These are highlighted by findings that
autistic children may catch up to their non-autistic peers in their
ability to integrate audiovisual speech signals embedded in noise by
early adolescence (Foxe et al., 2015). Such findings challenge theories
positing that persistent MSI deficits drive difficulties with speech
perception and other higher-order differences between autistic and
non-autistic adults. Beyond theory, because
multisensory training has been shown to be highly effective (Nava et
al., 2020; O’Brien et al., 2023; Setti et al., 2014), understanding the
ages at which these differences exist is essential to tailoring
therapeutic interventions for autistic individuals.
In addition to age, a significant factor in the heterogeneity of
findings may be sample size. In a review of McGurk/MacDonald studies,
Magnotti & Beauchamp (2018) demonstrated that a publication bias
towards significant results would produce a vast overestimation of real
population differences given the small sample sizes conventional in this
field of research. This led them to conclude that the published
estimates of the differences between groups in MSI measured using the
McGurk/MacDonald effect are inflated. They argued that to alleviate this
effect size inflation and enhance replicability, sample sizes must be
increased considerably.
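
The inflation argument can be illustrated with a small simulation. The sketch below is not Magnotti & Beauchamp's procedure; it is a generic demonstration under stated assumptions: normally distributed scores with unit variance, equal group sizes, a true standardized difference chosen arbitrarily, and a normal approximation to the critical t value. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_published_effects(true_d, n_per_group, n_studies=2000, seed=0):
    """Compare the mean effect size over all simulated studies with the
    mean over only those reaching significance (a stand-in for
    'published' studies).

    Returns (mean_d_all, mean_d_significant).
    """
    rng = random.Random(seed)
    # Normal approximation to the two-sided 5% critical value (assumption).
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    all_d, significant_d = [], []
    for _ in range(n_studies):
        group_a = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pooled_sd = ((statistics.variance(group_a)
                      + statistics.variance(group_b)) / 2) ** 0.5
        d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
        # With equal group sizes, t = d * sqrt(n / 2).
        t = d * (n_per_group / 2) ** 0.5
        all_d.append(d)
        if abs(t) > z_crit:
            significant_d.append(d)
    mean_significant = (statistics.mean(significant_d)
                        if significant_d else float("nan"))
    return statistics.mean(all_d), mean_significant


# Illustrative comparison of a small and a large sample per group.
for n in (15, 200):
    mean_all, mean_sig = simulate_published_effects(true_d=0.3, n_per_group=n)
    print(f"n = {n}: all studies d = {mean_all:.2f}, "
          f"significant only d = {mean_sig:.2f}")
```

In this setup, a small study can only reach significance if its observed difference is far larger than the true one, so averaging over significant studies alone overstates the effect. The overstatement shrinks as the sample size grows.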
In order to examine the degree to which findings from previous studies
with children, limited in both scope and age range, extend to autistic
adults, we recruited the largest sample to date for a study
investigating differences between autistic and non-autistic adults in
temporal processing and audiovisual integration of speech stimuli. We
measured these using a version of the McGurk/MacDonald task involving
manipulation of SOA and both syllable and simultaneity judgments. This
allowed us to compare the rate at which the illusion occurs as well as
the likelihood that participants perceive stimuli as synchronized,
their WPS, and the effects of rapid temporal recalibration. We predicted
diminished susceptibility to the McGurk/MacDonald effect, blunted
temporal acuity (i.e., a wider WPS), and an attenuated effect of temporal
recalibration in autistic versus non-autistic participants.