The brain is under a constant barrage of sensory signals. To most reliably and efficiently interpret these myriad inputs, it must make use of different types of congruence across modalities to tie relevant signals together in a process known as multisensory integration (MSI). MSI can facilitate faster detection of stimuli (Van der Burg et al., 2008), enhancement of perception (Sumby & Pollack, 1954; Vroomen & de Gelder, 2000), and resolution of ambiguity (Green & Angelaki, 2010; Parise & Ernst, 2017; van Ee et al., 2009), as well as produce potent illusions (McGurk & MacDonald, 1976; Shams et al., 2000; Shipley, 1964). The types of relevant congruence range from basic stimulus features such as spatiotemporal alignment (Slutsky & Recanzone, 2001) to more complex features such as semantic (Iordanescu et al., 2008) and emotional overlap (Jertberg et al., 2019). However, temporal coincidence holds a special place in multisensory perception research, as it has been shown to produce powerful effects of MSI on its own (Van der Burg et al., 2008, 2011; Vroomen & de Gelder, 2004), and many other types of multisensory interactions depend on temporal proximity (Costantini et al., 2016; Munhall et al., 1996; Shams et al., 2000).
Of all the areas in which MSI affects our daily lives, speech perception may be the most obvious. The integration of visual signals with their auditory counterparts can greatly enhance our understanding of speech (Erber, 1969; Irwin & DiBlasi, 2017; Sumby & Pollack, 1954; Woodhouse et al., 2008) and even produce powerful multisensory illusions (McGurk & MacDonald, 1976) when the stimuli are presented sufficiently close in time (Munhall et al., 1996). One can experience this influence by simply attempting to understand a speaker across a noisy room with and without one’s eyes open. It is in such boisterous environments, in which the reliability of relevant auditory signals is compromised by competing inputs, that the most benefit is afforded by the integration of visual information (Erber, 1969; MacLeod & Summerfield, 1987; Sumby & Pollack, 1954). It is notable, then, that it is precisely these circumstances in which autistic individuals struggle most with speech perception (Alcántara et al., 2004; Fadeev et al., 2023; Mamashli et al., 2017; Ruiz Callejo et al., n.d.). Autism is of particular interest to our understanding of this intersection of MSI, speech perception, and temporal processing because it appears to involve differences on some level in all three categories (Feldman et al., 2018; Kwok et al., 2015; Rapin & Dunn, 2003; Sperdin & Schaer, 2016; Zhou et al., 2018). Our understanding of these issues is, conversely, of particular significance to those with autism because of the manner in which they may contribute to broader social and communication differences.
Many individuals with autism demonstrate impairments in speech processing (Kwok et al., 2015; Rapin & Dunn, 2003; Sperdin & Schaer, 2016) as well as attenuated multisensory effects, particularly when young (Feldman et al., 2018). In light of the crucial role temporal dynamics have been shown to play in MSI, and MSI in turn in speech perception, differences in temporal processing may be underlying factors in both of these disparities. It is worth noting that even though auditory and visual information may originate from the same source, these signals never arrive at the brain perfectly simultaneously. Light travels faster than sound, but auditory stimuli have a lower signal transduction latency (Jain et al., 2015; Kemp, 1973), so the brain must be both tolerant of and adaptable to varying degrees of asynchrony between sensory streams to allow integration of relevant stimuli. Tolerance of asynchrony is seen in the window of perceived synchrony (WPS), which is the range of stimulus onset asynchronies (SOAs) over which participants are still likely to perceive multisensory signals as simultaneous. Narrowing of this window, which can be seen as a refinement of temporal processing acuity, occurs during typical development (Hillock et al., 2011; Hillock-Dunn & Wallace, 2012; Lewkowicz & Flom, 2014), but is both delayed and diminished among those with autism (de Boer-Schellekens et al., 2013; Foss-Feig et al., 2010; Stevenson et al., 2014). However, recent research challenges the degree to which this applies to autistic adults (Weiland et al., 2022; Zhou et al., 2022).
Adaptability to asynchrony is seen in temporal recalibration (Fujisaki et al., 2004; Vroomen et al., 2004), an effect in which the point of subjective simultaneity (PSS), where participants are most likely to perceive audiovisual inputs as synchronized, shifts according to prior experience. For example, after hearing an auditory stimulus such as a beep leading a visual stimulus such as a flash, a participant will be more likely to perceive a similarly leading beep as simultaneous with a flash (Van der Burg et al., 2013). This effect also extends to more complex speech stimuli (Van der Burg & Goodbourn, 2015). Some studies have found that this rapid temporal recalibration effect is also diminished in those with autism (J. Noel et al., 2017; Turi et al., 2016), although the study with the largest adult sample did not (Weiland et al., 2022), again raising questions about the persistence of temporal processing differences.
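To make the two summary measures concrete, the following is a minimal sketch of how a PSS and a WPS might be derived from the proportion of "synchronous" responses at each SOA. It is illustrative only and is not the analysis used in any of the cited studies. The data are simulated, the function names are hypothetical, the sign convention (negative SOA meaning the auditory stimulus leads) is an assumption, and the simple moment-based estimate stands in for the curve fitting that would normally be used.

```python
import math

def gaussian(soa_ms, amplitude, center_ms, sd_ms):
    """Gaussian-shaped probability of a 'synchronous' response at a given SOA."""
    return amplitude * math.exp(-((soa_ms - center_ms) ** 2) / (2 * sd_ms ** 2))

def estimate_pss_and_wps(soas_ms, p_sync):
    """Moment-based summary of a simultaneity-judgment curve (a simplification).

    PSS: response-weighted mean SOA.
    WPS: reported here both as the response-weighted SD and as the
    full width at half maximum of a Gaussian with that SD. Other
    width conventions exist; this choice is an assumption.
    """
    total = sum(p_sync)
    if total <= 0:
        raise ValueError("no 'synchronous' responses to summarise")
    pss_ms = sum(s * p for s, p in zip(soas_ms, p_sync)) / total
    variance = sum(p * (s - pss_ms) ** 2 for s, p in zip(soas_ms, p_sync)) / total
    sd_ms = math.sqrt(variance)
    fwhm_ms = 2 * math.sqrt(2 * math.log(2)) * sd_ms
    return pss_ms, sd_ms, fwhm_ms

# Hypothetical participant: SOAs from -400 ms (audio leads, by assumption)
# to +400 ms (video leads), with a true PSS of 40 ms and an SD of 120 ms.
soas_ms = list(range(-400, 401, 50))
p_sync = [gaussian(s, 0.95, 40.0, 120.0) for s in soas_ms]

pss_ms, wps_sd_ms, wps_fwhm_ms = estimate_pss_and_wps(soas_ms, p_sync)
print(f"PSS = {pss_ms:.1f} ms, WPS (SD) = {wps_sd_ms:.1f} ms, "
      f"WPS (FWHM) = {wps_fwhm_ms:.1f} ms")
```

Under the same assumptions, rapid temporal recalibration could be expressed as the difference between PSS values estimated separately for trials preceded by an audio-leading trial and trials preceded by a video-leading trial.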
Together, these differences in MSI and temporal processing have given rise to theories positing that basic sensory factors may contribute to the higher-level social differences seen in autism via their influence on language and communication (Baum et al., 2015; Stevenson et al., 2018). Stevenson et al. (2018) found that autistic children’s WPS width correlates negatively with the degree of audiovisual integration they experience, which in turn correlates positively with recognition of speech in noise. They took this as evidence that MSI mediates an influence of temporal processing acuity on speech perception in autism. Given the evidence of these cascading effects, understanding the relationship between temporal processing differences, MSI, and speech perception is crucial to illuminating the broader autistic behavioral profile.
Most prominent among the paradigms used to investigate MSI with speech stimuli is the McGurk/MacDonald effect (McGurk & MacDonald, 1976). This effect occurs when participants are presented with conflicting phonemes (the smallest auditory components of speech) and visemes (their visual counterparts), leading to an illusion in which what is heard is influenced by what is seen. For example, the presentation of a /ba/ phoneme with a /ga/ viseme tends to lead participants to report hearing the phoneme /da/. This phenomenon, dubbed a fusion, is highly dependent upon temporal alignment (Munhall et al., 1996), and has been shown to correlate negatively with the width of the WPS (Stevenson et al., 2012), which is, again, wider on average among autistic individuals. As such, it is unsurprising that autistic individuals have shown attenuated susceptibility to the McGurk/MacDonald illusion, at least as children.
In a meta-analysis focusing on the McGurk/MacDonald illusion (Zhang et al., 2019), it was found not only that autistic individuals show less susceptibility to the effect, but also that the magnitude of this between-group difference increases with age. This led the authors to conclude that while non-autistic people continue to develop in their ability to integrate audiovisual speech stimuli, autistic individuals’ progress may be hampered by heightened attention to local details and reduced orientation to social information. However, it bears noting that eight of the nine studies included in their meta-analysis had child samples, and that the only adult study found no difference between groups in the strength of the McGurk/MacDonald effect (Saalasti et al., 2012). Additionally, two studies not included in the meta-analysis (Keane et al., 2010; Stevenson et al., 2018) did not find a difference between groups in susceptibility to the illusion. Notably, this includes the study with the largest previous sample size (Stevenson et al., 2018) and one of the few with an adult sample (Keane et al., 2010). Such inconsistencies raise questions about the degree to which MSI findings with autistic children extend to adults. These are highlighted by findings that autistic children may catch up to their non-autistic peers in their ability to integrate audiovisual speech signals embedded in noise by early adolescence (Foxe et al., 2015). Together, these findings challenge theories positing that persistent MSI deficits drive difficulties with speech perception and other higher-order differences between autistic and non-autistic adults. Beyond theory, because multisensory training has been shown to be highly effective (Nava et al., 2020; O’Brien et al., 2023; Setti et al., 2014), understanding the ages at which these differences exist is essential to tailoring therapeutic interventions for autistic individuals.
In addition to age, a significant factor in the heterogeneity of findings may be sample size. In a review of McGurk/MacDonald studies, Magnotti & Beauchamp (2018) demonstrated that a publication bias towards significant results would produce a vast overestimation of real population differences given the small sample sizes conventional in this field of research. This led them to conclude that the published estimates of the differences between groups in MSI measured using the McGurk/MacDonald effect are inflated. They argued that to alleviate this effect size inflation and enhance replicability, sample sizes must be increased considerably.
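The logic of this argument can be illustrated with a small simulation. The sketch below is not a reproduction of the analysis by Magnotti & Beauchamp (2018); the true effect size, group size, number of simulated studies, and all names are hypothetical values chosen for illustration. It simulates many small two-group studies of a modest true difference, "publishes" only those reaching two-tailed significance, and compares the average published effect with the true one.

```python
import math
import random
import statistics

TRUE_D = 0.2        # assumed true standardized group difference (hypothetical)
N_PER_GROUP = 15    # assumed small sample per group (hypothetical)
N_STUDIES = 2000    # number of simulated studies (hypothetical)
T_CRIT = 2.048      # two-tailed critical t for alpha = .05, df = 28

def simulate_study(rng, n_per_group, true_d):
    """Return the observed Cohen's d and t statistic for one simulated study."""
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    d_obs = (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd
    # With equal group sizes, t = d * sqrt(n / 2).
    t_stat = d_obs * math.sqrt(n_per_group / 2)
    return d_obs, t_stat

rng = random.Random(2018)  # fixed seed so the sketch is reproducible
results = [simulate_study(rng, N_PER_GROUP, TRUE_D) for _ in range(N_STUDIES)]

all_effects = [d for d, _ in results]
# Publication filter: only significant results are "published".
published = [d for d, t in results if abs(t) > T_CRIT]

mean_all = statistics.mean(all_effects)
mean_published = statistics.mean(published)

print(f"true d = {TRUE_D}")
print(f"mean d across all studies = {mean_all:.2f}")
print(f"mean d across published studies = {mean_published:.2f} "
      f"({len(published)} of {N_STUDIES} studies)")
```

Because a small study can only reach significance when its observed effect happens to be large, the published subset overstates the true difference, and the overstatement shrinks as `N_PER_GROUP` grows (note that `T_CRIT` would need to be updated for a different group size).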
To examine the degree to which findings from previous studies with children, limited in both scope and age range, extend to autistic adults, we recruited the largest sample to date for a study investigating differences between autistic and non-autistic adults in temporal processing and audiovisual integration of speech stimuli. We measured these using a version of the McGurk/MacDonald task involving manipulation of SOA and both syllable and simultaneity judgments. This allowed us to compare the rate at which the illusion occurs as well as the likelihood that participants perceive stimuli as synchronized, their WPS, and the effects of rapid temporal recalibration. We predicted diminished susceptibility to the McGurk/MacDonald effect, blunted temporal acuity (i.e., a wider WPS), and an attenuated effect of temporal recalibration in autistic versus non-autistic participants.