P43
Session 1 (Thursday 11 January 2024, 15:35-18:00)
Rapid label-referent mapping with vocoded speech in young infants
The speech amplitude envelope plays a central role in speech perception and acquisition. The human auditory cortex tracks the speech amplitude envelope from birth, and envelope tracking correlates with speech perception in infants and with speech comprehension in adults. The relative importance of the speech envelope for speech comprehension can be shown with vocoded speech: synthesized speech that simulates sound perception with a cochlear implant by dividing the speech signal into narrow frequency bands (i.e. channels) and extracting the envelope of each band, which is then used to modulate noise in the same frequency band. Adult listeners understand vocoded speech synthesized from as few as 3 channels, and children from 4 or 8. While recent studies suggest that even young infants can discriminate vocoded speech during the first months of life, it remains unknown whether young infants perceive vocoded speech as language or can acquire language from vocoded stimuli.
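The vocoding procedure described above can be illustrated with a minimal sketch. It is not the synthesis pipeline used in this study: the log-spaced band edges (100 to 8000 Hz), the FFT brick-wall filters, the 30 Hz envelope cutoff, and all function names are assumptions chosen only to keep the example short and self-contained.

```python
import numpy as np

def band_edges(n_channels, f_lo=100.0, f_hi=8000.0):
    # Assumption: log-spaced edges as a rough stand-in for cochlear spacing.
    return np.geomspace(f_lo, f_hi, n_channels + 1)

def bandpass_fft(x, fs, lo, hi):
    # Brick-wall band-pass filter: zero every FFT bin outside [lo, hi).
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff=30.0):
    # Full-wave rectification followed by a low-pass filter (cutoff is assumed).
    spec = np.fft.rfft(np.abs(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[freqs > cutoff] = 0.0
    # Clip small negative values introduced by filter ringing.
    return np.clip(np.fft.irfft(spec, n=len(x)), 0.0, None)

def noise_vocode(signal, fs, n_channels, seed=0):
    # Replace the fine structure in each band with envelope-modulated noise.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(signal))
    edges = band_edges(n_channels)
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(signal, fs, lo, hi), fs)
        carrier = bandpass_fft(noise, fs, lo, hi)
        out += env * carrier
    # Match the overall RMS level of the input.
    rms_in = np.sqrt(np.mean(signal ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out
```

Calling `noise_vocode(signal, fs, n_channels=4)` on a mono signal sampled at 16 kHz or higher would yield a 4-channel version; fewer channels preserve less spectral detail while keeping the slow amplitude modulations within each band.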
To answer this question, 7- to 9-month-old German-learning infants (N=36) participated in a label-referent mapping experiment. We measured infants' pupil size while they listened to short trials (N=20), each consisting of a familiarization phase and a test phase. In each trial, infants were first briefly familiarized with 2 object-label pairs and then presented with 4 test events: 2 Same trials (containing one of the familiarization object-label pairs) and 2 Switch trials (where the familiarization objects and labels were switched). The visual stimuli were 8 abstract Tetris-like objects. The auditory stimuli were 8 disyllabic nonce words uttered by 5 female native speakers of German in infant-directed speech (4 speakers for familiarization and 1 for test events). The auditory stimuli were either natural speech or vocoded speech synthesized from 2, 4, 8, or 16 narrow frequency bands corresponding to the frequency bands of the human cochlea.
A cluster-based permutation test over the pupillary response at test revealed a significant interaction between Channel (Speech/2ch/4ch/8ch/16ch) and Trial Type (Same/Switch) in a window that started 1886 ms after test word onset and lasted for 789 ms (TSUM=64.42, P<.01). In this window, we observed a significant difference between Same and Switch trials with natural speech and 16-channel vocoded speech, but not with vocoded speech composed of fewer channels. Our results show that infants can rapidly map labels to visual objects after only limited exposure to natural as well as vocoded speech. However, this ability to use speech that contains only envelope cues as labels for novel objects deteriorates quickly as the number of frequency bands is reduced. This suggests that young infants do not perceive vocoded speech as readily as human adults and that stimulating only a few cochlear channels in infants may not provide sufficient auditory detail for perceiving spoken language.
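The logic of a cluster-based permutation test over a time series can be sketched as follows. This is a deliberately simplified illustration, not the analysis reported here: it tests a paired Same-minus-Switch difference with sign-flip permutations rather than the Channel by Trial Type interaction, and the threshold, the number of permutations, and all function names are assumptions. The cluster mass (sum of absolute t-values within a cluster) is analogous in spirit to the summed statistic reported above.

```python
import numpy as np

def t_values(diff):
    # One-sample t statistic per time point; diff has shape (n_subjects, n_times).
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))

def cluster_masses(t, threshold):
    # Find runs of contiguous samples with |t| > threshold and sum |t| per run.
    # Simplification: adjacent positive and negative samples are not split.
    clusters, start = [], None
    for i, above in enumerate(np.abs(t) > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            clusters.append((start, i, np.abs(t[start:i]).sum()))
            start = None
    if start is not None:
        clusters.append((start, len(t), np.abs(t[start:]).sum()))
    return clusters

def cluster_permutation_test(diff, threshold=2.0, n_perm=1000, seed=0):
    # Returns (start, end, mass, p_value) for each observed cluster.
    rng = np.random.default_rng(seed)
    observed = cluster_masses(t_values(diff), threshold)
    null_max = np.zeros(n_perm)
    for p in range(n_perm):
        # Under the null, the sign of each subject's difference is exchangeable.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        masses = [m for _, _, m in cluster_masses(t_values(diff * signs), threshold)]
        null_max[p] = max(masses, default=0.0)
    # p-value: share of permutations whose largest cluster is at least as massive.
    return [(s, e, m, (np.sum(null_max >= m) + 1) / (n_perm + 1))
            for s, e, m in observed]
```

Comparing each observed cluster against the distribution of the largest cluster per permutation controls the family-wise error rate across time points without requiring a correction per sample.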