15th Speech in Noise Workshop, 11-12 January 2024, Potsdam, Germany

P16, Session 2 (Friday 12 January 2024, 09:00-11:30)
Speech recognition with different target and masker voices in a speech-on-speech masking task in normal hearing listeners and cochlear implant users

Verena Müller, Emeline Cordary, Pauline Burkhardt, Ruth Lang-Roth
University of Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Cochlear Implant Center, Faculty of Medicine, Germany

Objective: Speech recognition in a competing-talker situation is extremely challenging. While normal-hearing (NH) listeners can separate two competing talkers by their voices, cochlear implant (CI) users can hardly do so. This is partly due to the CI's limited processing of two important voice cues, the fundamental frequency (F0) and the formant frequencies (Fn). The aim of the study was to determine to what extent speech recognition in a competing-talker situation is influenced by the target’s and the masker’s voice, and whether speech recognition changes depending on which voice is the target and which is the masker.

Methods: 15 adult listeners with NH and 16 listeners with a CI participated in the study. The German Oldenburg sentence test, a matrix sentence test, was used as the test material. The sentences have the structure “name-verb-number-adjective-object”. The F0 and Fn of the original male voice were manipulated to create a female and a child-like voice. Two sentences were always superimposed, with each of the three voices serving as both target and masker talker. Additionally, all three voices were presented against modulated speech-shaped noise. Listeners were asked to repeat the sentence that began with the name “Stefan”, a common German first name. The target-to-masker ratio (TMR) in dB at which listeners reached 50% speech recognition was measured. Target and masker stimuli were both presented from the front.
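
To illustrate what presenting two superimposed sentences at a given TMR involves, the following is a minimal sketch, not the authors' actual procedure. It assumes that level is defined as the RMS of the waveform and that the target level is held fixed while the masker is scaled; neither detail is stated in the abstract. The function names rms, mix_at_tmr and measure_tmr_db are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_tmr(target, masker, tmr_db):
    """Superimpose target and masker at the requested TMR (in dB).

    Assumption: the target is left unchanged and the masker is scaled
    so that 20*log10(rms(target) / rms(scaled masker)) equals tmr_db.
    """
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    # Gain that brings the masker RMS to rms(target) / 10**(tmr_db / 20).
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    scaled_masker = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker


def measure_tmr_db(target, masker):
    """TMR in dB between two signals, based on their RMS levels."""
    return 20 * math.log10(rms(target) / rms(masker))
```

With this convention, a negative TMR means the masker is more intense than the target; for example, mix_at_tmr(target, masker, -6.0) scales the masker to be 6 dB above the target before adding the two.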

Results: NH listeners’ speech recognition was worst when talkers with the same voice were superimposed, followed by the conditions in which the voices were presented against the noise masker. Speech recognition was best when the competing talkers differed in voice. CI users’ speech recognition was worst in the competing-talker conditions and best when the voices were presented against the noise masker. Within the competing-talker conditions, the child’s voice appeared to be a more effective masker than the male and the female voices.

Conclusions: For NH listeners, the results indicate that speech recognition depends not on the particular voice but on whether the target and masker voices differ. This is different for CI users, whose results suggest that a voice that is higher in frequency might be a more effective masker, and correspondingly a more easily tracked target, than a voice that is lower in frequency.

Last modified 2024-01-16 10:49:05