P15
Session 1 (Thursday 11 January 2024, 15:35-18:00)
Listening effort ratings for habitual and clear-Lombard speech in noise as predicted by a glimpse (release from energetic masking) measure
Earlier research developed the high-energy glimpse proportion metric (HEGP, Tang & Cooke, 2016, doi:10.21437/Interspeech.2016-14) to capture those speech-dominant spectro-temporal regions, or glimpses, that survive energetic masking when speech is presented in noise. Though the HEGP metric has been shown to correlate with intelligibility across speech types and signal-to-noise ratios, its connection to subjective listening effort ratings remains, to our knowledge, unexplored. Given the inverse relationship between intelligibility and subjective effort ratings (e.g., Simantiraki et al., 2023, doi:10.3389/fnins.2023.1235911), HEGP scores can be expected to predict subjective effort ratings.
Following up on our earlier acoustic analysis of speech from multiple talkers, we collected listening effort ratings for the same speech samples through an online experiment. Specifically, ratings were collected from 230 young adult normal-hearing raters for the speech of 48 talkers in two speaking styles (habitual and clear-Lombard), presented in speech-shaped noise at an SNR of -6 dB. Raters indicated how effortful it was to understand the sentence content of each utterance on a scale from 1 to 7 (1 representing ‘not effortful at all to understand’ and 7 ‘extremely effortful to understand’). For all utterances, HEGP was calculated for presentation in speech-shaped noise at -6 dB SNR. Sentence-level acoustic measures (articulation rate, F0 median and range, and spectral balance) for these utterances were also available.
Based on the obtained ratings, we ask the following research questions. RQ1: Does HEGP predict listening effort ratings, and if so, does it do so differentially for habitual and clear-Lombard speech? RQ2: If HEGP predicts listening effort ratings, do sentence-level acoustic measures (i.e., articulation rate, F0 median and range, and spectral balance) explain additional variance in listening effort ratings?
To address these questions, we set up an initial linear mixed-effects model predicting listening effort ratings from speaking style, HEGP, and their interaction. In a second model, we added the four acoustic measures as predictors to the initial model to investigate whether including them further improved model fit. In both models, we included random intercepts for Talker, Rater, and Sentence, as well as by-talker random slopes for speaking style to account for talker differences in the size of their speaking-style difference.
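As a rough sketch only, the two models described above could be written as follows. The notation is an assumption, not taken from the abstract: observation index i, talker t[i], rater r[i], sentence s[i], a binary Style code (0 = habitual, 1 = clear-Lombard), Gaussian residuals, and the 1-7 rating treated as a continuous outcome. The by-talker speaking-style term is read here as a random slope, since it is meant to capture talker differences in the size of the style effect.

```latex
% Initial model: speaking style, HEGP, and their interaction (fixed effects),
% plus talker, rater, and sentence random intercepts and a by-talker style slope.
\begin{align}
\text{Effort}_i &= \beta_0 + \beta_1\,\text{Style}_i + \beta_2\,\text{HEGP}_i
                 + \beta_3\,\text{Style}_i \times \text{HEGP}_i \nonumber\\
                &\quad + u_{0,t[i]} + u_{1,t[i]}\,\text{Style}_i
                 + v_{r[i]} + w_{s[i]} + \varepsilon_i
\end{align}

% Second model: the initial model plus the four sentence-level acoustic measures.
\begin{align}
\text{Effort}_i &= \text{(right-hand side of the initial model)} \nonumber\\
                &\quad + \gamma_1\,\text{ArtRate}_i + \gamma_2\,\text{F0median}_i
                 + \gamma_3\,\text{F0range}_i + \gamma_4\,\text{SpecBalance}_i
\end{align}
```

In this notation, RQ1 concerns \beta_2 (the HEGP effect) and \beta_3 (whether that effect differs by speaking style), while RQ2 concerns whether the \gamma terms improve fit over the initial model.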
Concerning RQ1, results from both the initial and the second model showed that HEGP predicted listening effort ratings equally strongly across speaking styles. Concerning RQ2, the second model fit the data better than the initial model, with all acoustic measures except F0 median predicting effort ratings. These results suggest that release-from-masking metrics, complemented by acoustic measures that partly reflect talkers’ clear-Lombard speech adjustments, explain subjective listening effort in noise.