Pi-SPIN: Paraphrase to improve Speech Perception in Noise
Spoken dialog systems (SDS) are increasingly deployed in noisy real-world settings such as navigation and medical assistance, so synthesizing utterances that are noise-robust and more intelligible has become a pressing priority. Unlike human speakers, however, current SDS adapt poorly to the listening difficulties of their interlocutors. Much past research has focused on modulating acoustic features, for example by imitating Lombard speech, to synthesize noise-robust speech.
In this presentation, I will first focus on our proposed strategy for improving speech perception in noise: replacing a sentence with a more intelligible paraphrase. We created a new dataset, Paraphrases-in-Noise (PiN), by collecting human perception data for sentential paraphrases in noisy environments. Our experimental results demonstrate that the choice of linguistic form used to express a message leads to significant differences in intelligibility among sentential paraphrases in noise. In a highly noisy environment such as babble noise at an SNR of -5 dB, replacing utterances with sentential paraphrases that have better acoustic cues resulted in an overall intelligibility gain of 33%. In the second part of the talk, I will present two novel approaches that we designed to synthesize noise-robust speech: (1) an intelligibility-aware paraphrase ranking model, and (2) a paraphrase generation model that optimizes for better intelligibility. To encourage further exploration of how to mitigate human mishearing in noise, we have released the PiN dataset.
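For readers unfamiliar with the listening condition mentioned above, the following minimal sketch shows what mixing speech with noise at a given SNR means, using the standard definition SNR (dB) = 10 log10(P_speech / P_noise). It is an illustration only and not the procedure used to build PiN: the function names `mix_at_snr` and `measure_snr_db` are hypothetical, and the signals are assumed to be equal-length lists of float samples.

```python
import math


def mean_power(samples):
    """Average power of a signal given as a sequence of float samples."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(speech, noise):
    """SNR in dB, defined as 10 * log10(speech power / noise power)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise ratio equals `snr_db`,
    then add it to `speech` sample by sample.

    Hypothetical helper for illustration; assumes equal-length,
    non-silent signals.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must not be silent")
    # Noise power required to reach the requested SNR.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    # Amplitude gain is the square root of the power ratio.
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for a speech recording and a babble-noise recording.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1000)]
mixed = mix_at_snr(speech, noise, -5.0)
```

At -5 dB the scaled noise carries roughly three times the power of the speech, which is why this condition counts as highly noisy.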
Acknowledgments: This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project ID 232722074 – SFB 1102.