P21, Session 1 (Thursday 11 January 2024, 15:35-18:00)
Evaluation of chosen DNN-based speech enhancement methods using Polish matrix speech intelligibility test
Most DNN-based speech enhancement algorithms are evaluated only with objective metrics (e.g. STOI or PESQ) or with subjective quality assessment (as in the Deep Noise Suppression [DNS] Challenge; Dubey et al., doi:10.48550/arXiv.2303.11510). Although previous studies have shown the benefits of DNN-based speech enhancement, it is not known how large an effect their use would have in standard speech audiometry tests.
In this work, the Polish sentence matrix test (Ozimek et al., doi:10.3109/14992021003681030) was used to assess selected recent DNN-based speech enhancement methods. Publicly available pre-trained neural networks (Conv-TasNet, deep complex U-net, dual-path transformer network) were evaluated. The tests determined speech reception thresholds (SRTs), i.e. the SNRs of speech mixed with babble noise that yield 50% intelligibility after speech enhancement. The listening tests were completed by 20 normal-hearing participants. The tested neural networks were not fine-tuned to the specific test conditions (language, speaker and noise).
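As an illustration of how an SRT can be read off listening-test data, the sketch below fits a logistic psychometric function to word scores and returns the SNR at which it crosses 50%. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names, the grid ranges, the slope-at-SRT parameterization and the treatment of each word as an independent trial are illustrative choices (matrix tests typically estimate the SRT with an adaptive procedure).

```python
import math


def psychometric(snr_db, srt_db, slope):
    """Logistic psychometric function.

    Returns the predicted proportion of correct words at `snr_db`.
    `srt_db` is the 50% point and `slope` is the slope at that point (1/dB).
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr_db - srt_db)))


def log_likelihood(trials, srt_db, slope, words_per_sentence=5):
    """Binomial log-likelihood (up to a constant) of (snr_db, words_correct) trials.

    Assumption: each word is scored as an independent Bernoulli trial.
    """
    ll = 0.0
    for snr_db, correct in trials:
        p = psychometric(snr_db, srt_db, slope)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        ll += correct * math.log(p) + (words_per_sentence - correct) * math.log(1.0 - p)
    return ll


def fit_srt(trials, words_per_sentence=5):
    """Maximum-likelihood (SRT, slope) found by an exhaustive grid search.

    Hypothetical grid: SRT from -30 to +10 dB in 0.1 dB steps,
    slope from 0.01 to 0.60 per dB in 0.01 steps.
    """
    best = None
    for i in range(-300, 101):
        srt_db = i / 10.0
        for j in range(1, 61):
            slope = j / 100.0
            ll = log_likelihood(trials, srt_db, slope, words_per_sentence)
            if best is None or ll > best[0]:
                best = (ll, srt_db, slope)
    return best[1], best[2]
```

Each trial is a pair `(snr_db, words_correct)`; assuming five-word matrix sentences, `words_per_sentence` keeps its default. The grid search is slow but transparent; a gradient-based optimizer would be the usual choice for larger datasets.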
The algorithms were compared in terms of the measured SRTs and their computational and memory requirements. Additionally, a signal analysis was performed to compare the types of artifacts produced by the tested neural networks, and the networks were also compared using the objective speech intelligibility measures STOI and HASPI.
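One way to compare algorithms on the measured SRTs is through per-listener differences between a reference condition and the enhanced condition. The sketch below is a hypothetical analysis, not the one reported here: it assumes an unprocessed reference condition and one SRT per listener and condition, defines the benefit as unprocessed minus enhanced SRT (a lower SRT means better intelligibility), and summarizes it with the mean, standard error and paired t statistic.

```python
import math
import statistics


def srt_benefit(srt_unprocessed, srt_enhanced):
    """Per-listener SRT improvement in dB (positive means enhancement helps).

    Both lists are assumed to hold one SRT per listener, in the same order.
    """
    if len(srt_unprocessed) != len(srt_enhanced):
        raise ValueError("need one SRT per listener in each condition")
    return [u - e for u, e in zip(srt_unprocessed, srt_enhanced)]


def summarize(benefits):
    """Return (mean, standard error, paired t statistic) of the benefits."""
    n = len(benefits)
    mean = statistics.mean(benefits)
    se = statistics.stdev(benefits) / math.sqrt(n)  # sample SD / sqrt(n)
    if se == 0.0:
        # Identical benefits for every listener: t is undefined or infinite.
        t = 0.0 if mean == 0 else math.copysign(math.inf, mean)
    else:
        t = mean / se
    return mean, se, t
```

The t statistic would be compared against a t distribution with n - 1 degrees of freedom; with several algorithms, a repeated-measures model or a correction for multiple comparisons would be more appropriate.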