Saturday, October 23 • 1:00pm - 1:30pm
Best Student Paper: InSE-NET: A Perceptually Coded Audio Quality Model based on CNN

Automatic coded audio quality assessment is an important task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen codecs, bitrates, and content types, and the inflexibility of existing approaches. One typical human-perception-related metric, ViSQOL v3 (ViV3), has been shown to correlate highly with quality scores rated by humans. In this study, we take steps to tackle the problems of predicting coded audio quality by relying entirely on programmatically generated data that is informed by expert domain knowledge. We propose a learnable neural network, named InSE-NET, with a backbone of Inception and Squeeze-and-Excitation modules, to assess the perceived quality of coded audio at a 48 kHz sample rate. We demonstrate that synthetic data augmentation can improve the prediction. Our proposed method is intrusive, i.e. it requires Gammatone spectrograms of the unencoded reference signals. Besides performing comparably to ViV3, our approach gives more robust predictions at higher bitrates.
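
For readers unfamiliar with the Squeeze-and-Excitation modules named in the abstract, the following is a minimal, generic sketch of the idea in plain Python. It is not the authors' implementation: the function name `squeeze_excite`, the weight layout, the omission of biases, and the toy input are all assumptions made for illustration. The block averages each channel to a single value (squeeze), passes those values through a small two-layer bottleneck (excitation), and uses the resulting gates to rescale the channels.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Generic Squeeze-and-Excitation step (illustrative, hypothetical helper).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: bottleneck weights, shape (C/r) x C.   (assumed layout)
    w2: expansion weights, shape C x (C/r).    (assumed layout)
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation, part 1: bottleneck dense layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, part 2: expansion dense layer followed by a sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: two 2x2 channels and a bottleneck of size 1 (made-up numbers).
toy_map = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_w1 = [[0.5, 0.5]]
toy_w2 = [[1.0], [-1.0]]
toy_scaled, toy_gates = squeeze_excite(toy_map, toy_w1, toy_w2)
```

In this toy call the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is emphasized relative to the second. In a trained network the weights would be learned, and the input would be feature maps derived from the spectrograms rather than hand-written lists.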

Guanxin Jiang

Dolby Germany GmbH

Arijit Biswas

Dolby Germany GmbH

Christian Bergler

Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nuremberg

Andreas Maier

Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nuremberg

Saturday October 23, 2021 1:00pm - 1:30pm EDT
Stream A