TY - GEN
T1 - Replicating Human Sound Localization with a Multi-Layer Perceptron
AU - Sumner, Eric Michael
AU - Unnthorsson, Runar
AU - Riedel, Morris
N1 - Funding Information: This work was performed in the Center of Excellence (CoE) Research on AI- and Simulation-Based Engineering at Exascale (RAISE), receiving funding from the EU’s Horizon 2020 Research and Innovation Framework Programme H2020-INFRAEDI-2019-1 under grant agreement no. 951733. Funding Information: The Icelandic HPC Competence Center is funded by the EuroCC project, which has received funding from the European HPC Joint Undertaking (JU) under grant agreement no. 951732. The JU receives support from the EU’s Horizon 2020 research and innovation programme. Funding Information: The work received support from NordForsk’s Nordic Sound and Music Computing Network (NordicSMC), project number 86892. Publisher Copyright: © 2022 Eric Michael Sumner et al.
PY - 2022
Y1 - 2022
AB - One of the key capabilities of the human sense of hearing is to determine the direction from which a sound is emanating, a task known as localization. This paper describes the derivation of a machine learning model which performs the same localization task: Given an audio waveform which arrives at the listener's eardrum, determine the direction of the audio source. Head-related transfer functions (HRTFs) from the ITA-HRTF database of 48 individuals are used to train and validate this model. A series of waveforms is generated from each HRTF, representing the sound pressure level at the listener's eardrums for various source directions. A feature vector is calculated for each waveform from acoustical properties motivated by prior literature on sound localization; these feature vectors are used to train multi-layer perceptrons (MLPs), a form of artificial neural network, to replicate the behavior of single individuals. Data from three individuals are used to optimize hyperparameters of both the feature extraction and MLP stages for model accuracy. These hyperparameters are then validated by training and analyzing models for all 48 individuals in the database. The errors produced by each model fall in a log-normal distribution. The median model is capable of identifying, with 95% confidence, the sound source direction to within 20 degrees. This result is comparable to previously-reported human capabilities and thus shows that an MLP can successfully replicate the human sense of sound localization.
UR - https://www.scopus.com/pages/publications/85128227115
M3 - Conference contribution
T3 - Proceedings of the Sound and Music Computing Conferences
SP - 271
EP - 278
BT - SMC/JIM/IFC 2022 - Proceedings of the 19th Sound and Music Computing Conference
A2 - Michon, Romain
A2 - Pottier, Laurent
A2 - Orlarey, Yann
PB - Sound and Music Computing Network
T2 - 19th Sound and Music Computing Conference, SMC 2022
Y2 - 5 June 2022 through 12 June 2022
ER -