
End-to-End Paired Ambisonic-Binaural Audio Rendering

Binaural rendering is of great interest to virtual reality and immersive media. Although humans can naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because traditional methods optimize to retain the perceptually more important part of the information at the cost of discarding the rest. Therefore, it is desirable to develop novel techniques that achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.

Ambisonic, attention, binaural rendering, neural network

Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-Chiang Wang, Hongming Shan, Junping Zhang


Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China

Beijing ByteDance Technology Co. Ltd., Shanghai 201102, China; Chinese University of Hong Kong, Hong Kong, China

Beijing ByteDance Technology Co. Ltd., Shanghai 201102, China

Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200031, China


National Natural Science Foundation of China (62176059, 62101136)

2024

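IEEE/CAA Journal of Automatica Sinica
Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences; China Science Publishing & Media Ltd.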


CSTPCD, EI
ISSN:2329-9266
Year, Volume (Issue): 2024, 11(2)