
Comparison of Artificial Neural Network Types for Infant Vocalization Classification

In this study we compared various neural network types for the task of automatic infant vocalization classification, i.e., convolutional, recurrent and fully-connected networks as well as combinations thereof. The goal was to first determine the optimal configuration for each network type and then identify the type with the highest overall performance. This investigation helps to apply neural networks more effectively to infant vocalization classification tasks, which typically offer low amounts of training data. To this end, we defined a unified neural network architecture scheme for audio classification from which we derived various network types. For each type we performed a semi-random hyperparameter search which employed regression trees both to focus the search space and to derive insights on the most influential parameters. We finally compared the test performances of the best performing configurations in a contest-like setup. Our key findings are: (1) Networks with convolutional stages reached the highest performance, regardless of whether they were combined with fully-connected or recurrent layers. (2) The most influential architectural hyperparameters for all types were the integration operations for reducing tensor dimensionality between network stages. The best performing configurations reached test performances of 75% unweighted average recall, surpassing previously published benchmarks.
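The abstract describes the unified architecture scheme only at a high level: a convolutional (or recurrent, or fully-connected) feature stage, an integration operation that reduces tensor dimensionality between stages, and a classification stage. The sketch below is an assumed, minimal PyTorch illustration of one convolutional plus fully-connected configuration of that kind; the layer counts, kernel sizes, and the choice of global average pooling as the integration operation are placeholders, not the configurations evaluated in the paper.

# Minimal sketch (assumed, not the paper's exact configuration):
# convolutional stage -> integration operation -> fully-connected classifier.
import torch
import torch.nn as nn

class ConvDenseClassifier(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        # Convolutional stage operating on (batch, 1, mel_bins, frames) inputs.
        self.conv_stage = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Integration operation: collapse the remaining time/frequency axes.
        self.integrate = nn.AdaptiveAvgPool2d(1)
        # Fully-connected classification stage.
        self.dense_stage = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_stage(x)
        x = self.integrate(x)
        return self.dense_stage(x)

if __name__ == "__main__":
    # Dummy batch of spectrogram excerpts: 8 examples, 64 mel bins, 128 frames.
    dummy = torch.randn(8, 1, 64, 128)
    logits = ConvDenseClassifier()(dummy)
    print(logits.shape)  # torch.Size([8, 5])

Swapping the dense stage for a recurrent one, or replacing the global pooling with a different integration operation, would yield the other network types compared in the study; the abstract identifies exactly this integration choice as the most influential architectural hyperparameter.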

Infant vocalization, neural network, classification

Franz Anders, Mario Hlawitschka, Mirco Fuchs


Leipzig University of Applied Sciences, Laboratory for Biosignal Processing, Leipzig, Germany

Leipzig University of Applied Sciences, Faculty of Computer Science and Media, Leipzig, Germany

2021

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN:2329-9290
Year, Volume (Issue): 2021, 29(1)