Researcher from Stanford University Provides Details of New Studies and Findings in the Area of Artificial Intelligence (Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space)

Research findings on artificial intelligence are discussed in a new report. According to news originating from Stanford, California, by NewsRx correspondents, the research stated, “Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performing models is the paucity of available training labels.”

Financial supporters for this research include the National Institutes of Health.

Our news correspondents obtained a quote from the researchers at Stanford University: “Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised pre-training to the classification of emotions from the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset’s acoustic data. Unlike prior papers that have experimented with raw acoustic data, our technique is applied to encoded acoustic data with 74 parameters of distinctive audio features at discrete timesteps. Our model is first pre-trained to reconstruct the randomly masked timesteps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is evaluated via overall mean absolute error (MAE), MAE per emotion, overall four-class accuracy, and four-class accuracy per emotion; these metrics are compared against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics, especially when the number of annotated data points in the fine-tuning step is small.”
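The two-stage scheme the quote describes (pre-train by reconstructing randomly masked timesteps of a 74-feature acoustic sequence, then fine-tune on a small labeled sample) can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical rendering, not the authors' implementation: the Transformer backbone, mask ratio, sequence length, and six-emotion output are assumptions, and only the 74-feature input width comes from the article.

    # Minimal sketch of masked-timestep pre-training on engineered acoustic
    # features. Everything except the 74-feature input width is an assumption.
    import torch
    import torch.nn as nn

    FEATURE_DIM = 74   # per the article: 74 engineered audio features per timestep
    SEQ_LEN = 50       # assumed sequence length
    MASK_RATIO = 0.15  # assumed fraction of timesteps to mask

    class AcousticEncoder(nn.Module):
        """Backbone shared by pre-training and fine-tuning (architecture assumed)."""
        def __init__(self, d_model=128, nhead=4, num_layers=2):
            super().__init__()
            self.proj = nn.Linear(FEATURE_DIM, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)
            self.reconstruct = nn.Linear(d_model, FEATURE_DIM)  # SSL head

        def forward(self, x):
            h = self.encoder(self.proj(x))  # (batch, time, d_model)
            return self.reconstruct(h), h

    def pretrain_step(model, batch, optimizer):
        """One SSL step: zero out random timesteps, predict their original features."""
        mask = torch.rand(batch.shape[:2]) < MASK_RATIO  # (batch, time) boolean mask
        corrupted = batch.clone()
        corrupted[mask] = 0.0                            # hide the masked timesteps
        pred, _ = model(corrupted)
        loss = nn.functional.mse_loss(pred[mask], batch[mask])  # score masked steps only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    class EmotionHead(nn.Module):
        """Fine-tuning stage: pooled encoder states mapped to per-emotion intensities."""
        def __init__(self, encoder, num_emotions=6, d_model=128):
            super().__init__()
            self.encoder = encoder  # reuse the pre-trained weights
            self.head = nn.Linear(d_model, num_emotions)

        def forward(self, x):
            _, h = self.encoder(x)
            return self.head(h.mean(dim=1))  # mean-pool over timesteps

    if __name__ == "__main__":
        model = AcousticEncoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        fake_batch = torch.randn(8, SEQ_LEN, FEATURE_DIM)  # stand-in for CMU-MOSEI features
        print("pre-training loss:", pretrain_step(model, fake_batch, opt))
        scores = EmotionHead(model)(fake_batch)            # (8, 6) emotion intensities
        print("fine-tune output shape:", tuple(scores.shape))

Fine-tuning would then minimize a regression loss between these intensity scores and the small annotated sample, which is consistent with the MAE-based evaluation the quote describes; the six-emotion output width reflects CMU-MOSEI's emotion annotations and should likewise be treated as an assumption here.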

Keywords for this news article include: Stanford University, Stanford, California, United States, North and Central America, Affective Computing, Artificial Intelligence, Emerging Technologies, Engineering, Machine Learning.

Robotics & Machine Learning Daily News

Year, Volume (Issue): 2024 (Feb. 6)