Journal information

Annals of telecommunications

Publisher: Springer

Frequency: Monthly

ISSN: 0003-4347

Indexed in: SCI, ISTP, AHCI

Status: officially published

    ISIVC 2024 special issue: signal and audio processing, digital communications, and networking

    Abdellah Adib, Sofia Ben Jebara, Raja Elassali, Khalid Minaoui, et al.
    pp. 375-377
    Abstract: This special issue features extended versions of selected papers from the 12th edition of ISIVC (International Conference on Signal, Image, Video, and Communications), held in Marrakech, Morocco, from May 21 to 23, 2024. As with previous editions, ISIVC 2024 brought together leading researchers, academics, and practitioners from around the world to discuss recent advances in signal and image processing, computer vision, communication systems, and artificial intelligence. This special issue showcases the high-quality research presented at ISIVC 2024. Of the 108 submissions to the conference, 66 were accepted as full papers with oral presentations. As guest editors, we invited the authors of the top-rated papers to extend their work and submit it for consideration. Following a rigorous peer-review and revision process, 12 papers were ultimately selected for publication as extended journal articles in Annals of Telecommunications. These selected works represent a cross-section of innovative approaches and emerging trends across a broad range of topics central to ISIVC, with particular emphasis on interdisciplinary applications.

    Efficient bimodal emotion recognition system based on speech/text embeddings and ensemble learning fusion

    Adil Chakhtouna, Sara Sekkate, Abdellah Adib
    pp. 379-399
    Abstract: Emotion recognition (ER) is a pivotal discipline in contemporary human-machine interaction. Its primary objective is to explore and advance theories, systems, and methodologies that can effectively recognize, comprehend, and interpret human emotions. This research investigates both unimodal and bimodal strategies for ER using advanced feature embeddings for audio and text data. We leverage pretrained models such as ImageBind for speech and RoBERTa, alongside traditional TF-IDF embeddings for text, to achieve accurate recognition of emotional states. A variety of machine learning (ML) and deep learning (DL) algorithms were implemented to evaluate their performance in speaker-dependent (SD) and speaker-independent (SI) scenarios. Additionally, three feature fusion methods (early fusion, majority voting fusion, and stacking ensemble fusion) were employed for the bimodal emotion recognition (BER) task. Extensive numerical simulations were conducted to systematically address the complexities and challenges associated with both unimodal and bimodal ER. The proposed BER system achieves an accuracy of 86.75% in the SD scenario and 64.04% in the SI scenario on the IEMOCAP database.
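
    To make the stacking-ensemble fusion idea concrete, the sketch below shows one way such a system could be wired up in Python: one base classifier per modality, with a logistic-regression meta-learner fused over their predicted class probabilities. The random feature arrays stand in for the ImageBind speech embeddings and RoBERTa/TF-IDF text features; all model choices and dimensions here are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of stacking-ensemble fusion for bimodal ER.
# Random arrays stand in for precomputed speech/text embeddings.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d_speech, d_text = 400, 128, 96
X = np.hstack([rng.normal(size=(n, d_speech)),   # speech embedding block
               rng.normal(size=(n, d_text))])    # text embedding block
y = rng.integers(0, 4, size=n)                   # four emotion classes

# Column selectors so each base learner sees only its own modality.
take_speech = FunctionTransformer(lambda Z: Z[:, :d_speech])
take_text = FunctionTransformer(lambda Z: Z[:, d_speech:])

# The meta-learner fuses the per-modality probability outputs.
fusion = StackingClassifier(
    estimators=[
        ("speech", make_pipeline(take_speech, StandardScaler(),
                                 SVC(probability=True))),
        ("text", make_pipeline(take_text, StandardScaler(),
                               SVC(probability=True))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
fusion.fit(X, y)
print("training accuracy:", fusion.score(X, y))
```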

    Multimodal emotion recognition: integrating speech and text for improved valence, arousal, and dominance prediction

    Messaoudi Awatef, Boughrara Hayet, Lachiri Zied
    pp. 401-415
    Abstract: While speech emotion recognition has traditionally focused on classifying emotions into discrete categories like happy or angry, recent research has shifted towards a dimensional approach using the Valence-Arousal-Dominance model, which captures emotional state as a continuum. However, research in speech emotion recognition (SER) consistently shows lower performance in predicting valence than arousal and dominance. To improve performance, we propose a system that combines acoustic and linguistic information. This work explores a novel multimodal approach for emotion recognition that combines speech and text data, aiming to outperform traditional single-modality systems. Both early and late fusion techniques are investigated in this paper. Our findings show that combining modalities in a late fusion approach enhances system performance. In this late fusion architecture, the outputs of the acoustic deep learning network and the linguistic network are fed into two stacked dense neural network (NN) layers to predict valence, arousal, and dominance as continuous values. This approach leads to a significant improvement in overall emotion recognition performance compared to prior methods.
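
    The late-fusion head the abstract describes (two stacked dense layers regressing valence, arousal, and dominance from the concatenated sub-network outputs) can be sketched as below; the dimensions and random inputs are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a late-fusion VAD regression head in PyTorch.
import torch
import torch.nn as nn

class LateFusionVAD(nn.Module):
    def __init__(self, acoustic_dim=64, linguistic_dim=64, hidden=128):
        super().__init__()
        # Two stacked dense layers over the fused representation.
        self.head = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),  # continuous valence, arousal, dominance
        )

    def forward(self, acoustic_out, linguistic_out):
        fused = torch.cat([acoustic_out, linguistic_out], dim=-1)
        return self.head(fused)

# Toy usage with random stand-ins for the two sub-networks' outputs.
model = LateFusionVAD()
a = torch.randn(8, 64)                        # acoustic network output
t = torch.randn(8, 64)                        # linguistic network output
vad = model(a, t)                             # shape (8, 3): V, A, D
loss = nn.MSELoss()(vad, torch.rand(8, 3))    # regression loss on targets
print(vad.shape, loss.item())
```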

    Handling data scarcity through data augmentation for detecting offensive speech

    Sara Sekkate, Safa Chebbi, Abdellah Adib, Sofia Ben Jebara, et al.
    pp. 417-426
    Abstract: Detecting offensive speech poses a challenge due to the absence of a universally accepted definition delineating its boundaries. In addition, the scarcity of labeled data often hampers the training of robust offensive speech detection models. In this paper, we propose an approach to handle data scarcity through data augmentation techniques tailored for offensive speech detection tasks. By augmenting the existing labeled data with speech samples generated through noise injection, our method effectively expands the training dataset, enabling more comprehensive model training. We evaluate our approach on the Vera Am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared to training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity challenges and enhancing the reliability of offensive speech detection systems in real-world scenarios.
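
    A minimal sketch of noise-injection augmentation, the technique the paper uses to expand scarce labeled data, is given below; the white-Gaussian-noise model and SNR values are assumptions, as the abstract does not specify the paper's exact settings.

```python
# Noise-injection augmentation sketch: each labeled clip is duplicated
# at several signal-to-noise ratios, keeping its original label.
import numpy as np

def add_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Return a copy of `signal` with white Gaussian noise at `snr_db`."""
    rng = rng or np.random.default_rng()
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
clip = rng.normal(size=16000)        # stand-in for a 1-s, 16-kHz waveform
augmented = [add_noise(clip, snr, rng) for snr in (20, 10, 5)]
print(len(augmented), "augmented copies per clip")
```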

    Advanced speech biomarker integration for robust Alzheimer's disease diagnosis

    Anass El Hallani, Adil Chakhtouna, Abdellah Adib
    pp. 427-444
    Abstract: The healthcare sector has witnessed a transformative shift in recent years, driven by rapid advancements in digital technologies. Among the myriad applications, the management of Alzheimer's disease (AD) has garnered significant attention. AD, the most common form of dementia, affects millions globally and presents a significant challenge due to its progressive and currently incurable nature. Early detection is crucial, yet existing diagnostic methods are invasive, expensive, and not readily accessible. This study proposes a hybrid approach combining traditional acoustic features (e.g., MFCC, pitch, jitter, shimmer) with deep learning-based embeddings (YAMNet, VGGish) to enhance the robustness and accuracy of AD detection through speech analysis. The methodology involves comprehensive feature extraction, dimensionality reduction via autoencoders, and classification using advanced machine learning (ML) and deep learning (DL) models. Evaluation on the ADReSS dataset demonstrates the proposed method's superior performance, achieving an accuracy of 89.9% with a deep neural network classifier. The results highlight the potential of integrating traditional and modern techniques to develop non-invasive, cost-effective, and accessible tools for early AD detection, paving the way for timely intervention and improved patient outcomes. Future work will focus on expanding datasets, incorporating diverse demographics, and refining models for better sensitivity and specificity in clinical applications.
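
    The pipeline shape the abstract outlines (fused acoustic features compressed by an autoencoder, with the bottleneck code feeding a classifier) can be sketched as follows; all dimensions, the toy training loop, and the classifier head are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: autoencoder dimensionality reduction over fused speech features
# (e.g., MFCC stats concatenated with YAMNet/VGGish embeddings), followed
# by a classifier head on the compressed codes.
import torch
import torch.nn as nn

feat_dim, code_dim = 256, 32            # assumed feature/bottleneck sizes
encoder = nn.Sequential(nn.Linear(feat_dim, code_dim), nn.ReLU())
decoder = nn.Linear(code_dim, feat_dim)

x = torch.randn(64, feat_dim)           # stand-in for fused speech features
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):                    # reconstruction training
    opt.zero_grad()
    loss = nn.MSELoss()(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

# The compressed codes then train an AD / non-AD classifier head.
classifier = nn.Linear(code_dim, 2)
logits = classifier(encoder(x).detach())
print(logits.shape)                     # (64, 2)
```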
