An Assistant System for Speaker and Sentiment Recognition Using RAM and a Hybrid AI Model


Bozyiğit F., Aygün İ., Sağlam O., Özcan E., Borandağ E., Karasulu B.

Electronics (Switzerland), vol. 15, no. 8, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 15, Issue: 8
  • Publication Date: 2026
  • DOI: 10.3390/electronics15081731
  • Journal Name: Electronics (Switzerland)
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: ASR, assistant systems, AutoML, DL, emotion recognition, feature selection, microservice, ML, RAM, speech recognition, CNN
  • Affiliated with Çanakkale Onsekiz Mart Üniversitesi: Yes

Abstract

In the age of remote communication and digital archiving, automated analysis of voice data has become increasingly important across a variety of application areas. Despite significant advances in Automatic Speech Recognition, integrating speaker recognition, textual sentiment analysis, and acoustic sentiment detection within a unified real-time processing pipeline remains a challenging task. Current approaches are often limited to monolithic designs or operate in batch-processing modes, which restricts their scalability and real-time applicability. To address this gap, this work proposes a novel feature selection method called RAM, along with a hybrid decision-level merging approach that combines Conv1D CNN and AutoML-based models. The proposed hybrid framework enables independent model training and integrates their probabilistic outputs through a weighted merging strategy to improve performance. Furthermore, a scalable microservice-based software architecture has been developed to support real-time processing, feature selection, and model deployment. This design enhances system modularity, flexibility, and integration capability in practical applications. Experimental results show that when the proposed RAM method is used in conjunction with the hybrid AI model, it achieves over 97% accuracy in speaker recognition and over 82% accuracy in emotion classification, even with short audio samples. These findings demonstrate that the proposed approach provides a robust and efficient solution for real-time speech analysis tasks.
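To make the decision-level merging step more concrete, the following is a minimal sketch of weighted late fusion of class probabilities from two independently trained models, in the spirit described in the abstract. The weight values, class count, and function name are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def weighted_fusion(cnn_probs: np.ndarray, automl_probs: np.ndarray,
                    w_cnn: float = 0.6, w_automl: float = 0.4) -> np.ndarray:
    """Combine per-class probabilities from two independently trained models.

    Hypothetical weights; the paper's actual weighting scheme may differ.
    """
    fused = w_cnn * cnn_probs + w_automl * automl_probs
    # Renormalize so the fused scores remain a valid probability distribution.
    return fused / fused.sum(axis=-1, keepdims=True)

# Example: three emotion classes, outputs from the two models for one clip.
cnn_out = np.array([0.70, 0.20, 0.10])      # Conv1D CNN output (hypothetical)
automl_out = np.array([0.50, 0.35, 0.15])   # AutoML model output (hypothetical)
print(weighted_fusion(cnn_out, automl_out)) # fused class probabilities
```

Because each model is trained separately and only probability vectors are exchanged, such a fusion step fits naturally behind a microservice boundary, which is consistent with the modular architecture described above.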