Speech Communication

Papers
(The TQCC of Speech Communication is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts (max. 250 papers). It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN | 92
Learning deep multimodal affective features for spontaneous speech emotion recognition | 53
Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features | 52
Masked multi-head self-attention for causal speech enhancement | 49
CN-Celeb: Multi-genre speaker recognition | 44
Emotional voice conversion: Theory, databases and ESD | 37
Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion | 34
The Hearing-Aid Speech Perception Index (HASPI) Version 2 | 31
Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition | 24
Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM | 24
A review of multi-objective deep learning speech denoising methods | 22
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate | 21
Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework | 20
Unsupervised Automatic Speech Recognition: A review | 20
CyTex: Transforming speech to textured images for speech emotion recognition | 19
Automatic accent identification as an analytical tool for accent robust automatic speech recognition | 19
An Iterative Graph Spectral Subtraction Method for Speech Enhancement | 18
Speech enhancement using a DNN-augmented colored-noise Kalman filter | 18
A time–frequency smoothing neural network for speech enhancement | 18
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 17
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition | 17
Automatic speaker profiling from short duration speech data | 16
Speech pause distribution as an early marker for Alzheimer’s disease | 16
B&Anet: Combining bidirectional LSTM and self-attention for end-to-end learning of task-oriented dialogue system | 16
A formant modification method for improved ASR of children’s speech | 15
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 15
Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech | 15
Text-conditioned Transformer for automatic pronunciation error detection | 15
A supervised non-negative matrix factorization model for speech emotion recognition | 15
Modulation spectral features for speech emotion recognition using deep neural networks | 15
DeepConversion: Voice conversion with limited parallel training data | 14
PACDNN: A phase-aware composite deep neural network for speech enhancement | 13
Uneven success: automatic speech recognition and ethnicity-related dialects | 13
Phonetic accommodation to natural and synthetic voices: Behavior of groups and individuals in speech shadowing | 13
Perceptual realization of Greek consonants by Russian monolingual speakers | 13
Automatic quality control and enhancement for voice-based remote Parkinson’s disease detection | 13
Analytic phase features for dysarthric speech detection and intelligibility assessment | 13
A study on data augmentation in voice anti-spoofing | 13
Improving phoneme recognition of throat microphone speech recordings using transfer learning | 12
Speech signal processing on graphs: The graph frequency analysis and an improved graph Wiener filtering method | 12
Automatic classification of infant vocalization sequences with convolutional neural networks | 12
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt | 12
Analysis of glottal inverse filtering in the presence of source-filter interaction | 12
Model architectures to extrapolate emotional expressions in DNN-based text-to-speech | 12
A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations | 11
The interplay of prosodic cues in the L2: How intonation, rhythm, and speech rate in speech by Spanish learners of Dutch contribute to L1 Dutch perceptions of accentedness and comprehensibility | 11
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems | 9
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones | 9
Discriminative neural network pruning in a multiclass environment: A case study in spoken emotion recognition | 9
A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement | 9
An investigation of domain adaptation in speaker embedding space for speaker recognition | 9
Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features | 9
Accuracy, recording interference, and articulatory quality of headsets for ultrasound recordings | 9
A cross-linguistic analysis of the temporal dynamics of turn-taking cues using machine learning as a descriptive tool | 9
Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers | 8
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech | 8
NHSS: A speech and singing parallel database | 8
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence | 8
Improving speaker de-identification with functional data analysis of f0 trajectories | 8
Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers | 8
Bangladeshi Bangla speech corpus for automatic speech recognition research | 8
Multistage approach for steerable differential beamforming with rectangular arrays | 8
Cross-modal information fusion for voice spoofing detection | 8
Affective synthesis and animation of arm gestures from speech prosody | 8
Automatic intelligibility assessment of dysarthric speech using glottal parameters | 7
RPCA-based real-time speech and music separation method | 7
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language | 7
Acoustic differences in emotional speech of people with dysarthria | 7
Computer-assisted assessment of phonetic fluency in a second language: a longitudinal study of Japanese learners of French | 7
Discriminative speaker embedding with serialized multi-layer multi-head attention | 7
Acoustic and temporal representations in convolutional neural network models of prosodic events | 7
Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers | 7
Analysis of trade-offs between magnitude and phase estimation in loss functions for speech denoising and dereverberation | 7
Glottal flow characteristics in vowels produced by speakers with heart failure | 6
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors | 6
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition | 6
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation | 6
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 6
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 6
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler | 6
An automated integrated speech and face image analysis system for the identification of human emotions | 6
Comparing the nativeness vs. intelligibility approach in prosody instruction for developing speaking skills by interpreter trainees: An experimental study | 6
A unified system for multilingual speech recognition and language identification | 6
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces | 6
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis | 6
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition | 6
Multilingual speech recognition for GlobalPhone languages | 6
Dysarthria severity classification using multi-head attention and multi-task learning | 6
Phonetic imitation of multidimensional acoustic variation of the nasal split short-a system | 6
Foreign accent strength and intelligibility at the segmental level | 6
The effect of sampling variability on systems and individual speakers in likelihood ratio-based forensic voice comparison | 6
Modeling concurrent vowel identification for shorter durations | 5
End-to-end acoustic modelling for phone recognition of young readers | 5
CASE-Net: Integrating local and non-local attention operations for speech enhancement | 5
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 5
The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners | 5
Adaptive and hybrid Kronecker product beamforming for far-field speech signals | 5
Fusing features of speech for depression classification based on higher-order spectral analysis | 5
Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift | 5
Chirplet transform based time frequency analysis of speech signal for automated speech emotion recognition | 5
Differences between listeners with early and late immersion age in spatial release from masking in various acoustic environments | 5
Learning and controlling the source-filter representation of speech with a variational autoencoder | 5
Native language identification for Indian-speakers by an ensemble of phoneme-specific, and text-independent convolutions | 5
An empirical study of the effect of acoustic-prosodic entrainment on the perceived trustworthiness of conversational avatars | 5