Speech Communication

Papers
(The TQCC of Speech Communication is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound | 80
AMGCN: An adaptive multi-graph convolutional network for speech emotion recognition | 68
Effects of harmonicity on Mandarin speech perception in cochlear implant users | 59
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 45
A multimodal model for predicting feedback position and type during conversation | 44
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement | 40
Use of affect context in dyadic interactions for continuous emotion recognition | 39
Chirplet transform based time frequency analysis of speech signal for automated speech emotion recognition | 39
Editorial Board | 32
Editorial Board | 27
Editorial Board | 27
Recognition of vocoded speech in English by Mandarin-speaking English-learners | 26
Editorial Board | 23
Sonorant spectra and coarticulation distinguish speakers with different dialects | 20
Progress of machine learning based automatic phoneme recognition and its prospect | 19
Editorial Board | 19
Editorial Board | 19
Effect of prior exposure on the perception of Japanese vowel length contrast in reverberation for nonnative listeners | 19
Facemask occlusion's impact on L2 listening comprehension | 17
Editorial Board | 17
Subband fusion of complex spectrogram for fake speech detection | 17
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean | 17
Uneven success: automatic speech recognition and ethnicity-related dialects | 16
Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners | 16
Coarse-to-fine speech separation method in the time-frequency domain | 14
Deep temporal clustering features for speech emotion recognition | 14
Phase unwrapping based packet loss concealment using deep neural networks | 14
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners | 13
Who converges? Variation reveals individual speaker adaptability | 13
Dialect contact in real interactions and in an agent-based model | 13
On the deficiency of intelligibility metrics as proxies for subjective intelligibility | 13
Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language | 12
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 12
Zero-shot voice conversion based on feature disentanglement | 12
Review of analysis methods for speech applications | 12
Disordered speech recognition considering low resources and abnormal articulation | 11
Progressive channel fusion for more efficient TDNN on speaker verification | 11
Improved AED with multi-stage feature extraction and fusion based on RFAConv and PSA | 11
Oral configurations during vowel nasalization in English | 11
A comparative study of fundamental frequency stability between speech and singing | 10
A comprehensive study on supervised single-channel noisy speech separation with multi-task learning | 10
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions | 10
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 9
Advancing speaker embedding learning: Wespeaker toolkit for research and production | 9
Acoustic properties of non-native clear speech: Korean speakers of English | 9
Development of a hybrid word recognition system and dataset for the Azerbaijani Sign Language dactyl alphabet | 9
Text-conditioned Transformer for automatic pronunciation error detection | 9
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 9
Exploiting Locality Sensitive Hashing - Clustering and gloss feature for sign language production | 9
Spoken language identification: An overview of past and present research trends | 8
A comparison of discrete and continuous prominence perception methods in German | 8
NHSS: A speech and singing parallel database | 8
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition | 8
Exploring the effects of restraining the use of gestures on narrative speech | 8
Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion | 8
Editorial Board | 8
Editorial Board | 8
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 7
Prosodic development from 4 to 10 years: Data from the Italian adaptation of the PEPS-C | 7
Perceptual effects of interpolated Austrian and German standard varieties | 7
Editorial Board | 7
Intonational alignment in second language acquisition | 7
Editorial Board | 7
Acoustic characterization and machine prediction of perceived masculinity and femininity in adults | 7
Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting | 7
Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions | 6
Adapted Weighted Linear Prediction with Attenuated Main Excitation for formant frequency estimation in high-pitched singing | 6
Self-supervised speech denoising using only noisy audio signals | 6
End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations | 6
PACDNN: A phase-aware composite deep neural network for speech enhancement | 6
Data augmentation for speech separation | 6
Exploiting the directional coherence function for multichannel source extraction | 6
Shared and task-specific phase coding characteristics of gamma- and theta-bands in speech perception and covert speech | 6
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment | 6
Speech/music classification using phase-based and magnitude-based features | 6
Factorized and progressive knowledge distillation for CTC-based ASR models | 6
Editorial Board | 6
Speech pause distribution as an early marker for Alzheimer’s disease | 6
Some properties of mental speech preparation as revealed by self-monitoring | 6
The dependence of accommodation processes on conversational experience | 6
A novel distortion-tolerant speech encryption scheme for secure voice communication | 6
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions | 5
Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects | 5
Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition | 5
Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation | 5
Perceptual clustering of high-pitched vowels in Chinese Yue Opera | 5
Factorized WaveNet for voice conversion with limited data | 5
An analysis of prosodic boundaries across speaking styles in two varieties of German | 5
An introduction to pluricentric languages in speech science and technology | 5
Articulation rates’ inter-correlations and discriminating powers in an English speech corpus | 5
Bangladeshi Bangla speech corpus for automatic speech recognition research | 5
The role of probability and duration in perception of speech sounds | 5
Consonant gemination in Italian: The nasal and liquid case | 5
Pronunciation error detection model based on feature fusion | 5
Forms, factors and functions of phonetic convergence: Editorial | 5
Curriculum Learning based approaches for robust end-to-end far-field speech recognition | 5
Perceptual asymmetry between pitch peaks and valleys | 5
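
The selection rule stated in the note above the table (a citation threshold, a four-year publication window, and a cap of 250 papers) can be sketched roughly as follows. This is only an illustration under assumptions, not the site's actual procedure: the `Paper` record, the `select_papers` function, the inclusive handling of both window dates, and the sample entries are all hypothetical, and the sample citation counts and dates are made up rather than taken from the table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record type; the field names are assumptions.
    title: str
    citations: int
    published: date


def select_papers(papers, threshold=5,
                  start=date(2021, 4, 1), end=date(2025, 4, 1),
                  limit=250):
    """Return papers in the window with citations >= threshold,
    sorted by citation count (highest first), capped at `limit`."""
    # Assumption: both window boundaries are inclusive.
    in_window = [p for p in papers if start <= p.published <= end]
    eligible = [p for p in in_window if p.citations >= threshold]
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(eligible, key=lambda p: p.citations, reverse=True)
    return ranked[:limit]


# Made-up sample data, for illustration only.
sample = [
    Paper("Hypothetical paper A", 12, date(2022, 5, 1)),
    Paper("Hypothetical paper B", 4, date(2023, 1, 1)),   # below threshold
    Paper("Hypothetical paper C", 5, date(2024, 6, 1)),   # exactly at threshold
    Paper("Hypothetical paper D", 30, date(2020, 1, 1)),  # outside the window
]

for paper in select_papers(sample):
    print(f"{paper.title} | {paper.citations}")
```

In this sketch a paper with exactly 5 citations is kept, which matches the table, where the last rows have a count equal to the threshold.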