Speech Communication

Papers
(The median citation count of Speech Communication is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. Only papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01, are included.)
Article | Citations
Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound | 80
AMGCN: An adaptive multi-graph convolutional network for speech emotion recognition | 68
Effects of harmonicity on Mandarin speech perception in cochlear implant users | 59
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 45
A multimodal model for predicting feedback position and type during conversation | 44
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement | 40
Chirplet transform based time frequency analysis of speech signal for automated speech emotion recognition | 39
Use of affect context in dyadic interactions for continuous emotion recognition | 39
Editorial Board | 32
Editorial Board | 27
Editorial Board | 27
Recognition of vocoded speech in English by Mandarin-speaking English-learners | 26
Editorial Board | 23
Sonorant spectra and coarticulation distinguish speakers with different dialects | 20
Effect of prior exposure on the perception of Japanese vowel length contrast in reverberation for nonnative listeners | 19
Progress of machine learning based automatic phoneme recognition and its prospect | 19
Editorial Board | 19
Editorial Board | 19
Subband fusion of complex spectrogram for fake speech detection | 17
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean | 17
Facemask occlusion's impact on L2 listening comprehension | 17
Editorial Board | 17
Uneven success: automatic speech recognition and ethnicity-related dialects | 16
Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners | 16
Phase unwrapping based packet loss concealment using deep neural networks | 14
Coarse-to-fine speech separation method in the time-frequency domain | 14
Deep temporal clustering features for speech emotion recognition | 14
On the deficiency of intelligibility metrics as proxies for subjective intelligibility | 13
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners | 13
Who converges? Variation reveals individual speaker adaptability | 13
Dialect contact in real interactions and in an agent-based model | 13
Review of analysis methods for speech applications | 12
Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language | 12
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 12
Zero-shot voice conversion based on feature disentanglement | 12
Improved AED with multi-stage feature extraction and fusion based on RFAConv and PSA | 11
Oral configurations during vowel nasalization in English | 11
Disordered speech recognition considering low resources and abnormal articulation | 11
Progressive channel fusion for more efficient TDNN on speaker verification | 11
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions | 10
A comparative study of fundamental frequency stability between speech and singing | 10
A comprehensive study on supervised single-channel noisy speech separation with multi-task learning | 10
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 9
Exploiting Locality Sensitive Hashing - Clustering and gloss feature for sign language production | 9
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 9
Advancing speaker embedding learning: Wespeaker toolkit for research and production | 9
Acoustic properties of non-native clear speech: Korean speakers of English | 9
Development of a hybrid word recognition system and dataset for the Azerbaijani Sign Language dactyl alphabet | 9
Text-conditioned Transformer for automatic pronunciation error detection | 9
Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion | 8
Editorial Board | 8
Editorial Board | 8
Spoken language identification: An overview of past and present research trends | 8
A comparison of discrete and continuous prominence perception methods in German | 8
NHSS: A speech and singing parallel database | 8
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition | 8
Exploring the effects of restraining the use of gestures on narrative speech | 8
Acoustic characterization and machine prediction of perceived masculinity and femininity in adults | 7
Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting | 7
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 7
Prosodic development from 4 to 10 years: Data from the Italian adaptation of the PEPS-C | 7
Perceptual effects of interpolated Austrian and German standard varieties | 7
Editorial Board | 7
Intonational alignment in second language acquisition | 7
Editorial Board | 7
Editorial Board | 6
Speech pause distribution as an early marker for Alzheimer’s disease | 6
Some properties of mental speech preparation as revealed by self-monitoring | 6
The dependence of accommodation processes on conversational experience | 6
A novel distortion-tolerant speech encryption scheme for secure voice communication | 6
Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions | 6
Adapted Weighted Linear Prediction with Attenuated Main Excitation for formant frequency estimation in high-pitched singing | 6
Self-supervised speech denoising using only noisy audio signals | 6
End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations | 6
PACDNN: A phase-aware composite deep neural network for speech enhancement | 6
Data augmentation for speech separation | 6
Exploiting the directional coherence function for multichannel source extraction | 6
Shared and task-specific phase coding characteristics of gamma- and theta-bands in speech perception and covert speech | 6
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment | 6
Speech/music classification using phase-based and magnitude-based features | 6
Factorized and progressive knowledge distillation for CTC-based ASR models | 6
Pronunciation error detection model based on feature fusion | 5
Forms, factors and functions of phonetic convergence: Editorial | 5
Bangladeshi Bangla speech corpus for automatic speech recognition research | 5
The role of probability and duration in perception of speech sounds | 5
Consonant gemination in Italian: The nasal and liquid case | 5
Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects | 5
Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition | 5
Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation | 5
Curriculum Learning based approaches for robust end-to-end far-field speech recognition | 5
Perceptual asymmetry between pitch peaks and valleys | 5
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions | 5
An introduction to pluricentric languages in speech science and technology | 5
Articulation rates’ inter-correlations and discriminating powers in an English speech corpus | 5
Perceptual clustering of high-pitched vowels in Chinese Yue Opera | 5
Factorized WaveNet for voice conversion with limited data | 5
An analysis of prosodic boundaries across speaking styles in two varieties of German | 5
The effect of sampling variability on systems and individual speakers in likelihood ratio-based forensic voice comparison | 4
RETRACTED: Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids | 4
Monophthong vocal tract shapes are sufficient for articulatory synthesis of German primary diphthongs | 4
Editorial Board | 4
Modulation spectral features for speech emotion recognition using deep neural networks | 4
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces | 4
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt | 4
A corpus of audio-visual recordings of linguistically balanced, Danish sentences for speech-in-noise experiments | 4
A robust temporal map of speech monitoring from planning to articulation | 4
Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion | 4
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale | 4
APIN: Amplitude- and phase-aware interaction network for speech emotion recognition | 4
Tone-syllable synchrony in Mandarin: New evidence and implications | 4
One-class network leveraging spectro-temporal features for generalized synthetic speech detection | 4
The Hearing-Aid Speech Perception Index (HASPI) Version 2 | 4
Emotional voice conversion: Theory, databases and ESD | 4
Cross-modal information fusion for voice spoofing detection | 4
Nasal coarticulation in Lombard speech | 3
The impact of first and second formant variations on vowel identification among elderly Japanese listeners | 3
Identity Retention and Emotion Converted StarGAN for low-resource emotional speaker recognition | 3
Differential constant-beamwidth beamforming with cube arrays | 3
Phonetic imitation of multidimensional acoustic variation of the nasal split short-a system | 3
Editorial Board | 3
Investigating a neural all pass warp in modern TTS applications | 3
Comparing Levenshtein distance and dynamic time warping in predicting listeners’ judgments of accent distance | 3
Speech rhythm convergence in a dyadic reading task | 3
Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration | 3
Editorial Board | 3
Editorial Board | 3
An ensemble technique to predict Parkinson's disease using machine learning algorithms | 3
A new time–frequency representation based on the tight framelet packet for telephone-band speech coding | 3
Editorial Board | 3
One-shot emotional voice conversion based on feature separation | 3
Editorial Board | 3
Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone | 3
Editorial Board | 3
The Second-Language Productivity of Two Mandarin Tone Sandhi Patterns | 3
Editorial Board | 3
Editorial Board | 3
Vocal emotion perception in Mandarin-speaking older adults with hearing loss | 3
Combined approach to dysarthric speaker verification using data augmentation and feature fusion | 3
Spectral sparsification of speech signals and its interaction with top-down mechanisms in adult cochlear implant users | 3
Editorial Board | 3
Development of a speech emotion recognizer for large-scale child-centered audio recordings from a hospital environment | 3
Transfer knowledge for punctuation prediction via adversarial training | 3
CAST: Context-association architecture with simulated long-utterance training for mandarin speech recognition | 3
Strengthening speech content authentication against tampering | 3
Editorial Board | 3
The amalgamation of wavelet packet information gain entropy tuned source and system parameters for improved speech emotion recognition | 3
Discriminative speaker embedding with serialized multi-layer multi-head attention | 3
Modeling trajectories of human speech articulators using general Tau theory | 2
The prosody of theme, rheme and focus in Egyptian Arabic: A quantitative investigation of tunes, configurations and speaker variability | 2
Comparative analysis of various feature extraction techniques for classification of speech disfluencies | 2
SDTF-Net: Static and dynamic time–frequency network for Speech Emotion Recognition | 2
Introducing ISAP and MATSS: Mental stress induced speech utterance procedure and obtained dataset | 2
The N400 reveals implicit accent-induced prejudice | 2
The Relationship Between Turn-taking, Vocal Pitch Synchrony, and Rapport in Creative Problem-Solving Communication | 2
Fractional feature-based speech enhancement with deep neural network | 2
Arabic Automatic Speech Recognition: Challenges and Progress | 2
Efficient time-domain speech separation using short encoded sequence network | 2
Foreign accent strength and intelligibility at the segmental level | 2
Pathological voice classification using MEEL features and SVM-TabNet model | 2
On intrusive speech quality measures and a global SNR based metric | 2
Speakers’ vocal expression of sexual orientation depends on experimenter gender | 2
Multimodal attention for lip synthesis using conditional generative adversarial networks | 2
Acoustic features correlated to perceived urgency in evacuation announcements | 2
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement | 2
Speech emotion recognition approaches: A systematic review | 2
Compact deep neural networks for real-time speech enhancement on resource-limited devices | 2
Incorporating group update for speech enhancement based on convolutional gated recurrent network | 2
The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy | 2
Speech intelligibility deterioration for normal hearing and hearing impaired patients with different types of tinnitus | 2
Comparison and analysis of new curriculum criteria for end-to-end ASR | 2
Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation | 2
An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer | 2
Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift | 2
The effect of musical expertise on whistled vowel identification | 2
A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement | 2
Automatic classification of neurological voice disorders using wavelet scattering features | 2
Analysis of forced aligner performance on L2 English speech | 2
Role of language familiarity in understanding speech in noise under various acoustic environments | 2
An investigation of domain adaptation in speaker embedding space for speaker recognition | 2
Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach | 2
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation | 2
Accurate synthesis of dysarthric Speech for ASR data augmentation | 1
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language | 1
Frequent-words analysis for forensic speaker comparison | 1
Coordination Attention based Transformers with bidirectional contrastive loss for multimodal speech emotion recognition | 1
CASE-Net: Integrating local and non-local attention operations for speech enhancement | 1
Robust Speaker Identification Based on Binaural Masks | 1
Decoupled structure for improved adaptability of end-to-end models | 1
Toward enriched decoding of mandarin spontaneous speech | 1
Visual-articulatory cues facilitate children with CIs to better perceive Mandarin tones in sentences | 1
Editorial Board | 1
Choosing only the best voice imitators: Top-K many-to-many voice conversion with StarGAN | 1
Investigating voice onset time in Pakistani English speech | 1
Performance of single-channel speech enhancement algorithms on Mandarin listeners with different immersion conditions in New Zealand English | 1
Editorial Board | 1
Analysis-by-synthesis based training target extraction of the DNN for noise masking | 1
Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics | 1
A study on the perception of prosodic cues to focus by Egyptian listeners: Some make use of them, but most of them don't | 1
Comparing the nativeness vs. intelligibility approach in prosody instruction for developing speaking skills by interpreter trainees: An experimental study | 1
Efficient acoustic feature transformation in mismatched environments using a Guided-GAN | 1
Neural speech-rate conversion with multispeaker WaveNet vocoder | 1
Spontaneous postural adjustments reduce perceived listening effort | 1
An automated integrated speech and face image analysis system for the identification of human emotions | 1
CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language | 1
Editorial Board | 1
Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency | 1
Editorial Board | 1
CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion | 1
AVID: A speech database for machine learning studies on vocal intensity | 1
Addressing the semi-open set dialect recognition problem under resource-efficient considerations | 1
Understanding acceptability of disordered speech through Audience Response Systems-based evaluation | 1
Perceptual learning of phonetic convergence | 1
Speakers of different L1 dialects with acoustically proximal vowel systems present with similar nonnative speech perception abilities: Data from Greek listeners of Dutch | 1
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech | 1
Retraction notice to “Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids” [Measurement 130C (2021) 2773] | 1
Laughter entrainment in dyadic interactions: Temporal distribution and form | 1
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones | 1
HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement | 1
The influence of task engagement on phonetic convergence | 1
Depression assessment in people with Parkinson’s disease: The combination of acoustic features and natural language processing | 1
Editorial Board | 1
An overview of high-resource automatic speech recognition methods and their empirical evaluation in low-resource environments | 1
DESCU: Dyadic emotional speech corpus and recognition system for Urdu language | 1
Unsupervised Automatic Speech Recognition: A review | 1
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition | 1
Self-distillation-based domain exploration for source speaker verification under spoofed speech from unknown voice conversion | 1