IEEE/ACM Transactions on Audio, Speech, and Language Processing

Papers
(The H4-Index of IEEE/ACM Transactions on Audio, Speech, and Language Processing is 30. The table below lists the papers whose CrossRef citation counts exceed that threshold [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 598
Pre-Training With Whole Word Masking for Chinese BERT | 459
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 128
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 127
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 119
CTNet: Conversational Transformer Network for Emotion Recognition | 111
FSD50K: An Open Dataset of Human-Labeled Sound Events | 106
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 96
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 90
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 79
SoundStream: An End-to-End Neural Audio Codec | 58
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 56
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 56
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 52
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 50
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 48
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 45
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 45
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 44
Towards Model Compression for Deep Learning Based Speech Enhancement | 42
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation | 41
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 36
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 35
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 35
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 33
Audio-Visual Deep Neural Network for Robust Person Verification | 32
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 31
Expressive TTS Training With Frame and Style Reconstruction Loss | 31
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 31
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data | 30