IEEE/ACM Transactions on Audio, Speech, and Language Processing

Papers
(The H4-Index of IEEE/ACM Transactions on Audio, Speech, and Language Processing is 30. The table below lists the papers whose CrossRef citation counts exceed that threshold [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 598
Pre-Training With Whole Word Masking for Chinese BERT | 459
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 128
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 127
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 119
CTNet: Conversational Transformer Network for Emotion Recognition | 111
FSD50K: An Open Dataset of Human-Labeled Sound Events | 106
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 96
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 90
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 79
SoundStream: An End-to-End Neural Audio Codec | 58
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 56
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 56
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 52
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 50
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 48
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 45
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 45
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 44
Towards Model Compression for Deep Learning Based Speech Enhancement | 42
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation | 41
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 36
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 35
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 35
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 33
Audio-Visual Deep Neural Network for Robust Person Verification | 32
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 31
Expressive TTS Training With Frame and Style Reconstruction Loss | 31
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 31
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data | 30