IEEE/ACM Transactions on Audio, Speech, and Language Processing

Papers
(The median citation count of IEEE/ACM Transactions on Audio, Speech, and Language Processing is 3. The table below lists the papers whose CrossRef citation counts exceed this median, up to a maximum of 250 papers. It covers publications from the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning | 214
Low-Latency Active Noise Control Using Attentive Recurrent Network | 183
CL-XABSA: Contrastive Learning for Cross-Lingual Aspect-Based Sentiment Analysis | 151
Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation | 88
List of Reviewers | 81
Dual Microphone Speech Enhancement Based on Statistical Modeling of Interchannel Phase Difference | 73
Towards Maximizing a Perceptual Sweet Spot for Spatial Sound With Loudspeakers | 68
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification | 58
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations | 52
Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments | 51
Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion | 47
One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis | 45
SANet: A Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network | 45
Cross-Domain Aspect-Based Sentiment Classification With Tripartite Graph Modeling | 44
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition | 41
DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding | 40
Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions | 39
On Ambisonic Source Separation With Spatially Informed Non-Negative Tensor Factorization | 39
Operation-Augmented Numerical Reasoning for Question Answering | 38
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks | 36
JMS-QA: A Joint Hierarchical Architecture for Mental Health Question Answering | 36
Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning | 35
Review of Methods for Automatic Speaker Verification | 34
Spatial Analysis and Synthesis Methods: Subjective and Objective Evaluations Using Various Microphone Arrays in the Auralization of a Critical Listening Room | 33
The VoxCeleb Speaker Recognition Challenge: A Retrospective | 33
Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale | 33
Multi-Level Interaction Based Knowledge Graph Completion | 32
Dynamic Convolutional Neural Networks as Efficient Pre-Trained Audio Models | 31
Cacophony: An Improved Contrastive Audio-Text Model | 31
Dynamic Prompt-Driven Zero-Shot Relation Extraction | 31
Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service | 29
Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays | 29
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models | 29
STN4DST: A Scalable Dialogue State Tracking Based on Slot Tagging Navigation | 29
A Two-Stage Audio-Visual Fusion Piano Transcription Model Based on the Attention Mechanism | 28
Interrelate Training and Clustering for Online Speaker Diarization | 27
Syntax-Augmented Hierarchical Interactive Encoder for Zero-Shot Cross-Lingual Information Extraction | 26
Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors | 26
Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks | 26
Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition | 26
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information | 25
Harmonic Detection From Noisy Speech With Auditory Frame Gain for Intelligibility Enhancement | 25
WDEA: The Structure and Semantic Fusion With Wasserstein Distance for Low-Resource Language Entity Alignment | 25
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network | 24
An AST Structure Enhanced Decoder for Code Generation | 24
FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System | 24
Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis | 23
Occlusion Effect Cancellation in Headphones and Hearing Devices—The Sister of Active Noise Cancellation | 23
SBSim: A Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems | 23
CLAPSep: Leveraging Contrastive Pre-Trained Model for Multi-Modal Query-Conditioned Target Sound Extraction | 23
Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation | 23
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features | 23
Audio-Visual Based Online Multi-Source Separation | 22
Cross Domain Optimization for Speech Enhancement: Parallel or Cascade? | 22
Selective Listening by Synchronizing Speech With Lips | 22
Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications | 22
Task-Adaptive Feature Fusion for Generalized Few-Shot Relation Classification in an Open World Environment | 22
E³TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications | 21
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models | 21
𝒫owMix: A Versatile Regularizer for Multimodal Sentiment Analysis | 21
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging | 20
Hyperbolic Pre-Trained Language Model | 20
Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps | 20
Towards Generating Diverse Audio Captions via Adversarial Training | 19
Block-Based Perceptually Adaptive Sound Zones With Reproduction Error Constraints | 19
Specialized Mathematical Solving by a Step-By-Step Expression Chain Generation | 19
Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph | 19
Exploring Multi-Stage Information Interactions for Multi-Source Neural Machine Translation | 19
Multi-Channel to Multi-Channel Noise Reduction and Reverberant Speech Preservation in Time-Varying Acoustic Scenes for Binaural Reproduction | 19
An Interpretable Deep Mutual Information Curriculum Metric for a Robust and Generalized Speech Emotion Recognition System | 19
RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems | 18
Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models | 18
MO-Transformer: Extract High-Level Relationship Between Words for Neural Machine Translation | 18
Iterative Semantic Transformer by Greedy Distillation for Community Question Answering | 18
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks | 18
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds | 18
Bayesian Estimation of PLDA in the Presence of Noisy Training Labels, With Applications to Speaker Verification | 18
MVT: Chinese NER Using Multi-View Transformer | 18
Adaptive Pre-Training and Collaborative Fine-Tuning: A Win-Win Strategy to Improve Review Analysis Tasks | 17
Retrieve-and-Edit Domain Adaptation for End2End Aspect Based Sentiment Analysis | 17
Audio-Only Phonetic Segment Classification Using Embeddings Learned From Audio and Ultrasound Tongue Imaging Data | 17
The Harmonic Shift Algorithm for Efficient Multi-Pitch Detection | 17
Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis | 17
Measuring the Structural Complexity of Music: From Structural Segmentations to the Automatic Evaluation of Models for Music Generation | 16
General Robust Subband Adaptive Filtering: Algorithms and Applications | 16
Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification | 16
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning | 16
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio | 16
NoiER: An Approach for Training More Reliable Fine-Tuned Downstream Task Models | 16
Inference Skipping for More Efficient Real-Time Speech Enhancement With Parallel RNNs | 16
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy | 16
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation | 16
Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation | 16
Analysis of the Frequency Interference in the Narrowband Active Noise Control System | 16
Improved Transformer With Multi-Head Dense Collaboration | 16
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria | 16
Online Phase Reconstruction via DNN-Based Phase Differences Estimation | 16
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model | 16
A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition | 16
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization | 15
Exploring the Role of Language Families for Building Indic Speech Synthesisers | 15
Rethinking Textual Adversarial Defense for Pre-Trained Language Models | 15
Multilingual Customized Keyword Spotting Using Similar-Pair Contrastive Learning | 15
Low-Rank Room Impulse Response Estimation | 15
Amplitude Matching for Multizone Sound Field Control | 15
Interpretable Multimodal Capsule Fusion | 15
EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations | 15
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks | 15
Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario | 15
DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog | 14
A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem | 14
Enhanced Speaker-Aware Multi-Party Multi-Turn Dialogue Comprehension | 14
Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition | 14
Sparsity-Promoting Affine Projection Algorithm With Periodically-Updated Gain Matrix and Its Performance Analysis | 14
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors | 14
Improvement of Accent Classification Models Through Grad-Transfer From Spectrograms and Gradient-Weighted Class Activation Mapping | 14
Transferable Latent of CNN-Based Selective Fixed-Filter Active Noise Control | 14
Uncertainty-Driven Knowledge Distillation for Language Model Compression | 14
DetTrans: A Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously | 14
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech | 14
Speaker Anonymization Using Orthogonal Householder Neural Network | 14
RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition | 14
Rotor Noise-Aware Noise Covariance Matrix Estimation for Unmanned Aerial Vehicle Audition | 14
DBSA-Net: Dual Branch Self-Attention Network for Underwater Acoustic Signal Denoising | 13
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification | 13
Decorrelation in Feedback Delay Networks | 13
Verification on Head-Related Transfer Functions of a Snowman Model Simulated Using the Finite-Difference Time-Domain Method | 13
Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN | 13
A Universal Filter Approximation of Edge Diffraction for Geometrical Acoustics | 13
Triple Alliance Prototype Orthotist Network for Long-Tailed Multi-Label Text Classification | 13
Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation | 12
Empathetic Response Generation Based on Plug-and-Play Mechanism With Empathy Perturbation | 12
Abstractive Financial News Summarization via Transformer-BiLSTM Encoder and Graph Attention-Based Decoder | 12
Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech | 12
Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications | 12
Towards Comprehensive Subgroup Performance Analysis in Speech Models | 12
Disentanglement in a GAN for Unconditional Speech Synthesis | 12
CircularE: A Complex Space Circular Correlation Relational Model for Link Prediction in Knowledge Graph Embedding | 12
Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding | 12
Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction | 12
Decomposed Meta-Learning for Few-Shot Sequence Labeling | 12
Adjustable Coherent-to-Diffuse Power Estimator for Binaural Speech Enhancement in Multi-Talker Environments | 12
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition | 12
Attention-Based Speech Enhancement Using Human Quality Perception Modeling | 12
Zero-Note Samba: Self-Supervised Beat Tracking | 12
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement | 12
Acoustic Imaging With Circular Microphone Array: A New Approach for Sound Field Analysis | 12
A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces | 12
A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk | 12
Multi-Level Time-Frequency Bins Selection for Direction of Arrival Estimation Using a Single Acoustic Vector Sensor | 11
Learning Speech Emotion Representations in the Quaternion Domain | 11
FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection | 11
Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture | 11
Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement | 11
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation | 11
Exploit Feature and Relation Hierarchy for Relation Extraction | 11
Generating Rational Commonsense Knowledge-Aware Dialogue Responses With Channel-Aware Knowledge Fusing Network | 11
Prompt-Based Prototypical Framework for Continual Relation Extraction | 11
En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition | 11
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures | 11
Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck | 11
Enhancing Semantic Relation Classification With Shortest Dependency Path Reasoning | 11
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture | 11
Neural Fusion for Voice Cloning | 11
Affine-Projection-Like Maximum Correntropy Criteria Algorithm for Robust Active Noise Control | 11
Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform | 11
Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning | 11
AudioLM: A Language Modeling Approach to Audio Generation | 11
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space | 11
A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech | 10
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends | 10
End-to-End Lip-Reading Without Large-Scale Data | 10
SIFTER: A Framework for Robust Rumor Detection | 10
Drone Audition: Sound Source Localization Using On-Board Microphones | 10
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis | 10
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity | 10
Learning Discriminative Representations and Decision Boundaries for Open Intent Detection | 10
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems | 10
Document-Level Relation Extraction With Context Guided Mention Integration and Inter-Pair Reasoning | 10
IEEE Signal Processing Society Information | 10
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis | 10
Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment | 10
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition | 10
From LSAT: The Progress and Challenges of Complex Reasoning | 10
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | 10
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition | 10
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | 10
Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using 1D CNN LSTM Networks | 10
Spatio-Temporal Bayesian Regression for Room Impulse Response Reconstruction With Spherical Waves | 9
KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation | 9
Leveraging ASR Pretrained Conformers for Speaker Verification Through Transfer Learning and Knowledge Distillation | 9
SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System | 9
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition | 9
Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers | 9
Phrase-Aware Financial Sentiment Analysis Based on Constituent Syntax | 9
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems | 9
Debiasing Counterfactual Context With Causal Inference for Multi-Turn Dialogue Reasoning | 9
Binaural-Projection Multichannel Wiener Filter for Cue-Preserving Binaural Speech Enhancement | 9
Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming | 9
Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations | 9
Design of Fully Steerable Differential Beamformers With Linear Superarrays | 9
Information Dropping Data Augmentation for Machine Translation Quality Estimation | 9
IEEE/ACM Transactions on Audio, Speech and Language Processing Publication Information | 9
MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE | 9
Improving Mispronunciation Detection Using Speech Reconstruction | 9
FA-ExU-Net: The Simultaneous Training of an Embedding Extractor and Enhancement Model for a Speaker Verification System Robust to Short Noisy Utterances | 9
Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks | 9
Distributed Microphone Array Localization Problem via SDP-SOCP Method | 9
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR | 9
Multi-View Speech Emotion Recognition Via Collective Relation Construction | 9
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance | 8
Automated Data Augmentation for Audio Classification | 8
Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems | 8
NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks | 8
ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding | 8
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation | 8
VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation | 8
Exploiting Inactive Examples for Natural Language Generation With Data Rejuvenation | 8
Convergence and Performance Analysis of Classical, Hybrid, and Deep Acoustic Echo Control | 8
Graph-Based Cross-Granularity Message Passing on Knowledge-Intensive Text | 8
Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation | 8
Selective-Memory Meta-Learning With Environment Representations for Sound Event Localization and Detection | 8
Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network | 8
Interference-Controlled Maximum Noise Reduction Beamformer Based on Deep-Learned Interference Manifold | 8
Data-Centric Methods for Environmental Sound Classification With Limited Labels | 8
Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry | 8
On the Generalization Ability of Complex-Valued Variational U-Networks for Single-Channel Speech Enhancement | 8
DS-TDNN: Dual-Stream Time-Delay Neural Network With Global-Aware Filter for Speaker Verification | 8
Automatic Detection of Speech Sound Disorder in Cantonese-Speaking Pre-School Children | 8
Artist Similarity Based on Heterogeneous Graph Neural Networks | 8
Spherically Steerable Vector Differential Microphone Arrays | 8
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | 8
DUMA: Reading Comprehension With Transposition Thinking | 8
Learning With an Open Horizon in Ever-Changing Dialogue Circumstances | 8
Fundamental Approaches to Robust Differential Beamforming With High Directivity Factors | 7
Multi-Task Attentive Residual Networks for Argument Mining | 7
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization | 7
StarSum: A Star Architecture Based Model for Extractive Summarization | 7
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition | 7
Towards Robust Waveform-Based Acoustic Models | 7
Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction | 7
Design of 2D and 3D Differential Microphone Arrays With a Multistage Framework | 7
Blind Identification of Binaural Room Impulse Responses From Smart Glasses | 7
A Class of Pareto Optimal Binaural Beamformers | 7
Weighted Loudspeaker Placement Method for Sound Field Reproduction | 7
Modularized Pre-Training for End-to-End Task-Oriented Dialogue | 7
Constant-Beamwidth Beamforming With Nonuniform Concentric Ring Arrays | 7
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing | 7
ISNet: Individual Standardization Network for Speech Emotion Recognition | 7
Automatic Math Word Problem Generation With Topic-Expression Co-Attention Mechanism and Reinforcement Learning | 7