IEEE Transactions on Multimedia

Papers
(The H4-Index of IEEE Transactions on Multimedia is 59. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Compact Latent Primitive Space Learning for Compositional Zero-Shot Learning | 483
Cluster Assumption-Guided Timestamp-Supervised Temporal Action Segmentation | 270
Uncertainty-Guided Diffusion Model for Camouflaged Object Detection | 261
Ensemble Prototype Networks for Unsupervised Cross-modal Hashing with Cross-Task Consistency | 198
CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable Text-to-Image Synthesis | 190
Identity and Modality Attributes Driven Multimodal Fusion Networks for Emotion Recognition in Conversations | 188
WV-LUT: Wide Vision Lookup Tables for Real-Time Low-Light Image Enhancement | 162
Audio-Visual Collaborative Learning for Weakly Supervised Video Anomaly Detection | 157
Text2Avatar: Articulated 3D Avatar Creation with Text Instructions | 148
PRA-Det: Anchor-Free Oriented Object Detection With Polar Radius Representation | 143
Video Instance Segmentation Without Using Mask and Identity Supervision | 128
Learning From Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation | 127
High-Quality Reconstruction of Depth Maps From Graph-Based Non-Uniform Sampling | 124
PointMCD: Boosting Deep Point Cloud Encoders via Multi-View Cross-Modal Distillation for 3D Shape Recognition | 113
CrossNet: Cross-Scene Background Subtraction Network via 3D Optical Flow | 108
Multiple Adaptation Network for Multi-source and Multi-target Domain Adaptation | 106
An Efficient Ungrouped Mask Method With two Learnable Parameters for 3D Object Detection | 104
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | 99
Adaptive Multi-Scale Language Reinforcement for Multimodal Named Entity Recognition | 98
DSAF: Dual Space Alignment Framework for Visible-Infrared Person Re-Identification | 96
Deep Mutual Distillation for Unsupervised Domain Adaptation Person Re-Identification | 96
Progressive Knowledge Distillation From Different Levels of Teachers for Online Action Detection | 93
Rethinking Depth Guided Reflection Removal | 93
Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-Resolution | 93
A Multi-Granularity Relation Graph Aggregation Framework with Multimodal Clues for Social Relation Reasoning | 93
Masked Attribute Description Embedding for Cloth-Changing Person Re-Identification | 92
Discriminative Anchor Learning for Efficient Multi-View Clustering | 89
Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization | 88
Leveraging Enriched Skeleton Representation With Multi-Relational Metrics for Few-Shot Action Recognition | 87
Improving Vision Anomaly Detection With the Guidance of Language Modality | 86
GFTLS-SLT: Gloss-Free Transformer Based Lexical and Semantic Awareness Framework for Multimodal Sign Language Translation | 85
MIGN: Multiscale Image Generation Network for Remote Sensing Image Semantic Segmentation | 82
Robust Multimodal Sentiment Analysis via Tag Encoding of Uncertain Missing Modalities | 81
UCM-Net: A U-Net-Like Tampered-Region-Related Framework for Copy-Move Forgery Detection | 77
Lightweight Video-Based Respiration Rate Detection Algorithm: An Application Case on Intensive Care | 77
Combining Retargeting Quality and Depth Perception Measures for Quality Evaluation of Retargeted Stereopairs | 76
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 75
RSUIA: Dynamic No-Reference Underwater Image Assessment via Reinforcement Sequences | 74
ETC: Temporal Boundary Expand Then Clarify for Weakly Supervised Video Grounding With Multimodal Large Language Model | 73
One-shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 72
Heterogeneous Domain Adaptation via Correlative and Discriminative Feature Learning | 72
Deep Semantic-consistent Penalizing Hashing for Cross-modal Retrieval | 71
Asymptotics-Aware Multi-View Subspace Clustering | 70
Snippet-inter Difference Attention Network for Weakly-supervised Temporal Action Localization | 70
Learning Shape-Color Diffusion Priors for Text-Guided 3D Object Generation | 65
Multi-view Clustering via Multi-stage Fusion | 65
Rectangling for Stitched Image via Pixel-wise Deformation Learning | 65
Hand Gesture Recognition from an Open-Set Perspective | 65
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 64
DCPTalk: Speech-Driven 3D Face Animation With Personalized Facial Dynamic Coupling Properties | 64
Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding | 62
List of Reviewers | 62
From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios | 62
IEEE Transactions on Multimedia Publication Information | 61
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge | 61
Dual-Domain Aligned Deep Hierarchical Matrix Factorization Method for Micro-Video Multi-Label Classification | 60
Provably Secure Robust Image Steganography | 60
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 59
Learning the Global Descriptor for 3-D Object Recognition Based on Multiple Views Decomposition | 59
Progressive Local Filter Pruning for Image Retrieval Acceleration | 59
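
The H4-Index quoted at the top of this page can be checked against the table with a short script. The sketch below is illustrative only: it assumes the standard h-index definition (the largest h such that h papers each have at least h citations) applied to the four-year window, and it assumes the table contains every paper at or above the threshold. The function name `h_index` and the list `counts` are made up for this example; the numbers are the citation counts copied from the table.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break     # counts only decrease from here, so stop
    return h


# Citation counts copied from the table above, in table order.
counts = [
    483, 270, 261, 198, 190, 188, 162, 157, 148, 143,
    128, 127, 124, 113, 108, 106, 104, 99, 98, 96,
    96, 93, 93, 93, 93, 92, 89, 88, 87, 86,
    85, 82, 81, 77, 77, 76, 75, 74, 73, 72,
    72, 71, 70, 70, 65, 65, 65, 65, 64, 64,
    62, 62, 62, 61, 61, 60, 60, 59, 59, 59,
]

print(h_index(counts))  # prints 59
```

The table has 60 rows because three entries are tied at exactly 59 citations. Only 57 entries have 60 or more citations, so the index cannot reach 60.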