IEEE Transactions on Multimedia

Papers
(The TQCC of IEEE Transactions on Multimedia is 12. The table below lists the papers above that threshold, based on CrossRef citation counts (at most 250 papers). It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Compact Latent Primitive Space Learning for Compositional Zero-Shot Learning | 483
Cluster Assumption-Guided Timestamp-Supervised Temporal Action Segmentation | 270
Uncertainty-Guided Diffusion Model for Camouflaged Object Detection | 261
Ensemble Prototype Networks for Unsupervised Cross-modal Hashing with Cross-Task Consistency | 198
CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable Text-to-Image Synthesis | 190
Identity and Modality Attributes Driven Multimodal Fusion Networks for Emotion Recognition in Conversations | 188
WV-LUT: Wide Vision Lookup Tables for Real-Time Low-Light Image Enhancement | 162
Audio-Visual Collaborative Learning for Weakly Supervised Video Anomaly Detection | 157
Text2Avatar: Articulated 3D Avatar Creation with Text Instructions | 148
PRA-Det: Anchor-Free Oriented Object Detection With Polar Radius Representation | 143
Video Instance Segmentation Without Using Mask and Identity Supervision | 128
Learning From Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation | 127
High-Quality Reconstruction of Depth Maps From Graph-Based Non-Uniform Sampling | 124
PointMCD: Boosting Deep Point Cloud Encoders via Multi-View Cross-Modal Distillation for 3D Shape Recognition | 113
CrossNet: Cross-Scene Background Subtraction Network via 3D Optical Flow | 108
Multiple Adaptation Network for Multi-source and Multi-target Domain Adaptation | 106
An Efficient Ungrouped Mask Method With two Learnable Parameters for 3D Object Detection | 104
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | 99
Adaptive Multi-Scale Language Reinforcement for Multimodal Named Entity Recognition | 98
DSAF: Dual Space Alignment Framework for Visible-Infrared Person Re-Identification | 96
Deep Mutual Distillation for Unsupervised Domain Adaptation Person Re-Identification | 96
Progressive Knowledge Distillation From Different Levels of Teachers for Online Action Detection | 93
Rethinking Depth Guided Reflection Removal | 93
Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-Resolution | 93
A Multi-Granularity Relation Graph Aggregation Framework with Multimodal Clues for Social Relation Reasoning | 93
Masked Attribute Description Embedding for Cloth-Changing Person Re-Identification | 92
Discriminative Anchor Learning for Efficient Multi-View Clustering | 89
Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization | 88
Leveraging Enriched Skeleton Representation With Multi-Relational Metrics for Few-Shot Action Recognition | 87
Improving Vision Anomaly Detection With the Guidance of Language Modality | 86
GFTLS-SLT: Gloss-Free Transformer Based Lexical and Semantic Awareness Framework for Multimodal Sign Language Translation | 85
MIGN: Multiscale Image Generation Network for Remote Sensing Image Semantic Segmentation | 82
Robust Multimodal Sentiment Analysis via Tag Encoding of Uncertain Missing Modalities | 81
Lightweight Video-Based Respiration Rate Detection Algorithm: An Application Case on Intensive Care | 77
UCM-Net: A U-Net-Like Tampered-Region-Related Framework for Copy-Move Forgery Detection | 77
Combining Retargeting Quality and Depth Perception Measures for Quality Evaluation of Retargeted Stereopairs | 76
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 75
RSUIA: Dynamic No-Reference Underwater Image Assessment via Reinforcement Sequences | 74
ETC: Temporal Boundary Expand Then Clarify for Weakly Supervised Video Grounding With Multimodal Large Language Model | 73
Heterogeneous Domain Adaptation via Correlative and Discriminative Feature Learning | 72
One-shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 72
Deep Semantic-consistent Penalizing Hashing for Cross-modal Retrieval | 71
Snippet-inter Difference Attention Network for Weakly-supervised Temporal Action Localization | 70
Asymptotics-Aware Multi-View Subspace Clustering | 70
Hand Gesture Recognition from an Open-Set Perspective | 65
Learning Shape-Color Diffusion Priors for Text-Guided 3D Object Generation | 65
Multi-view Clustering via Multi-stage Fusion | 65
Rectangling for Stitched Image via Pixel-wise Deformation Learning | 65
DCPTalk: Speech-Driven 3D Face Animation With Personalized Facial Dynamic Coupling Properties | 64
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 64
Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding | 62
List of Reviewers | 62
From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios | 62
IEEE Transactions on Multimedia Publication Information | 61
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge | 61
Dual-Domain Aligned Deep Hierarchical Matrix Factorization Method for Micro-Video Multi-Label Classification | 60
Provably Secure Robust Image Steganography | 60
Progressive Local Filter Pruning for Image Retrieval Acceleration | 59
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 59
Learning the Global Descriptor for 3-D Object Recognition Based on Multiple Views Decomposition | 59
Viscoelastic Cluster-constrained PBD-based Soft Tissue Behavior and Interactive Media Applications for Surgical Simulation | 57
Self-supervised Semantic Soft Label Learning Network for Deep Multi-view Clustering | 57
Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation | 56
Integration of Global and Local Knowledge for Foreground Enhancing in Weakly Supervised Temporal Action Localization | 56
Spatial-Temporal Action Localization With Hierarchical Self-Attention | 55
Visible-Infrared Person Re-Identification via Cross-Modality Interaction Transformer | 55
Adversarial Learning Guided Task Relatedness Refinement for Multi-Task Deep Learning | 55
Does Thermal Really Always Matter for RGB-T Salient Object Detection? | 54
Disentangled Multimodal Representation Learning for Recommendation | 54
Frame-Wise Cross-Modal Matching for Video Moment Retrieval | 52
A Boundary-Aware Network for Shadow Removal | 52
DDistill-SR: Reparameterized Dynamic Distillation Network for Lightweight Image Super-Resolution | 52
Context-Aware 3D Point Cloud Semantic Segmentation With Plane Guidance | 52
Reflection Removal With NIR and RGB Image Feature Fusion | 52
Dynamic Residual Filtering With Laplacian Pyramid for Instance Segmentation | 51
GPS2Vec: Pre-Trained Semantic Embeddings for Worldwide GPS Coordinates | 51
Scene-Text Oriented Referring Expression Comprehension | 51
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | 50
Intra-Class Adaptive Augmentation With Neighbor Correction for Deep Metric Learning | 50
Towards Comprehensive Monocular Depth Estimation: Multiple Heads are Better Than One | 50
TSFNet: Triple-Steam Image Captioning | 50
Blind Image Quality Assessment Based on Perceptual Comparison | 49
Bal-R$^2$CNN: High Quality Recurrent Object Detection With Balance Optimization | 49
Category-Adaptive Label Discovery and Noise Rejection for Multi-Label Recognition With Partial Positive Labels | 49
List-Wise Rank Learning for Stereoscopic Image Retargeting Quality Assessment | 49
A Social Condition-Enhanced Network for Recognizing Power Distance Using Expressive Prosody and Intrinsic Brain Connectivity | 49
Learning Generalized Knowledge From a Single Domain on Urban-Scene Segmentation | 49
Gated SwitchGAN for Multi-Domain Facial Image Translation | 48
Live 360° Video Streaming to Heterogeneous Clients in 5G Networks | 48
Action Coherence Network for Weakly-Supervised Temporal Action Localization | 48
Dynamic View Aggregation for Multi-View 3D Shape Recognition | 48
Achieving the Optimum Rate for Cross-Modal Source Coding | 47
Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection | 46
Delving Into Important Samples of Semi-Supervised Old Photo Restoration: A New Dataset and Method | 46
Semantic-Enhanced Proxy-Guided Hashing for Long-Tailed Image Retrieval | 46
Contrastive JS: A Novel Scheme for Enhancing the Accuracy and Robustness of Deep Models | 46
From Front to Rear: 3D Semantic Scene Completion Through Planar Convolution and Attention-Based Network | 45
Iterative Network for Image Super-Resolution | 45
Graph-Based Visual-Semantic Entanglement Network for Zero-Shot Image Recognition | 45
Progressive Motion Boosting for Video Frame Interpolation | 45
Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt | 45
Debiased Mapping for Full-Reference Image Quality Assessment | 44
Robust Coverless Image Steganography Based on Neglected Coverless Image Dataset Construction | 44
Global Representation Guided Adaptive Fusion Network for Stable Video Crowd Counting | 44
MLNet: A Multi-Domain Lightweight Network for Multi-Focus Image Fusion | 44
Reinforcement Learning for Logic Recipe Generation: Bridging Gaps From Images to Plans | 43
A Twist Representation and Shape Refinement Method for Human Mesh Recovery | 43
SkyML: A MLaaS Federation Design for Multicloud-based Multimedia Analytics | 43
HNR-ISC: Hybrid Neural Representation for Image Set Compression | 43
SVSRD: Spatial Visual and Statistical Relation Distillation for Class-Incremental Semantic Segmentation | 43
Semantic Image Segmentation by Dynamic Discriminative Prototypes | 43
StyleAM: Perception-Oriented Unsupervised Domain Adaption for No-reference Image Quality Assessment | 43
Multi-granularity Context Perception Network for Open Set Recognition of Camouflaged Objects | 42
Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network | 42
Editorial | 42
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser | 41
PointAttention: Rethinking Feature Representation and Propagation in Point Cloud | 41
Dynamic Contrastive Distillation for Image-Text Retrieval | 41
Content-Aware Tunable Selective Encryption for HEVC Using Sine-Modular Chaotification Model | 41
Transformer-Based High-Fidelity Facial Displacement Completion for Detailed 3D Face Reconstruction | 40
Weakly Supervised Instance Segmentation by Exploring Entire Object Regions | 40
DuPMAM: An Efficient Dual Perception Framework Equipped with a Sharp Testing Strategy for Point Cloud Analysis | 40
A Novel Video Stabilization Model With Motion Morphological Component Priors | 40
No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform | 40
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 40
Multi-Perspective Pseudo-Label Generation and Confidence-Weighted Training for Semi-Supervised Semantic Segmentation | 40
A Semi-Fragile Reversible Watermarking for Authenticating 3D Models Based on Virtual Polygon Projection and Double Modulation Strategy | 39
Multi-Source Style Transfer via Style Disentanglement Network | 39
Causal Interventional Training for Image Recognition | 39
3D Holoscopic Image Compression Based on Gaussian Mixture Model | 39
Exploiting Low-Rank Latent Gaussian Graphical Model Estimation for Visual Sentiment Distributions | 39
Disaggregation Distillation for Person Search | 38
Decoder-Side Cross Resolution Synthesis for Video Compression Enhancement | 38
Cycle-Free Weakly Referring Expression Grounding With Self-Paced Learning | 38
Active Gradual Domain Adaptation: Dataset and Approach | 38
An Apprenticeship Learning Approach for Adaptive Video Streaming Based on Chunk Quality and User Preference | 37
Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks | 37
Event-Based Low-Illumination Image Enhancement | 37
A Benchmark for Controllable Text-Image-to-Video Generation | 37
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 37
RFGAN: RF-Based Human Synthesis | 37
Guided Image-to-Image Translation by Discriminator-Generator Communication | 37
Background Scene Recovery From an Image Looking Through Colored Glass | 36
Image Stitching With Manifold Optimization | 35
ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation | 35
Progressive Bidirectional Feature Extraction and Enhancement Network for Quality Evaluation of Night-Time Images | 35
Personalized Representation With Contrastive Loss for Recommendation Systems | 35
Learning Monocular Regression of 3D People in Crowds via Scene-Aware Blending and De-Occlusion | 35
Zero-Shot Predicate Prediction for Scene Graph Parsing | 35
Boosting Robust Learning Via Leveraging Reusable Samples in Noisy Web Data | 35
Online Low-Light Sand-Dust Video Enhancement Using Adaptive Dynamic Brightness Correction and a Rolling Guidance Filter | 35
Coherent Image Animation Using Spatial-Temporal Correspondence | 34
Clicking Matters: Towards Interactive Human Parsing | 34
Pixel Bleach Network for Detecting Face Forgery Under Compression | 34
Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis | 34
Multi-Sentence Complementarily Generation for Text-to-Image Synthesis | 34
Cross-View Panorama Image Synthesis | 34
Timely and Accurate Bitrate Switching in HTTP Adaptive Streaming With Date-Driven I-Frame Prediction | 34
CLIPREC: Graph-Based Domain Adaptive Network for Zero-Shot Referring Expression Comprehension | 33
Multi-Source Multi-Label Learning for User Profiling in Online Games | 33
Beyond Word Embeddings: Heterogeneous Prior Knowledge Driven Multi-Label Image Classification | 33
Bio-Inspired Multi-Scale Contourlet Attention Networks | 33
Focal Stack Image Compression Based on Basis-Quadtree Representation | 33
$L_{1}$-Regularized Reconstruction Model for Edge-Preserving Filtering | 33
DMH-CL: Dynamic Model Hardness Based Curriculum Learning for Complex Pose Estimation | 33
Multi-Label Continual Learning Using Augmented Graph Convolutional Network | 33
Robust Geometry-Dependent Attack for 3D Point Clouds | 33
Temporal Attention-Pyramid Pooling for Temporal Action Detection | 33
Learning Representations by Contrastive Spatio-Temporal Clustering for Skeleton-Based Action Recognition | 33
Semantic-Aware Triplet Loss for Image Classification | 32
Post-Distillation via Neural Resuscitation | 32
Bi-RSTU: Bidirectional Recurrent Upsampling Network for Space-Time Video Super-Resolution | 32
SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention | 32
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures | 32
LA-HDR: Light Adaptive HDR Reconstruction Framework for Single LDR Image Considering Varied Light Conditions | 32
Neural Logic Vision Language Explainer | 32
Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution | 32
Image Aesthetics Assessment Based on Hypernetwork of Emotion Fusion | 32
RFMask: A Simple Baseline for Human Silhouette Segmentation With Radio Signals | 32
DASI: Learning Domain Adaptive Shape Impression for 3D Object Reconstruction | 31
Disguised Heterogeneous Face Generation With Iterative-Adversarial Style Unification | 31
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 31
Location-Free Camouflage Generation Network | 31
Region Separable Stereo Matching | 31
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | 31
Deconfounding Causal Inference for Zero-Shot Action Recognition | 31
Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis | 31
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning | 31
Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion | 30
Conditional Consistency Regularization for Semi-Supervised Multi-Label Image Classification | 30
Self-Supervised Monocular Depth Estimation With Frequency-Based Recurrent Refinement | 30
Inter-Modal Masked Autoencoder for Self-Supervised Learning on Point Clouds | 30
Leveraging the Video-Level Semantic Consistency of Event for Audio-Visual Event Localization | 30
TIF: Threshold Interception and Fusion for Compact and Fine-Grained Visual Attribution | 30
Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality Token | 30
Exploiting Temporal Correlations for 3D Human Pose Estimation | 30
Learning Fashion Compatibility With Context Conditioning Embedding | 30
DBiased-P: Dual-Biased Predicate Predictor for Unbiased Scene Graph Generation | 30
Context-Patch Representation Learning With Adaptive Neighbor Embedding for Robust Face Image Super-Resolution | 30
A New Data Augmentation Method Based on Mixup and Dempster-Shafer Theory | 29
Flexible Alignment Super-Resolution Network for Multi-Contrast Magnetic Resonance Imaging | 29
Bilateral Fast Low-Rank Representation With Equivalent Transformation for Subspace Clustering | 29
TANet: Target Attention Network for Video Bit-Depth Enhancement | 29
Gait Recognition With Multi-Level Skeleton-Guided Refinement | 29
Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization | 29
Graph Contrastive Partial Multi-View Clustering | 29
Self-Supervised Fine-Grained Cycle-Separation Network (FSCN) for Visual-Audio Separation | 29
FoodSAM: Any Food Segmentation | 29
RZSR: Reference-Based Zero-Shot Super-Resolution With Depth Guided Self-Exemplars | 29
Towards Adaptive Multi-Scale Intermediate Domain via Progressive Training for Unsupervised Domain Adaptation | 28
CrowdCaption++: Collective-Guided Crowd Scenes Captioning | 28
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 28
JPEG Image Encryption With DC Rotation and Undivided RSV-Based AC Group Permutation | 28
Double-Domain Adaptation Semantics for Retrieval-Based Long-Term Visual Localization | 28
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 28
DDAug: Differentiable Data Augmentation for Weakly Supervised Semantic Segmentation | 28
Deep Hashing Network With Hybrid Attention and Adaptive Weighting for Image Retrieval | 28
PPM-SEM: A Privacy-Preserving Mechanism for Sharing Electronic Patient Records and Medical Images in Telemedicine | 28
M$^{3}$ANet: Multi-Modal and Multi-Attention Fusion Network for Ship License Plate Recognition | 28
Domain Adaptive LiDAR Point Cloud Segmentation With 3D Spatial Consistency | 28
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 28
RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images | 28
Optimal Transport-Based Patch Matching for Image Style Transfer | 28
UniMF: A Unified Multimodal Framework for Multimodal Sentiment Analysis in Missing Modalities and Unaligned Multimodal Sequences | 28
CMCF-Net: An End-to-End Context Multiscale Cross-Fusion Network for Robust Copy-Move Forgery Detection | 28
CRADA: Cross Domain Object Detection With Cyclic Reconstruction and Decoupling Adaptation | 27
Cross-Modality Spatial-Temporal Transformer for Video-Based Visible-Infrared Person Re-Identification | 27
IRVR: A General Image Restoration Framework for Visual Recognition | 27
Meta Noise Adaption Framework for Multimodal Sentiment Analysis With Feature Noise | 27
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability | 27
Refining Uncertain Features With Self-Distillation for Face Recognition and Person Re-Identification | 27
Semi-Supervised Learning of Perceptual Video Quality by Generating Consistent Pairwise Pseudo-Ranks | 27
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 27
BASICS: Broad Quality Assessment of Static Point Clouds in a Compression Scenario | 27
Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic Distribution | 27
DFR-Net: Density Feature Refinement Network for Image Dehazing Utilizing Haze Density Difference | 27
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 27
Learning Label Semantics for Weakly Supervised Group Activity Recognition | 27
VideoXum: Cross-Modal Visual and Textural Summarization of Videos | 27
DCRP: Class-Aware Feature Diffusion Constraint and Reliable Pseudo-Labeling for Imbalanced Semi-Supervised Learning | 27
Learning With Imbalanced Noisy Data by Preventing Bias in Sample Selection | 27
Robust Saliency-Aware Distillation for Few-Shot Fine-Grained Visual Recognition | 27
A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning | 26
Incomplete Multi-View Clustering via Correntropy and Complement Consensus Learning | 26
TFRNet: Semantic Segmentation Network with Token Filtration and Refinement Method | 26
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 26
SmartSit: Sitting Posture Recognition Through Acoustic Sensing on Smartphones | 26
Polarimetric Inverse Rendering for Transparent Shapes Reconstruction | 26
Non-Orthogonal Multiple Access Enhanced Scalable 360-Degree Video Multicast | 26
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 26