IEEE Transactions on Multimedia

Papers
(The H4-Index of IEEE Transactions on Multimedia is 68. The table below lists the papers at or above that threshold based on CrossRef citation counts [max. 250 papers]. Only papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01, are covered.)
Article | Citations
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 639
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 361
Optimal Transport-Based Patch Matching for Image Style Transfer | 310
Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt | 234
Semi-Supervised Domain Adaptation via Joint Transductive and Inductive Subspace Learning | 229
Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention | 223
Disaggregation Distillation for Person Search | 204
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 201
Semantic-Aware Triplet Loss for Image Classification | 193
Robust Multi-stage Tracking via Multi-scale and Multi-level Representation Learning | 177
Improving Vision Anomaly Detection With the Guidance of Language Modality | 155
Towards Fast and Robust Real Image Denoising With Attentive Neural Network and PID Controller | 150
Self-Mining the Confident Prototypes for Source-Free Unsupervised Domain Adaptation in Image Segmentation | 147
Improving Pre-Trained Model-Based Speech Emotion Recognition From a Low-Level Speech Feature Perspective | 140
One-Shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 135
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 131
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 129
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 124
Pixel Bleach Network for Detecting Face Forgery Under Compression | 123
Rethinking Affine Transform for Efficient Image Enhancement: A Color Space Perspective | 122
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 118
Asymptotics-Aware Multi-View Subspace Clustering | 117
Ensemble Prototype Networks for Unsupervised Cross-Modal Hashing With Cross-Task Consistency | 115
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 114
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 111
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 111
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments | 110
Annealing Genetic GAN for Imbalanced Web Data Learning | 110
Adaptive HEVC Video Steganography With High Performance Based on Attention-Net and PU Partition Modes | 110
Feature First: Advancing Image-Text Retrieval Through Improved Visual Features | 108
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | 107
Deep Semantic-Consistent Penalizing Hashing for Cross-Modal Retrieval | 107
Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding | 103
Exploring Kernel Transformations for Implicit Neural Representations | 100
SkyML: A MLaaS Federation Design for Multicloud-Based Multimedia Analytics | 98
Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes | 96
ICE: Interactive 3D Game Character Facial Editing via Dialogue | 96
Online Low-Light Sand-Dust Video Enhancement Using Adaptive Dynamic Brightness Correction and a Rolling Guidance Filter | 94
Unsupervised Learning-Based Framework for Deepfake Video Detection | 92
Semi-Supervised Contrastive Learning With Similarity Co-Calibration | 91
Scale Up Composed Image Retrieval Learning via Modification Text Generation | 89
Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment | 88
Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames | 87
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | 87
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization | 85
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 84
Siamese Alignment Network for Weakly Supervised Video Moment Retrieval | 84
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 84
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability | 83
Interpretable Graph Convolutional Network for Multi-View Semi-Supervised Learning | 83
AMS-Net: Adaptive Multi-Scale Network for Image Compressive Sensing | 83
Bidirectional Translation Between UHD-HDR and HD-SDR Videos | 82
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 81
Neighborhood Contrastive Transformer for Change Captioning | 81
Structured Attention Network for Referring Image Segmentation | 80
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 80
Progressive Local Filter Pruning for Image Retrieval Acceleration | 78
FoodSAM: Any Food Segmentation | 76
Skeleton-Based Action Recognition With Select-Assemble-Normalize Graph Convolutional Networks | 75
Hierarchical Equalization Loss for Long-Tailed Instance Segmentation | 75
Guided Image-to-Image Translation by Discriminator-Generator Communication | 75
A Total Variation With Joint Norms For Infrared and Visible Image Fusion | 75
Dynamic Contrastive Distillation for Image-Text Retrieval | 74
PhotoHelper: Portrait Photographing Guidance Via Deep Feature Retrieval and Fusion | 74
DREAMT: Diversity Enlarged Mutual Teaching for Unsupervised Domain Adaptive Person Re-Identification | 73
Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene Text Spotter | 73
SLCGC: A lightweight Self-supervised Low-pass Contrastive Graph Clustering Network for Hyperspectral Images | 73
Unsupervised Image and Text Fusion for Travel Information Enhancement | 70
Benchmark Dataset and Pair-Wise Ranking Method for Quality Evaluation of Night-Time Image Enhancement | 68
Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training | 68
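
As a rough illustration of how the H4-Index quoted above relates to the listed citation counts, the sketch below computes a standard h-index. It assumes that the H4-Index is simply the h-index restricted to papers from the four-year window, which this page does not spell out, and that no paper missing from the table has more than 68 citations. The names `h_index` and `table_counts` are hypothetical and chosen only for this example.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break         # counts only decrease from here, so h cannot grow further
    return h


# Citation counts copied from the table above, in table order.
table_counts = [
    639, 361, 310, 234, 229, 223, 204, 201, 193, 177,
    155, 150, 147, 140, 135, 131, 129, 124, 123, 122,
    118, 117, 115, 114, 111, 111, 110, 110, 110, 108,
    107, 107, 103, 100, 98, 96, 96, 94, 92, 91,
    89, 88, 87, 87, 85, 84, 84, 84, 83, 83,
    83, 82, 81, 81, 80, 80, 78, 76, 75, 75,
    75, 75, 74, 74, 73, 73, 73, 70, 68, 68,
]

print(h_index(table_counts))  # 68 for the counts listed above
```

The table has 70 rows because two papers tie at exactly 68 citations: 68 papers have at least 69 citations, but 69 would be needed to raise the index, so it stays at 68.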