IEEE Transactions on Multimedia

Papers
(The H4-Index of IEEE Transactions on Multimedia is 66. The table below lists the papers that meet that threshold, i.e., those with at least 66 CrossRef citations (max. 250 papers). It covers publications from the past four years, i.e., from 2021-07-01 to 2025-07-01.)
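To show how the threshold relates to the citation counts, here is a minimal sketch. It assumes the standard h-index definition (the largest h such that at least h papers have at least h citations each), applied only to papers published inside the four-year window. The window is assumed to be inclusive at both ends. The function names `h_index` and `h4_index` and the `(published_date, citations)` input shape are hypothetical illustrations, not the site's actual implementation.

```python
from datetime import date


def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so no larger h exists
    return h


def h4_index(papers, start=date(2021, 7, 1), end=date(2025, 7, 1)):
    """Hypothetical H4-index: h-index over papers published in [start, end].

    `papers` is an iterable of (published_date, citation_count) pairs.
    The inclusive window boundaries are an assumption.
    """
    counts = [cites for published, cites in papers if start <= published <= end]
    return h_index(counts)
```

Under this definition, an H4-Index of 66 means that at least 66 papers in the window have 66 or more citations, but fewer than 67 papers have 67 or more. The 66 rows below fit that description: three of them sit exactly at 66, so only 63 listed papers reach 67.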
Article | Citations
Self-Mining the Confident Prototypes for Source-Free Unsupervised Domain Adaptation in Image Segmentation | 577
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 330
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 292
Optimal Transport-Based Patch Matching for Image Style Transfer | 216
Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt | 214
Semi-Supervised Domain Adaptation via Joint Transductive and Inductive Subspace Learning | 198
Improving Pre-Trained Model-Based Speech Emotion Recognition From a Low-Level Speech Feature Perspective | 189
Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention | 182
Disaggregation Distillation for Person Search | 165
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 164
Guided Image-to-Image Translation by Discriminator-Generator Communication | 151
One-shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 140
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 136
Siamese Alignment Network for Weakly Supervised Video Moment Retrieval | 126
Towards Fast and Robust Real Image Denoising With Attentive Neural Network and PID Controller | 126
PhotoHelper: Portrait Photographing Guidance Via Deep Feature Retrieval and Fusion | 125
Robust Multi-stage Tracking via Multi-scale and Multi-level Representation Learning | 112
Semantic-Aware Triplet Loss for Image Classification | 112
Improving Vision Anomaly Detection With the Guidance of Language Modality | 110
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 109
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 107
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 105
Rethinking Affine Transform for Efficient Image Enhancement: A Color Space Perspective | 104
Pixel Bleach Network for Detecting Face Forgery Under Compression | 104
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 104
Dynamic Contrastive Distillation for Image-Text Retrieval | 103
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 103
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 102
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | 101
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability | 100
Asymptotics-Aware Multi-View Subspace Clustering | 95
Unsupervised Learning-Based Framework for Deepfake Video Detection | 95
Ensemble Prototype Networks for Unsupervised Cross-Modal Hashing With Cross-Task Consistency | 94
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 90
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 90
Feature First: Advancing Image-Text Retrieval Through Improved Visual Features | 89
Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames | 85
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 85
Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding | 84
Hierarchical Equalization Loss for Long-Tailed Instance Segmentation | 82
SkyML: A MLaaS Federation Design for Multicloud-Based Multimedia Analytics | 82
Exploring Kernel Transformations for Implicit Neural Representations | 82
Adaptive HEVC Video Steganography With High Performance Based on Attention-Net and PU Partition Modes | 78
Bidirectional Translation Between UHD-HDR and HD-SDR Videos | 77
ICE: Interactive 3D Game Character Facial Editing via Dialogue | 76
Structured Attention Network for Referring Image Segmentation | 75
Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes | 75
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments | 74
Semi-Supervised Contrastive Learning With Similarity Co-Calibration | 74
Online Low-Light Sand-Dust Video Enhancement Using Adaptive Dynamic Brightness Correction and a Rolling Guidance Filter | 74
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 74
Deep Semantic-consistent Penalizing Hashing for Cross-modal Retrieval | 74
Progressive Local Filter Pruning for Image Retrieval Acceleration | 74
A Total Variation With Joint Norms For Infrared and Visible Image Fusion | 73
AMS-Net: Adaptive Multi-Scale Network for Image Compressive Sensing | 72
FoodSAM: Any Food Segmentation | 72
Annealing Genetic GAN for Imbalanced Web Data Learning | 71
Skeleton-Based Action Recognition With Select-Assemble-Normalize Graph Convolutional Networks | 70
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization | 70
Neighborhood Contrastive Transformer for Change Captioning | 69
Interpretable Graph Convolutional Network for Multi-View Semi-Supervised Learning | 69
Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene Text Spotter | 67
DREAMT: Diversity Enlarged Mutual Teaching for Unsupervised Domain Adaptive Person Re-Identification | 67
Benchmark Dataset and Pair-Wise Ranking Method for Quality Evaluation of Night-Time Image Enhancement | 66
Unsupervised Image and Text Fusion for Travel Information Enhancement | 66
Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training | 66