ACM Transactions on Multimedia Computing Communications and Applicatio

Papers
(The TQCC of ACM Transactions on Multimedia Computing Communications and Applicatio is 6. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The publications cover those that have been published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Unleashing Creativity in the Metaverse: Generative AI and Multimodal Content | 231
Spatio-Temporal Attention for Text-Video Retrieval | 144
A 360-degree Video Player for Dynamic Video Editing Applications | 120
Counterfeiting Attacks on a RDH-EI scheme based on block-permutation and Co-XOR | 106
FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification | 98
Meta-Review on Brain-Computer Interface (BCI) in the Metaverse | 97
Meta-Review of Wearable Devices for Healthcare in the Metaverse | 84
Deep Differential Lifelong Cross-modal Hashing for Stream Medical Data Retrieval | 79
MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression | 76
Mathematics-Inspired Models: A Green and Interpretable Learning Paradigm for Multimedia Computing | 71
Fine-Grained Alignment Network for Zero-Shot Cross-Modal Retrieval | 70
Domain-Separated Bottleneck Attention Fusion Framework for Multimodal Emotion Recognition | 61
Joint Mixing Data Augmentation for Skeleton-Based Action Recognition | 61
Generative Adversarial Networks with Learnable Auxiliary Module for Image Synthesis | 60
Leveraging Frame- and Feature-level Progressive Augmentation for Semi-supervised Action Recognition | 58
High Fidelity Makeup via 2D and 3D Identity Preservation Net | 57
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric | 56
Optimized Deep-Neural Network for Content-based Medical Image Retrieval in a Brownfield IoMT Network | 55
Multi-Grained Alignment with Knowledge Distillation for Partially Relevant Video Retrieval | 50
Residual-guided In-loop Filter Using Convolution Neural Network | 50
Perceptual Hashing of Deep Convolutional Neural Networks for Model Copy Detection | 48
Self-supervised Image-based 3D Model Retrieval | 46
Frequency-aware Camouflaged Object Detection | 43
Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network | 42
A Novel Reversible Data Hiding Scheme Based on Pixel-Residual Histogram | 41
Asymmetric Dual-Decoder U-Net for Joint Rain and Haze Removal | 41
An Optimal Edge-weighted Graph Semantic Correlation Framework for Multi-view Feature Representation Learning | 41
Diversely-Supervised Visual Product Search | 40
Table of Contents: Online Supplement Volume 17, Number 2s-3s | 40
Disentangle Saliency Detection into Cascaded Detail Modeling and Body Filling | 40
CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric | 39
A Multimodal Hierarchical Attentional Ordering Network | 36
Animating Still Natural Images Using Warping | 36
Multi-Grained Contrastive Learning for Text-Supervised Open-Vocabulary Semantic Segmentation | 36
Context-Assisted Active Learning for Weakly Supervised Person Search | 35
LFIZW-GRHFMR: Robust Zero-Watermarking with GRHFMR for Light Field Image | 33
Attention-based Fusion for Stroke Lesion Segmentation on Computed Tomography Perfusion Data | 33
A Review of Player Engagement Estimation in Video Games: Challenges and Opportunities | 32
Towards Energy-efficient Audio-Visual Classification via Multimodal Interactive Spiking Neural Network | 32
Image Cropping with Content and Composition Attribute-aware Global Relation Reasoning | 32
Texture and Structure-Guided Dual-Attention Mechanism for Image Inpainting | 31
UVC: A Unified Deep Video Compression Framework | 30
Local-Aware Residual Attention Vision Transformer for Visible-Infrared Person Re-Identification | 30
Incomplete Cross-Modal Retrieval with Deep Correlation Transfer | 30
Dual-Modality-Shared Learning and Label Refinement for Unsupervised Visible-Infrared Person ReID | 29
TripRes | 29
Image-Based Personality Questionnaire Design | 28
Robust Searching-Based Gradient Collaborative Management in Intelligent Transportation System | 28
BMIF: Privacy-preserving Blockchain-based Medical Image Fusion | 28
PMAL: A Proxy Model Active Learning Approach for Vision Based Industrial Applications | 26
Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization | 26
Rank-in-Rank Loss for Person Re-identification | 26
Semantic Embedding Guided Attention with Explicit Visual Feature Fusion for Video Captioning | 25
Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning | 25
Deep Saliency Mapping for 3D Meshes and Applications | 25
Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary | 25
LiLTv2: Language-substitutable Layout-image Transformer for Visual Information Extraction | 23
Leveraging Deep Statistics for Underwater Image Enhancement | 23
Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation | 23
Machine Learning Based Content-Agnostic Viewport Prediction for 360-Degree Video | 23
Realtime Recognition of Dynamic Hand Gestures in Practical Applications | 22
Online Cross-modal Hashing With Dynamic Prototype | 22
QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling | 22
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | 22
Semi-supervised Learning for Mars Imagery Classification and Segmentation | 21
Facial-expression-aware Emotional Color Transfer Based on Convolutional Neural Network | 21
On Content-Aware Post-Processing: Adapting Statistically Learned Models to Dynamic Content | 21
Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search | 21
Incomplete Multiview Clustering via Semidiscrete Optimal Transport for Multimedia Data Mining in IoT | 21
DPDFormer: A Coarse-to-Fine Model for Monocular Depth Estimation | 20
A Comprehensive Survey on Methods for Image Integrity | 20
Attentional Composition Networks for Long-Tailed Human Action Recognition | 19
Context-Based Novel Histogram Bin Stretching Algorithm for Automatic Contrast Enhancement | 19
Learning Semantic Representation on Visual Attribute Graph for Person Re-identification and Beyond | 19
Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking | 19
Instance-Based Continual Learning: A Real-World Dataset and Baseline for Fresh Recognition | 18
JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking | 18
A Hierarchically Discriminative Loss with Group Regularization for Fine-Grained Image Classification | 18
Cross-Modal Contrastive Learning with a Style-Mixed Bridge for Single Image 3D Shape Retrieval | 18
Interactive Search vs. Automatic Search | 18
Learning Pixel Affinity Pyramid for Arbitrary-Shaped Text Detection | 18
Lightweight Food Recognition via Aggregation Block and Feature Encoding | 18
Blind 3D Video Stabilization with Spatio-Temporally Varying Motion Blur | 18
DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot Learning | 18
Disentangling Features for Fashion Recommendation | 18
Adaptive Cloud VR Gaming Optimized by Gamer QoE Models | 18
On Modality Bias Recognition and Reduction | 18
SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection | 17
ProtoRefine: Enhancing Prototypes with Similar Structure in Few-Shot Learning | 17
Correlation-aware Cross-modal Attention Network for Fashion Compatibility Modeling in UGC Systems | 17
Hypercube Pooling for Visual Semantic Embedding | 17
Occupancy Map Guided Attributes Artifacts Removal for Video-Based Point Cloud Compression | 17
CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification | 17
Interactive Garment Recommendation with User in the Loop | 17
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | 17
Improving Scene Text Retrieval via Stylized Middle Modality | 17
Domain-aware Multimodal Dialog Systems with Distribution-based User Characteristic Modeling | 17
Adaptive Prediction Structure for Learned Video Compression | 17
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos | 16
DAG-YOLO: A Context-Feature Adaptive fusion Rotating Detection Network in Remote Sensing Images | 16
Music2Dance: DanceNet for Music-Driven Dance Generation | 16
Multi-Scale and Multi-Layer Lattice Transformer for Underwater Image Enhancement | 16
Multi-granularity Brushstrokes Network for Universal Style Transfer | 16
Reconstruction-Free Image Compression for Machine Vision via Knowledge Transfer | 16
Multimodal Cascaded Framework with Multimodal Latent Loss Functions Robust to Missing Modalities | 16
Voice-Face Homogeneity Tells Deepfake | 16
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning | 16
A Multi-Level Consistency Network for High-Fidelity Virtual Try-On | 16
PLACE Dropout: A Progressive Layer-wise and Channel-wise Dropout for Domain Generalization | 16
Deep Variational Learning for 360° Adaptive Streaming | 15
Meta-learning Advisor Networks for Long-tail and Noisy Labels in Social Image Classification | 15
Tensor-Empowered LSTM for Communication-Efficient and Privacy-Enhanced Cognitive Federated Learning in Intelligent Transportation Systems | 15
Graph Attention Transformer Network for Multi-label Image Classification | 15
Collocated Clothing Synthesis with GANs Aided by Textual Information: A Multi-Modal Framework | 15
Tensorial Evolutionary Optimization for Natural Image Matting | 15
Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation | 15
Introduction to the Special Section on Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments | 15
Less Is More: Learning from Synthetic Data with Fine-Grained Attributes for Person Re-Identification | 15
UID2021: An Underwater Image Dataset for Evaluation of No-Reference Quality Assessment Metrics | 14
Context-Aware 3D Points of Interest Detection via Spatial Attention Mechanism | 14
Deep Learning for Logo Detection: A Survey | 14
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis | 14
Exploration of Speech and Music Information for Movie Genre Classification | 14
Two-Stage Perceptual Quality Oriented Rate Control Algorithm for HEVC | 14
Deep Learning Based Occluded Person Re-Identification: A Survey | 14
Forgery Detection by Weighted Complementarity between Significant Invariance and Detail Enhancement | 14
DBGAN: Dual Branch Generative Adversarial Network for Multi-Modal MRI Translation | 14
An Image Arbitrary-Scale Super-Resolution Network Using Frequency-domain Information | 14
Delay Threshold for Social Interaction in Volumetric eXtended Reality Communication | 13
Performance Evaluation in Multimedia Retrieval | 13
Hiding Message Using a Cycle Generative Adversarial Network | 13
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding | 13
Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing | 13
Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition | 13
An Image Privacy Protection Algorithm Based on Adversarial Perturbation Generative Networks | 13
Hierarchical and Progressive Image Matting | 13
Multimodal Neurosymbolic Approach for Explainable Deepfake Detection | 13
Instance-level Adversarial Source-free Domain Adaptive Person Re-identification | 13
Establishing Trust and Security in Decentralized Metaverse: A Web 3.0 Approach | 13
CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description | 13
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? | 13
Enhanced Video Super-Resolution Network towards Compressed Data | 13
KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network | 13
AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval | 12
Characters Link Shots: Character Attention Network for Movie Scene Segmentation | 12
AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects | 12
MS-GDA: Improving Heterogeneous Recipe Representation via Multinomial Sampling Graph Data Augmentation | 12
Knowledge-integrated Multi-modal Movie Turning Point Identification | 12
CMAF: Cross-Modal Augmentation via Fusion for Underwater Acoustic Image Recognition | 12
Exploring Neighbor Correspondence Matching for Multiple-hypotheses Video Frame Synthesis | 12
Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video | 12
Exploiting Backdoors of Face Synthesis Detection with Natural Triggers | 12
Complementary Feature Pyramid Network for Object Detection | 12
Feature Extraction Matters More: An Effective and Efficient Universal Deepfake Disruptor | 12
Rethinking Feature Mining for Light Field Salient Object Detection | 12
Perceptual Quality Assessment of Omnidirectional Images: A Benchmark and Computational Model | 12
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details | 11
Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval | 11
AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation | 11
iDAM: Iteratively Trained Deep In-loop Filter with Adaptive Model Selection | 11
Boolean-based Two-in-One Secret Image Sharing by Adaptive Pixel Grouping | 11
COBIRAS: Offering a Continuous Bit Rate Slide to Maximize DASH Streaming Bandwidth Utilization | 11
Detection of Adversarial Facial Accessory Presentation Attacks Using Local Face Differential | 11
Inner Knowledge-based Img2Doc Scheme for Visual Question Answering | 11
Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling | 11
Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach | 11
Backdoor Two-Stream Video Models on Federated Learning | 11
A Quality-Aware and Obfuscation-Based Data Collection Scheme for Cyber-Physical Metaverse Systems | 11
Pose- and Attribute-consistent Person Image Synthesis | 11
Y-Net: Dual-branch Joint Network for Semantic Segmentation | 11
Unsupervised Domain Expansion for Visual Categorization | 11
Joint Source-Channel Decoding of Polar Codes for HEVC-Based Video Streaming | 11
When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization | 10
Meta-MMFNet: Meta-learning-based Multi-model Fusion Network for Micro-expression Recognition | 10
Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval | 10
Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept | 10
Multi-Source Knowledge Reasoning Graph Network for Multi-Modal Commonsense Inference | 10
Moment is Important: Language-Based Video Moment Retrieval via Adversarial Learning | 10
Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach | 10
Smart Director: An Event-Driven Directing System for Live Broadcasting | 10
BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval | 10
Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image | 10
Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection | 10
Quantum Fourier Convolutional Network | 10
TinyPredNet: A Lightweight Framework for Satellite Image Sequence Prediction | 10
A Siamese Inverted Residuals Network Image Steganalysis Scheme based on Deep Learning | 10
Towards Intelligent Attack Detection Using DNA Computing | 10
Upsampling Algorithm for V-PCC-Coded 3D Point Clouds | 10
HCNCT: A Cross-chain Interaction Scheme for the Blockchain-based Metaverse | 9
The Interpretable and Transferable Adversarial Attack Against Synthetic Speech Detectors | 9
Effective Video Summarization by Extracting Parameter-Free Motion Attention | 9
Efficient Light Field Image Compression with Enhanced Random Access | 9
Detection of Moving Object Using Superpixel Fusion Network | 9
A Survey on Composed Image Retrieval | 9
Computational Analysis of Degradation Modeling in Blind Panoramic Image Quality Assessment | 9
SSR-Net: A Spatial Structural Relation Network for Vehicle Re-identification | 9
Robust Hashing with Deep Features and Meixner Moments for Image Copy Detection | 9
Lightweight Multi-party Authentication and Key Agreement Protocol in IoT-based E-Healthcare Service | 9
Age-Invariant Face Recognition by Multi-Feature Fusion and Decomposition with Self-attention | 9
A DNA Based Colour Image Encryption Scheme Using A Convolutional Autoencoder | 9
Introduction to the Special Issue on Fine-Grained Visual Recognition and Re-Identification | 9
Semi-Supervised RGB-D Hand Gesture Recognition via Mutual Learning of Self-Supervised Models | 9
Fine-grained Semantic Disentanglement Network for Multimodal Sarcasm Analysis | 9
Multi-Modal Sarcasm Detection via Knowledge-aware Focused Graph Convolutional Networks | 9
Deep Learning-based Smart Predictive Evaluation for Interactive Multimedia-enabled Smart Healthcare | 9
Double Attention Based on Graph Attention Network for Image Multi-Label Classification | 9
Identity Feature Disentanglement for Visible-Infrared Person Re-Identification | 9
FasterPose: A Faster Simple Baseline for Human Pose Estimation | 9
Mastering Deepfake Detection: A Cutting-edge Approach to Distinguish GAN and Diffusion-model Images | 9
Paying Attention to Vehicles: A Systematic Review on Transformer-Based Vehicle Re-Identification | 9
Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification | 9
Multi-Scale Dynamic Fusion for Visible-Infrared Person Re-Identification | 9
Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition | 9
Explainable AI: A Multispectral Palm-Vein Identification System with New Augmentation Features | 9
RDH-DES: Reversible Data Hiding over Distributed Encrypted-Image Servers Based on Secret Sharing | 9
Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders | 9
Meetor: A Human-Centered Automatic Video Editing System for Meeting Recordings | 9
Beyond Songs: Analyzing User Sentiment through Music Playlists and Multimodal Data | 8
Modeling Long-range Dependencies and Epipolar Geometry for Multi-view Stereo | 8
Distributed Gateway Selection for Video Streaming in VANET Using IP Multicast | 8
Bottom-up and Top-down Object Inference Networks for Image Captioning | 8
Causal Inference with Knowledge Distilling and Curriculum Learning for Unbiased VQA | 8
Dynamic Weighted Adversarial Learning for Semi-Supervised Classification under Intersectional Class Mismatch | 8
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System | 8
Graph Pooling Inference Network for Text-based VQA | 8
SOEDiff: Efficient Distillation for Small Object Editing | 8
GraSP: Local Grassmannian Spatio-Temporal Patterns for Unsupervised Pose Sequence Recognition | 8
MMSUM Digital Twins: A Multi-view Multi-modality Summarization Framework for Sporting Events | 8
An Explainable Deep Learning Ensemble Model for Robust Diagnosis of Diabetic Retinopathy Grading | 8
Multi-Anchor Offset Representation based Coarse-to-Fine Diffusion Model for Human Pose Estimation | 8
Attention-guided Multi-modality Interaction Network for RGB-D Salient Object Detection | 8
Perceptual Quality Assessment of Low-light Image Enhancement | 8
Precise No-Reference Image Quality Evaluation Based on Distortion Identification | 8
DRL based Joint Affective Services Computing and Resource Allocation in ISTN | 8
(Compress and Restore)^N: A Robust Defense Against Adversarial Attacks on Image Classification | 8
On Teaching Mode of MTI Translation Workshop Based on IPT Corpus for Tibetan Areas of China | 8
Mix-Modality Person Re-Identification: A New and Practical Paradigm | 8
Style-FG: a style-based framework for film grain analysis and synthesis | 8
Hypergraph Association Weakly Supervised Crowd Counting | 8
A Deep Multi-level Attentive Network for Multimodal Sentiment Analysis | 8
Using Four Hypothesis Probability Estimators for CABAC in Versatile Video Coding | 8
Medical Image Classification based on an Adaptive Size Deep Learning Model | 8
VISCOUNTH: A Large-scale Multilingual Visual Question Answering Dataset for Cultural Heritage | 8
Response Generation by Jointly Modeling Personalized Linguistic Styles and Emotions | 8
Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration | 7
Multi-Scale Feature Attention Fusion for Image Splicing Forgery Detection | 7
Person in Uniforms Re-Identification | 7
Audio-visual Saliency Prediction Model with Implicit Neural Representation | 7
Blockchain-Based Audio Watermarking Technique for Multimedia Copyright Protection in Distribution Networks | 7
An Augmented Reality Online Assistance Platform for Repair Tasks | 7