ACM Transactions on Multimedia Computing Communications and Applications

Papers
(The TQCC of ACM Transactions on Multimedia Computing Communications and Applications is 9. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
Upsampling Algorithm for V-PCC-Coded 3D Point Clouds | 352
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | 186
QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling | 184
Image Cropping with Content and Composition Attribute-aware Global Relation Reasoning | 143
Hypercube Pooling for Visual Semantic Embedding | 136
Tensorial Evolutionary Optimization for Natural Image Matting | 122
Backdoor Two-Stream Video Models on Federated Learning | 98
Attentional Composition Networks for Long-Tailed Human Action Recognition | 97
Facial-expression-aware Emotional Color Transfer Based on Convolutional Neural Network | 92
Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary | 90
Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation | 76
Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization | 74
Image-Based Personality Questionnaire Design | 71
Quantum Fourier Convolutional Network | 70
Reconstruction-Free Image Compression for Machine Vision via Knowledge Transfer | 67
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis | 66
Towards Generalizable Deepfake Detection by Primary Region Regularization | 61
Semi-supervised Learning for Mars Imagery Classification and Segmentation | 60
HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges | 59
CVLP-NaVD: Contrastive Visual-language Pre-training Models for Non-annotated Visual Description | 55
Rank-in-Rank Loss for Person Re-identification | 53
ForgeFinder: Perceptive Multimodal Deepfake Detection via Multi-grained Forgery Localization | 53
High Feature Distinguishability for Adaptive Image-text Matching with Dual-stream Transformers | 48
AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation | 47
BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval | 45
Infrared and Visible Image Fusion via Text-Prior Guided Frequency-Domain Decomposition | 44
A Siamese Inverted Residuals Network Image Steganalysis Scheme based on Deep Learning | 43
Joint Mixing Data Augmentation for Skeleton-Based Action Recognition | 42
Towards Intelligent Attack Detection Using DNA Computing | 42
Enhanced Video Super-Resolution Network towards Compressed Data | 42
A Comprehensive Survey on Methods for Image Integrity | 42
Establishing Trust and Security in Decentralized Metaverse: A Web 3.0 Approach | 40
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric | 40
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos | 39
Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image | 36
JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking | 36
Decoupling Deep Learning for Enhanced Image Recognition Interpretability | 35
New Metrics and Dataset for Biological Development Video Generation | 35
ViCoFace: Learning Disentangled Latent Motion Representations for Visual-Consistent Face Reenactment | 35
Universal Relocalizer for Weakly Supervised Referring Expression Grounding | 34
Efficient Light Field Image Compression with Enhanced Random Access | 34
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection | 34
(Compress and Restore)^N: A Robust Defense Against Adversarial Attacks on Image Classification | 33
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | 33
GANonymization: A GAN-Based Face Anonymization Framework for Preserving Emotional Expressions | 33
Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval | 33
Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling | 32
Immersive Multimedia Service Caching in Edge Cloud with Renewable Energy | 32
CLOUD-CODEC: A New Way of Storing Traffic Camera Footage at Scale | 32
Boundary Attention-Guided Sparse Feature Learning for Underwater Object Tracking in Edge Computing | 31
Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission | 31
Using Four Hypothesis Probability Estimators for CABAC in Versatile Video Coding | 31
Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning | 31
Multi-spectral Class Center Network for Face Manipulation Localization | 31
The Price of Unlearning: Identifying Unlearning Risk in Edge Computing | 31
Domain-Aware Semantic Alignment Hashing for Large-Scale Zero-Shot Image Retrieval | 31
Robust Video Stabilization based on Motion Decomposition | 31
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition | 31
A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding | 30
LogoDet-3K: A Large-scale Image Dataset for Logo Detection | 30
Image Defogging Based on Regional Gradient Constrained Prior | 30
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | 29
A Multi-feature and Time-aware-based Stress Evaluation Mechanism for Mental Status Adjustment | 28
A Multi-Task Adversarial Attack against Face Authentication | 28
ER-Depth: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes | 28
Light Field Reconstruction using Multi-orientation Epipolar Plane Images | 28
VISCOUNTH: A Large-scale Multilingual Visual Question Answering Dataset for Cultural Heritage | 27
Non-Acted Text and Keystrokes Database and Learning Methods to Recognize Emotions | 27
Detection of Moving Object Using Superpixel Fusion Network | 27
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | 27
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System | 27
Boosting Transferability of Adversarial Examples with Spatio-Temporal Context | 27
One-Bit Supervision for Image Classification: Problem, Solution, and Beyond | 26
DTSD: A Dual Teacher–Student-Based Discrimination Model for Anomaly Detection | 26
TEVL: Trilinear Encoder for Video-language Representation Learning | 26
GMS-3DQA: Projection-Based Grid Mini-patch Sampling for 3D Model Quality Assessment | 26
Source Information-Assisted UV-Space Transformation Network for Person Image Generation | 26
A Quality of Experience and Visual Attention Evaluation for 360° Videos with Non-spatial and Spatial Audio | 26
Principal Component Approximation Network for Image Compression | 25
Similarity Regulation and Calibration Alignment for Weakly Supervised Text-Based Person Re-Identification | 25
An Efficient and Accurate GPU-based Deep Learning Model for Multimedia Recommendation | 24
LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis | 24
Motion-Aware Self-Supervised RGBT Tracking with Multi-Modality Hierarchical Transformers | 24
Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition | 24
Human Selective Matting | 24
Counterfactual Scenario-relevant Knowledge-enriched Multi-modal Emotion Reasoning | 23
Adversarial Sample Synthesis for Visual Question Answering | 23
Authentication of LINE Chat History Files by Information Hiding | 23
Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing | 23
DATRA-MIV: Decoder-Adaptive Tiling and Rate Allocation for MPEG Immersive Video | 23
Gleaning Wisdom from the Past: Towards Label Incremental Learning for Online Hashing with a Plug-and-Play Framework | 22
Melody Generation from Lyrics with Local Interpretability | 22
Reversible Data Hiding in Shared JPEG Images | 22
Temporal and Semantic Correlation Network for Weakly-Supervised Temporal Action Localization | 22
THMM-CLIP: Task-Guided Hierarchical Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning | 22
Spotting the Fakes: A Deep Dive into GAN-Generated Face Detection | 22
Dual Alignment-enhanced Fashion Vision-Language Pre-training | 22
Enhancing Embedding Diversity and Robustness for Image-Text Retrieval in Remote Sensing | 22
Cyclic Self-attention for Point Cloud Recognition | 21
QoE Evaluation for VR with Vibrotactile Feedback Based on Inter-user Brain Spatial Information | 21
Multi-Grained Point Cloud Geometry Compression via Dual-Model Prediction with Extended Octree | 21
InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-Modal Sarcasm Detection | 21
SkiTrack: An Aerial Skiing Benchmark for Human-Centric Object Tracking | 20
Cross-modal Semantically Augmented Network for Image-text Matching | 20
DISA: Disentangled Dual-Branch Framework for Affordance-Aware Human Insertion | 20
Zero-shot Scene Graph Generation via Triplet Calibration and Reduction | 20
ATMNet: Adaptive Texture Migration Network for Guided Depth Super-Resolution | 20
Gloss-driven Conditional Diffusion Models for Sign Language Production | 20
Visual Security Index Combining CNN and Filter for Perceptually Encrypted Light Field Images | 20
Deep Chroma Compression of Tone-Mapped Images | 20
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | 20
Pansharpening Scheme Using Bi-dimensional Empirical Mode Decomposition and Neural Network | 20
Diversity-Representativeness Replay and Knowledge Alignment for Lifelong Vehicle Re-identification | 19
Multigranularity Feature Aggregation and Cross-level Boundary Modeling for Temporal Action Detection | 19
Multiply Complementary Priors for Image Compressive Sensing Reconstruction in Impulsive Noise | 19
Text-Guided Synthesis of Masked Face Images | 19
Structure-aware Video Style Transfer with Map Art | 19
Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation | 19
CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding | 19
Maximizing Long-Term Task Completion Ratio of UAV-Enabled Wirelessly Powered MEC Systems | 19
Shot Boundary Detection Using Color Clustering and Attention Mechanism | 19
PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation | 19
Generative Image Steganography Based on Guidance Feature Distribution | 19
Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning | 18
Towards Integrating Image Encryption with Compression: A Survey | 18
Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis | 18
ReFID: Reciprocal Frequency-aware Generalizable Person Re-identification via Decomposition and Filtering | 18
Dynamic Transfer Exemplar based Facial Emotion Recognition Model Toward Online Video | 18
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation | 18
Robust Unsupervised Gaze Calibration Using Conversation and Manipulation Attention Priors | 17
BiRe-ID: Binary Neural Network for Efficient Person Re-ID | 17
Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation | 17
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | 17
A Comprehensive Study of Deep Learning-based Covert Communication | 17
Hyperbolic-based cross-modal semantic remodeling network for zero-shot sketch-based image retrieval | 17
Cross-Modality Relation and Uncertainty Exploration for Text-Based Person Search | 17
GLPose: Global-Local Representation Learning for Human Pose Estimation | 17
Attack-Defending Contrastive Learning for Volumetric Medical Image Zero-Watermarking | 17
Triplet Contrastive Representation Learning for Unsupervised Vehicle Re-Identification | 17
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition | 16
Fully Unsupervised Person Re-Identification via Selective Contrastive Learning | 16
Generating Robust Adversarial Examples against Online Social Networks (OSNs) | 16
Dual Dynamic Threshold Adjustment Strategy | 16
Content-Aware Selective Encryption for H.265/HEVC Using Deep Hashing Network and Steganography | 16
Multi-Task Driven Adapter-Based Foundation Model for Locomotion Prediction in Virtual Reality | 16
Potential Features Fusion Network for Multimodal Fake News Detection | 16
Unsupervised Domain Adaptation by Causal Learning for Biometric Signal-based HCI | 16
Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection | 16
Robust Image Hashing via CP Decomposition and DCT for Copy Detection | 16
CVAF: A CLIP-Based View-Consistent Alignment Framework for Aerial-Ground Person Re-Identification | 16
3D Facial Shape Similarity with Deep Perceptual Representations | 16
Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection | 16
Multi-view Shape Generation for a 3D Human-like Body | 16
NSDIE: Noise Suppressing Dark Image Enhancement Using Multiscale Retinex and Low-Rank Minimization | 16
Generation and Editing of Mandrill Faces: Application to Sex Editing and Assessment | 15
A Normalized Slicing-assigned Virtualization Method for 6G-based Wireless Communication Systems | 15
Online Correction of Camera Poses for the Surround-view System: A Sparse Direct Approach | 15
Offloading-based Power-Efficient Mobile VTuber Live Streaming | 15
GAN-Assisted Road Segmentation from Satellite Imagery | 15
Skeleton-Aware Graph-Based Adversarial Networks for Human Pose Estimation from Sparse IMUs | 15
Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting | 15
Learning Nighttime Semantic Segmentation the Hard Way | 15
Robust and Secure Hashing Towards Pirated Neural Network Model Detection | 15
Semantic Completion and Filtration for Image–Text Retrieval | 15
Cascaded Adaptive Graph Representation Learning for Image Copy-Move Forgery Detection | 15
Self-supervised Calorie-aware Heterogeneous Graph Networks for Food Recommendation | 15
ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation | 15
Quality Assessment in the Era of Large Models: A Survey | 14
Multi-Modal Driven Pose-Controllable Talking Head Generation | 14
PrivaMod: Uncertainty-Aware Multimedia Fusion with Privacy Guarantees for NFT Visual and Transaction Analysis | 14
Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval | 14
Language-guided Visual Tracking: Comprehensive and Effective Multimodal Information Fusion | 14
Action-aware Linguistic Skeleton Optimization Network for Non-autoregressive Video Captioning | 14
Dynamic Weighted Gradient Reversal Network for Visible-infrared Person Re-identification | 14
Learning the User’s Deeper Preferences for Multi-modal Recommendation Systems | 14
Robust Long-Term Tracking via Localizing Occluders | 14
Transformer-Based Visual Grounding with Cross-Modality Interaction | 14
Toward High-quality Face-Mask Occluded Restoration | 14
Mimicking Individual Media Quality Perception with Neural Network based Artificial Observers | 14
EIN: Exposure-Induced Network for Single-Image HDR Reconstruction | 14
3DMambaComplete: Structured State Space Model for High-Efficiency Point Cloud Completion | 14
Autoregressive GAN for Semantic Unconditional Head Motion Generation | 14
Arbitrary Virtual Try-on Network: Characteristics Preservation and Tradeoff between Body and Clothing | 14
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | 14
Joint Structure-Texture Scan-Order for Point Cloud Attribute Compression Using Affine Transformation | 14
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks | 14
Progressive Transformer Machine for Natural Character Reenactment | 14
Privacy-preserving Multi-source Cross-domain Recommendation Based on Knowledge Graph | 14
Trans-Convo-Former Net for Hierarchical Prediction of Household Images | 14
Self-supervised Multi-view Learning via Auto-encoding 3D Transformations | 14
Multiscale Feature Importance-Based Bit Allocation for End-to-End Feature Coding for Machines | 14
Semantics and Non-fungible Tokens for Copyright Management on the Metaverse and Beyond | 14
SSAT: Active Authorization Control and User’s Fingerprint Tracking Framework for DNN IP Protection | 14
A Real-Time Medical Image Encryption Algorithm Leveraging a Novel Hypersensitive Chaotic Map | 14
Enhancing Pose-Guided Human Image Generation with Comprehensive and Adjustable 3D Control | 13
InteractNet: Social Interaction Recognition for Semantic-rich Videos | 13
Language-guided Residual Graph Attention Network and Data Augmentation for Visual Grounding | 13
ALOHA: Adapting Local Spatio-Temporal Context to Enhance the Audio-Visual Semantic Segmentation | 13
Noise-Resistance Learning via Multi-Granularity Consistency for Unsupervised Domain Adaptive Person Re-Identification | 13
GJFusion: A Channel-Level Correlation Construction Method for Multimodal Physiological Signal Fusion | 13
Portrait Video Compression with Semantic-guided Animation Model and Background Incremental Coding | 13
Dual Scene Graph Convolutional Network for Motivation Prediction | 13
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective | 13
VRVul-Discovery: BiLSTM-based Vulnerability Discovery for Virtual Reality Devices in Metaverse | 13
EVASR: Edge-Based Salience-Aware Super-Resolution for Enhanced Video Quality and Power Efficiency | 13
Efficient Privacy-Preserving Video Analytics via Share Transforming in Distributed Clouds | 13
A Collaborative Hierarchical Aggregation Network for Weakly-Supervised Temporal Action Localization | 13
Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification | 13
Hyperspectral Image Reconstruction Using Multi-scale Fusion Learning | 13
Boosting Few-shot Object Detection with Discriminative Representation and Class Margin | 12
Smart City Construction and Management by Digital Twins and BIM Big Data in COVID-19 Scenario | 12
Language-guided Bias Generation Contrastive Strategy for Visual Question Answering | 12
When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization | 12
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning | 12
T2C: Text-guided 4D Cloth Generation | 12
Learning Semantic Representation on Visual Attribute Graph for Person Re-identification and Beyond | 12
LFIZW-GRHFMR: Robust Zero-Watermarking with GRHFMR for Light Field Image | 12
Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors | 12
Generating and Evaluating Data of Daily Activities with an Autonomous Agent in a Virtual Smart Home | 12
Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing | 12
A Review of Player Engagement Estimation in Video Games: Challenges and Opportunities | 12
DPDFormer: A Coarse-to-Fine Model for Monocular Depth Estimation | 12
A Hierarchically Discriminative Loss with Group Regularization for Fine-Grained Image Classification | 12
Multi-Scale and Multi-Layer Lattice Transformer for Underwater Image Enhancement | 12
FAST: Flexibly Controllable Arbitrary Style Transfer via Latent Diffusion Models | 12
A Convolutional Neural Network Model Using Weighted Loss Function to Detect Diabetic Retinopathy | 12
Boolean-based Two-in-One Secret Image Sharing by Adaptive Pixel Grouping | 12
FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification | 12
A Multimodal Hierarchical Attentional Ordering Network | 12
iDAM: Iteratively Trained Deep In-loop Filter with Adaptive Model Selection | 12
Multimodal Cascaded Framework with Multimodal Latent Loss Functions Robust to Missing Modalities | 12
PMAL: A Proxy Model Active Learning Approach for Vision Based Industrial Applications | 11
SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection | 11
Meetor: A Human-Centered Automatic Video Editing System for Meeting Recordings | 11
Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept | 11
Hierarchical and Progressive Image Matting | 11
Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search | 11
How to Understand Named Entities: Using Commonsense for News Captioning | 11
Deep Differential Lifelong Cross-modal Hashing for Stream Medical Data Retrieval | 11
Pose- and Attribute-consistent Person Image Synthesis | 11
Context-Based Novel Histogram Bin Stretching Algorithm for Automatic Contrast Enhancement | 11
S²CL-LeafNet: Recognizing Leaf Images Like Human Botanists | 11
Dual-Modality-Shared Learning and Label Refinement for Unsupervised Visible-Infrared Person ReID | 11
AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval | 11
Instance-level Adversarial Source-free Domain Adaptive Person Re-identification | 11
Early Traffic Accident Anticipation via Feature Consistency Representation and Soft Label Regression | 11
Complementary Feature Pyramid Network for Object Detection | 11
Self-supervised Image-based 3D Model Retrieval | 11
MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression | 11
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details | 11