ACM Transactions on Multimedia Computing Communications and Applications

Papers
(The TQCC of ACM Transactions on Multimedia Computing Communications and Applications is 8. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
Article | Citations
Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary | 293
Facial-expression-aware Emotional Color Transfer Based on Convolutional Neural Network | 164
Explainable AI: A Multispectral Palm-Vein Identification System with New Augmentation Features | 129
Smart Director: An Event-Driven Directing System for Live Broadcasting | 124
Quantum Fourier Convolutional Network | 108
Towards Intelligent Attack Detection Using DNA Computing | 107
Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation | 98
Image Cropping with Content and Composition Attribute-aware Global Relation Reasoning | 85
Hypercube Pooling for Visual Semantic Embedding | 77
AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation | 74
QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling | 73
Upsampling Algorithm for V-PCC-Coded 3D Point Clouds | 73
Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization | 65
Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image | 65
BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval | 65
Tensorial Evolutionary Optimization for Natural Image Matting | 64
Backdoor Two-Stream Video Models on Federated Learning | 60
Attentional Composition Networks for Long-Tailed Human Action Recognition | 58
Joint Mixing Data Augmentation for Skeleton-Based Action Recognition | 51
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos | 47
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | 46
Infrared and Visible Image Fusion via Text-Prior Guided Frequency-Domain Decomposition | 45
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis | 43
A Siamese Inverted Residuals Network Image Steganalysis Scheme based on Deep Learning | 43
Rank-in-Rank Loss for Person Re-identification | 42
Unsupervised Domain Expansion for Visual Categorization | 40
Enhanced Video Super-Resolution Network towards Compressed Data | 39
A Comprehensive Survey on Methods for Image Integrity | 38
Image-Based Personality Questionnaire Design | 37
Establishing Trust and Security in Decentralized Metaverse: A Web 3.0 Approach | 37
Semi-supervised Learning for Mars Imagery Classification and Segmentation | 37
CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description | 36
HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges | 35
JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking | 35
Reconstruction-Free Image Compression for Machine Vision via Knowledge Transfer | 33
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric | 31
Leveraging Deep Statistics for Underwater Image Enhancement | 29
Image Defogging Based on Regional Gradient Constrained Prior | 29
Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning | 29
A Multi-feature and Time-aware-based Stress Evaluation Mechanism for Mental Status Adjustment | 29
Domain-Aware Semantic Alignment Hashing for Large-Scale Zero-Shot Image Retrieval | 29
Robust Video Stabilization based on Motion Decomposition | 29
Detection of Moving Object Using Superpixel Fusion Network | 29
A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding | 28
Immersive Multimedia Service Caching in Edge Cloud with Renewable Energy | 28
(Compress and Restore)^N: A Robust Defense Against Adversarial Attacks on Image Classification | 28
Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling | 27
Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission | 27
A Multi-Task Adversarial Attack against Face Authentication | 27
Efficient Light Field Image Compression with Enhanced Random Access | 27
Universal Relocalizer for Weakly Supervised Referring Expression Grounding | 27
Using Four Hypothesis Probability Estimators for CABAC in Versatile Video Coding | 27
A Quality of Experience and Visual Attention Evaluation for 360° Videos with Non-spatial and Spatial Audio | 27
Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval | 26
Decoupling Deep Learning for Enhanced Image Recognition Interpretability | 26
New Metrics and Dataset for Biological Development Video Generation | 26
ViCoFace: Learning Disentangled Latent Motion Representations for Visual-Consistent Face Reenactment | 25
VISCOUNTH: A Large-scale Multilingual Visual Question Answering Dataset for Cultural Heritage | 24
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection | 23
Multi-spectral Class Center Network for Face Manipulation Localization | 23
GANonymization: A GAN-Based Face Anonymization Framework for Preserving Emotional Expressions | 23
ER-Depth: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes | 23
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | 22
Boundary Attention Guided Sparse Feature Learning for Underwater Object Tracking in Edge Computing | 22
Precise No-Reference Image Quality Evaluation Based on Distortion Identification | 22
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System | 22
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | 22
The Price of Unlearning: Identifying Unlearning Risk in Edge Computing | 22
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition | 22
GMS-3DQA: Projection-Based Grid Mini-patch Sampling for 3D Model Quality Assessment | 21
LogoDet-3K: A Large-scale Image Dataset for Logo Detection | 21
Gloss-driven Conditional Diffusion Models for Sign Language Production | 21
Non-Acted Text and Keystrokes Database and Learning Methods to Recognize Emotions | 21
Reversible Data Hiding in Shared JPEG Images | 21
CLOUD-CODEC: A New Way of Storing Traffic Cameras Footage at Scale | 21
Counterfactual Scenario-relevant Knowledge-enriched Multi-modal Emotion Reasoning | 21
Source Information assisted UV-Space Transformation Network for Person Image Generation | 21
Zero-shot Scene Graph Generation via Triplet Calibration and Reduction | 21
TEVL: Trilinear Encoder for Video-language Representation Learning | 20
LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis | 20
DATRA-MIV: Decoder-Adaptive Tiling and Rate Allocation for MPEG Immersive Video | 20
Temporal and Semantic Correlation Network for Weakly-Supervised Temporal Action Localization | 20
Cross-modal Semantically Augmented Network for Image-text Matching | 20
Principal Component Approximation Network for Image Compression | 20
One-Bit Supervision for Image Classification: Problem, Solution, and Beyond | 20
Adversarial Sample Synthesis for Visual Question Answering | 20
Human Selective Matting | 19
Similarity Regulation and Calibration Alignment for Weakly Supervised Text-Based Person Re-Identification | 19
Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition | 19
Pedestrian-Aware Panoramic Video Stitching Based on a Structured Camera Array | 19
Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing | 19
ATMNet: Adaptive Texture Migration Network for Guided Depth Super-Resolution | 19
An Efficient and Accurate GPU-based Deep Learning Model for Multimedia Recommendation | 19
Cyclic Self-attention for Point Cloud Recognition | 19
Deep Chroma Compression of Tone-Mapped Images | 19
Motion-Aware Self-Supervised RGBT Tracking with Multi-Modality Hierarchical Transformers | 19
Authentication of LINE Chat History Files by Information Hiding | 19
Melody Generation from Lyrics with Local Interpretability | 19
Visual Security Index Combining CNN and Filter for Perceptually Encrypted Light Field Images | 19
DTSD: A Dual Teacher–Student-Based Discrimination Model for Anomaly Detection | 18
Potential Features Fusion Network for Multimodal Fake News Detection | 18
Fully Unsupervised Person Re-Identification via Selective Contrastive Learning | 18
Spotting the Fakes: A Deep Dive into GAN-Generated Face Detection | 18
Multi-Grained Point Cloud Geometry Compression via Dual-Model Prediction with Extended Octree | 18
PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation | 18
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | 18
Multiply Complementary Priors for Image Compressive Sensing Reconstruction in Impulsive Noise | 18
BiRe-ID: Binary Neural Network for Efficient Person Re-ID | 17
Attack-Defending Contrastive Learning for Volumetric Medical Image Zero-Watermarking | 17
Pansharpening Scheme Using Bi-dimensional Empirical Mode Decomposition and Neural Network | 17
A Comprehensive Study of Deep Learning-based Covert Communication | 17
Diversity-Representativeness Replay and Knowledge Alignment for Lifelong Vehicle Re-identification | 17
Text-Guided Synthesis of Masked Face Images | 17
Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis | 17
ReFID: Reciprocal Frequency-aware Generalizable Person Re-identification via Decomposition and Filtering | 17
Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation | 17
Cross-Modality Relation and Uncertainty Exploration for Text-Based Person Search | 17
Robust Unsupervised Gaze Calibration Using Conversation and Manipulation Attention Priors | 16
Multigranularity Feature Aggregation and Cross-level Boundary Modeling for Temporal Action Detection | 16
Generative Image Steganography Based on Guidance Feature Distribution | 16
DISA: Disentangled Dual-Branch Framework for Affordance-Aware Human Insertion | 16
Dynamic Transfer Exemplar based Facial Emotion Recognition Model Toward Online Video | 16
GLPose: Global-Local Representation Learning for Human Pose Estimation | 15
CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding | 15
Towards Integrating Image Encryption with Compression: A Survey | 15
Shot Boundary Detection Using Color Clustering and Attention Mechanism | 15
Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation | 15
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | 15
Structure-aware Video Style Transfer with Map Art | 15
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition | 15
Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning | 15
Triplet Contrastive Representation Learning for Unsupervised Vehicle Re-identification | 15
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation | 15
Maximizing Long-Term Task Completion Ratio of UAV-Enabled Wirelessly Powered MEC Systems | 15
Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval | 14
Offloading-based Power Efficient Mobile VTuber Live Streaming | 14
Action-aware Linguistic Skeleton Optimization Network for Non-autoregressive Video Captioning | 14
Cascaded Adaptive Graph Representation Learning for Image Copy-Move Forgery Detection | 14
Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection | 14
Robust and Secure Hashing Towards Pirated Neural Network Model Detection | 14
Self-supervised Calorie-aware Heterogeneous Graph Networks for Food Recommendation | 14
Multi-Modal Driven Pose-Controllable Talking Head Generation | 14
NSDIE: Noise Suppressing Dark Image Enhancement Using Multiscale Retinex and Low-Rank Minimization | 14
SSAT: Active Authorization Control and User’s Fingerprint Tracking Framework for DNN IP Protection | 14
Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection | 14
Toward High-quality Face-Mask Occluded Restoration | 14
3D Facial Shape Similarity with Deep Perceptual Representations | 14
Learning the User’s Deeper Preferences for Multi-modal Recommendation Systems | 14
Arbitrary Virtual Try-on Network: Characteristics Preservation and Tradeoff between Body and Clothing | 14
Content-Aware Selective Encryption for H.265/HEVC Using Deep Hashing Network and Steganography | 14
Dual Dynamic Threshold Adjustment Strategy | 13
A Normalized Slicing-assigned Virtualization Method for 6G-based Wireless Communication Systems | 13
Generating Robust Adversarial Examples against Online Social Networks (OSNs) | 13
Semantic Completion and Filtration for Image–Text Retrieval | 13
Unsupervised Domain Adaptation by Causal Learning for Biometric Signal-based HCI | 13
Where Are They Going? Predicting Human Behaviors in Crowded Scenes | 13
Privacy-preserving Multi-source Cross-domain Recommendation Based on Knowledge Graph | 13
Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting | 13
Mimicking Individual Media Quality Perception with Neural Network based Artificial Observers | 13
Introduction to the Special Issue on Explainable Deep Learning for Medical Image Computing | 13
Transformer-Based Visual Grounding with Cross-Modality Interaction | 13
Progressive Transformer Machine for Natural Character Reenactment | 13
Online Correction of Camera Poses for the Surround-view System: A Sparse Direct Approach | 13
Robust Image Hashing via CP Decomposition and DCT for Copy Detection | 13
Multi-view Shape Generation for a 3D Human-like Body | 13
Learning Nighttime Semantic Segmentation the Hard Way | 13
Semantics and Non-fungible Tokens for Copyright Management on the Metaverse and Beyond | 13
Quality Assessment in the Era of Large Models: A Survey | 13
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks | 13
ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation | 13
Self-supervised Multi-view Learning via Auto-encoding 3D Transformations | 13
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective | 12
Robust Long-Term Tracking via Localizing Occluders | 12
Smart City Construction and Management by Digital Twins and BIM Big Data in COVID-19 Scenario | 12
Generating and Evaluating Data of Daily Activities with an Autonomous Agent in a Virtual Smart Home | 12
Autoregressive GAN for Semantic Unconditional Head Motion Generation | 12
GAN-Assisted Road Segmentation from Satellite Imagery | 12
Dynamic Weighted Gradient Reversal Network for Visible-infrared Person Re-identification | 12
Generation and Editing of Mandrill Faces: Application to Sex Editing and Assessment | 12
Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition | 12
Joint Structure-Texture Scan-Order for Point Cloud Attribute Compression Using Affine Transformation | 12
Language-guided Residual Graph Attention Network and Data Augmentation for Visual Grounding | 12
Skeleton-Aware Graph-Based Adversarial Networks for Human Pose Estimation from Sparse IMUs | 12
Noise-Resistance Learning via Multi-Granularity Consistency for Unsupervised Domain Adaptive Person Re-Identification | 12
Dual Scene Graph Convolutional Network for Motivation Prediction | 12
Boosting Few-shot Object Detection with Discriminative Representation and Class Margin | 12
Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification | 12
Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors | 12
VRVul-Discovery: BiLSTM-based Vulnerability Discovery for Virtual Reality Devices in Metaverse | 12
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | 12
GJFusion: A Channel-Level Correlation Construction Method for Multimodal Physiological Signal Fusion | 12
Hyperspectral Image Reconstruction Using Multi-scale Fusion Learning | 11
InteractNet: Social Interaction Recognition for Semantic-rich Videos | 11
Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines | 11
FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification | 11
CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric | 11
Residual-guided In-loop Filter Using Convolution Neural Network | 11
LFIZW-GRHFMR: Robust Zero-Watermarking with GRHFMR for Light Field Image | 11
FAST: Flexibly Controllable Arbitrary Style Transfer via Latent Diffusion models | 11
EVASR: Edge-Based Salience-Aware Super-Resolution for Enhanced Video Quality and Power Efficiency | 11
A Real-Time Medical Image Encryption Algorithm Leveraging a Novel Hypersensitive Chaotic Map | 11
Trans-Convo-Former Net for hierarchical prediction of household images | 11
Learning Semantic Representation on Visual Attribute Graph for Person Re-identification and Beyond | 11
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details | 11
Meetor: A Human-Centered Automatic Video Editing System for Meeting Recordings | 11
Language-guided Bias Generation Contrastive Strategy for Visual Question Answering | 11
A Convolutional Neural Network Model Using Weighted Loss Function to Detect Diabetic Retinopathy | 11
Efficient Privacy-Preserving Video Analytics via Share Transforming in Distributed Clouds | 11
ALOHA: Adapting Local Spatio-Temporal Context to Enhance the Audio-Visual Semantic Segmentation | 11
SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection | 11
Optimized Deep-Neural Network for Content-based Medical Image Retrieval in a Brownfield IoMT Network | 11
MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression | 11
Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling | 11
A Hierarchically Discriminative Loss with Group Regularization for Fine-Grained Image Classification | 10
iDAM: Iteratively Trained Deep In-loop Filter with Adaptive Model Selection | 10
Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept | 10
Complementary Feature Pyramid Network for Object Detection | 10
T2C: Text-guided 4D Cloth Generation | 10
Dual-Modality-Shared Learning and Label Refinement for Unsupervised Visible-Infrared Person ReID | 10
Multi-Scale and Multi-Layer Lattice Transformer for Underwater Image Enhancement | 10
When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization | 10
Boolean-based Two-in-One Secret Image Sharing by Adaptive Pixel Grouping | 10
Instance-level Adversarial Source-free Domain Adaptive Person Re-identification | 10
Multimodal Cascaded Framework with Multimodal Latent Loss Functions Robust to Missing Modalities | 10
A Review of Player Engagement Estimation in Video Games: Challenges and Opportunities | 10
PMAL: A Proxy Model Active Learning Approach for Vision Based Industrial Applications | 10
Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search | 10
Inner Knowledge-based Img2Doc Scheme for Visual Question Answering | 10
Deep Differential Lifelong Cross-modal Hashing for Stream Medical Data Retrieval | 10
Pose- and Attribute-consistent Person Image Synthesis | 9
AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval | 9
A Spatial Relationship Preserving Adversarial Network for 3D Reconstruction from a Single Depth View | 9
Attention-guided Multi-modality Interaction Network for RGB-D Salient Object Detection | 9
Privacy-Enhanced Prototype-Based Federated Cross-Modal Hashing for Cross-Modal Retrieval | 9
Multi-Scale Feature Attention Fusion for Image Splicing Forgery Detection | 9
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning | 9
Hierarchical and Progressive Image Matting | 9
Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing | 9
Self-supervised Image-based 3D Model Retrieval | 9
A Novel Multi-Sample Generation Method for Adversarial Attacks | 9
Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition | 9
Pivot: Panoramic-Image-Based VR User Authentication against Side-Channel Attacks | 9
DPDFormer: A Coarse-to-Fine Model for Monocular Depth Estimation | 9
Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval | 9
Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection | 9
Text-and-Image Learning Transformer for Cross-Modal Person Re-Identification | 9
Discard Significant Bits of Compressed Sensing: A Robust Image Coding for Resource-Limited Contexts | 9
Neural Image Compression with Regional Decoding | 9
Beyond Songs: Analyzing User Sentiment through Music Playlists and Multimodal Data | 9
Context-Based Novel Histogram Bin Stretching Algorithm for Automatic Contrast Enhancement | 9