Multimedia Systems

Papers
(The H4-Index of Multimedia Systems is 26. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-11-01 to 2025-11-01.)
Article | Citations
A visual question answering model based on image captioning | 122
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 92
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 82
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 82
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 73
Real emotion seeker: recalibrating annotation for facial expression recognition | 70
Towards domain adaptation underwater image enhancement and restoration | 67
A comparative study of color quantization methods using various image quality assessment indices | 62
BENet: bi-directional enhanced network for image captioning | 56
Recent advancement in haze removal approaches | 48
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 48
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 47
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 41
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 41
Face and voice cross-modal association with learning convex feature embedding | 40
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath | 39
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 39
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 38
360° video quality assessment based on saliency-guided viewport extraction | 34
GVA: guided visual attention approach for automatic image caption generation | 34
SFRA: spatial fusion regression augmentation network for facial landmark detection | 31
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 29
Model-based portrait video compression with spatial constraint and adaptive pose processing | 29
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 29
Improving text-image cross-modal retrieval with contrastive loss | 28
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption | 27
Segmentation-aware image super-resolution with generative adversarial networks | 26
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 26
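
The page does not spell out how the H4-Index is computed. The sketch below assumes it is the ordinary h-index restricted to the four-year window: the largest h such that h papers each have at least h citations. The function name h_index is illustrative, and the counts are copied from the table above.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    Assumption: the H4-Index is this standard h-index applied only to
    papers published inside the four-year window.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts of the 28 listed papers, in table order.
counts = [122, 92, 82, 82, 73, 70, 67, 62, 56, 48, 48, 47, 41, 41,
          40, 39, 39, 38, 34, 34, 31, 29, 29, 29, 28, 27, 26, 26]

print(h_index(counts))  # prints 26
```

Under that assumption the listed counts reproduce the stated value of 26: the 26th-ranked paper has 27 citations, while the 27th has only 26.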