Multimedia Systems

Papers
(The H4-Index of Multimedia Systems is 26. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-11-01 to 2025-11-01.)
Article | Citations
A visual question answering model based on image captioning | 122
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 92
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 82
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 82
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 73
Real emotion seeker: recalibrating annotation for facial expression recognition | 70
Towards domain adaptation underwater image enhancement and restoration | 67
A comparative study of color quantization methods using various image quality assessment indices | 62
BENet: bi-directional enhanced network for image captioning | 56
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 48
Recent advancement in haze removal approaches | 48
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 47
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 41
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 41
Face and voice cross-modal association with learning convex feature embedding | 40
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 39
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath | 39
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 38
GVA: guided visual attention approach for automatic image caption generation | 34
360° video quality assessment based on saliency-guided viewport extraction | 34
SFRA: spatial fusion regression augmentation network for facial landmark detection | 31
Model-based portrait video compression with spatial constraint and adaptive pose processing | 29
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 29
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 29
Improving text-image cross-modal retrieval with contrastive loss | 28
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption | 27
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 26
Segmentation-aware image super-resolution with generative adversarial networks | 26
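
As a rough check of the threshold, the sketch below recomputes an h-index from the citation counts in the table. It assumes that the H4-Index is the usual h-index (the largest h such that at least h papers have h or more citations) restricted to the four-year window; the page does not spell out the formula, and the names `h_index` and `table_counts` are illustrative only.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts copied from the table above, in listed order.
table_counts = [
    122, 92, 82, 82, 73, 70, 67, 62, 56, 48,
    48, 47, 41, 41, 40, 39, 39, 38, 34, 34,
    31, 29, 29, 29, 28, 27, 26, 26,
]

if __name__ == "__main__":
    print("h-index from table:", h_index(table_counts))
    print("papers listed:", len(table_counts))
```

Papers below the threshold are not listed, but they cannot raise the result, so the table alone is enough for this check. The list holds more rows than the index value because ties at the threshold are included.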