Multimedia Systems

Papers
(The median citation count of Multimedia Systems is 1. The table below lists the papers above that threshold, ranked by CrossRef citation count [max. 250 papers]. It covers publications from the past four years, i.e., from 2022-01-01 to 2026-01-01.)
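The selection rule stated above can be sketched in Python as a minimal illustration. It is not the site's actual code. Several details are assumptions:

- the function name `select_above_median`;
- the `(title, citations, published)` tuple layout;
- a half-open date window;
- computing the median only over papers inside the window.

The sample entries are hypothetical placeholders, not rows from the table.

```python
from datetime import date
from statistics import median


def select_above_median(papers, start=date(2022, 1, 1), end=date(2026, 1, 1), cap=250):
    """Return (median, selected) for a list of (title, citations, published) tuples.

    Hypothetical sketch of the rule described above: keep papers published in
    the window whose citation count is strictly above the median, order them by
    citations (highest first), and keep at most `cap` of them.
    """
    # Assumption: the publication window is half-open, i.e. start <= published < end.
    in_window = [p for p in papers if start <= p[2] < end]
    if not in_window:
        return None, []
    # Assumption: the median is computed over the papers inside the window only.
    threshold = median(citations for _, citations, _ in in_window)
    above = [p for p in in_window if p[1] > threshold]
    # sorted() is stable, so papers with equal counts keep their input order.
    ranked = sorted(above, key=lambda p: p[1], reverse=True)
    return threshold, ranked[:cap]


# Hypothetical placeholder data, not entries from the table.
papers = [
    ("Paper A", 0, date(2023, 5, 1)),
    ("Paper B", 1, date(2022, 3, 1)),
    ("Paper C", 1, date(2024, 1, 1)),
    ("Paper D", 5, date(2025, 6, 1)),
    ("Paper E", 12, date(2022, 8, 1)),
    ("Paper F", 40, date(2021, 12, 31)),  # outside the window, so ignored
]

threshold, selected = select_above_median(papers)
print(threshold, [title for title, _, _ in selected])  # 1 ['Paper E', 'Paper D']
```

The strict `>` comparison follows the wording "above that threshold". The `cap` parameter corresponds to the "max. 250 papers" limit.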
Article | Citations
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 133
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 95
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 92
DiffRA: universal restorative adversarial attack based on diffusion model | 86
On-line monitoring of structural performance of scraper conveyor driven by digital twin | 78
SFFN-YOLO for small object detection in aerial images | 78
CHCoT-MSLU: a coupled hierarchical chain-of-thought prompt learning model for multi-intent spoken language understanding | 71
The segmented UEC Food-100 dataset with benchmark experiment on food detection | 70
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 62
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 51
Face and voice cross-modal association with learning convex feature embedding | 47
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath | 46
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 45
SFRA: spatial fusion regression augmentation network for facial landmark detection | 41
User authentication method based on keystroke dynamics and mouse dynamics using HDA | 41
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 41
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 41
Model-based portrait video compression with spatial constraint and adaptive pose processing | 39
Improving text-image cross-modal retrieval with contrastive loss | 35
Real emotion seeker: recalibrating annotation for facial expression recognition | 35
Correction: STASiamRPN: visual tracking based on spatiotemporal and attention | 31
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 30
A visual question answering model based on image captioning | 30
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 30
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 30
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet | 30
BENet: bi-directional enhanced network for image captioning | 29
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 29
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery | 28
A comparative study of color quantization methods using various image quality assessment indices | 27
360° video quality assessment based on saliency-guided viewport extraction | 27
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption | 24
Segmentation-aware image super-resolution with generative adversarial networks | 24
GVA: guided visual attention approach for automatic image caption generation | 23
Dual convolutional neural network with attention for image blind denoising | 23
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 23
Optimizing codebook training through control chart analysis | 22
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation | 22
Towards domain adaptation underwater image enhancement and restoration | 22
SoftBinReduce: data reduction for color quantization through soft binning | 21
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation | 21
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder | 21
Weakly supervised anomaly detection with multi-level contextual modeling | 20
Multi-view region proposal network predictive learning for tracking | 19
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining | 19
Overcoming the practical restrictions in H.266/VVC-based video communication systems by a PI bit rate controller | 19
Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning | 18
Enhanced target recognition and localization using binocular vision and infrared thermal imaging | 18
Big-LITTLE-Net: a dual-branch network for small UAV detection | 18
An automatic music generation method based on RSCLN_Transformer network | 18
A verifiable variable threshold visual image secret sharing scheme | 18
Design and evaluation of a serious game in virtual reality to increase empathy towards students with phonological dyslexia | 18
DMFTNet: dense multimodal fusion transfer network for free-space detection | 17
Game and reference: efficient policy making for epidemic prevention and control | 17
A survey of multimodal federated learning: background, applications, and perspectives | 17
Inter-class distance enhanced prototypical network for few-shot text classification | 17
Fast bilateral filter with spatial subsampling | 17
A deep learning-based framework for detecting COVID-19 patients using chest X-rays | 17
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies | 17
Exploiting local detail in single image super-resolution via hypergraph convolution | 17
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction | 17
RefinerHash: a new hashing-based re-ranking technique for image retrieval | 17
Speech-driven talking face video generation | 16
Similarity-guided contrastive learning for deep multi-view clustering | 16
GL-MambaNet: Mamba-based global and local feature fusion for image dehazing | 16
DAFMixerSR: a lightweight fusion-enhanced adaptive perception network for image super-resolution | 16
Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping | 16
CCM-Net: image splicing localization network based on context-aware and cross-domain multi-scale fusion | 15
EDCM-EA: event prediction based on event development context mining considering event arguments | 15
Double-scale similarity with rich features for cross-modal retrieval | 15
CR-DM: A novel craniofacial reconstruction framework based on diffusion model | 15
A plug-and-play image enhancement model for end-to-end object detection in low-light condition | 15
A multi-label classification method combined with texture enhancement for deepfake face detection | 15
Workpiece tracking based on improved SiamFC++ and virtual dataset | 15
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach | 15
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory | 14
3D human pose estimation method based on multi-constrained dilated convolutions | 14
RMVAE: one-class classification via divergence regularization and maximization mutual information | 14
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform | 14
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification | 14
Multimodal-enhanced hierarchical attention network for video captioning | 14
Unsupervised cross-database micro-expression recognition based on distribution adaptation | 14
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks | 13
Learning shared features from specific and ambiguous descriptions for text-based person search | 13
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography | 13
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism | 13
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution | 13
Occluded scene text detection via context-awareness from sketch-level image representations | 13
Wireless multipath video transmission: when IoT video applications meet networking—a survey | 13
Object detection of mural images based on improved YOLOv8 | 13
LPR: learning point-level temporal action localization through re-training | 13
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency | 13
MCLSC-Fusion: a multi-scale cross-modality long-short connection fusion network for infrared and visible images | 13
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network | 13
PDSRN: a progressive distillation network for generalizable single image super-resolution | 13
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer | 12
Graph contrastive learning for recommendation with generative data augmentation | 12
A comprehensive survey on human pose estimation approaches | 12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network | 12
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation | 12
Depth alignment interaction network for camouflaged object detection | 12
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence | 12
Scd-yolo: a novel object detection method for efficient road crack detection | 12
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering | 12
Facial action unit detection with emotion consistency: a cross-modal learning approach | 11
Overcomplete-to-sparse representation learning for few-shot class-incremental learning | 11
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation | 11
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning | 11
Deepfake detection of occluded images using a patch-based approach | 11
Synthetic shadows: the interplay of forensic detection and anti-forensic techniques in GAN-generated images | 11
Remote sensing image cloud removal based on multi-scale spatial information perception | 11
Non-convex fractional-order TV model for image inpainting | 11
Developing novel video coding model using modified dual-tree wavelet-based multi-resolution technique | 11
Gicnet: global information capture network for visual place recognition | 11
Detecting offensive language on instagram with a combined approach of the Gray Wolf algorithm and deep learning networks | 11
Asymmetric exponential loss function for crack segmentation | 11
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering | 11
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray | 11
HandO: a hybrid 3D hand–object reconstruction model for unknown objects | 11
Tb-mmrd: transformer-based multi-modal election rumor detection with agreement-aware gating and semantic fusion | 11
3D model watermarking using surface integrals of generated random vector fields | 11
Polarity-aware attention network for image sentiment analysis | 10
Pointlgfn: local–global fusion network for point cloud classification | 10
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution | 10
Same-clothes person re-identification with dual-stream network | 10
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model | 10
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment | 10
Text-centered cross-sample fusion network for multimodal sentiment analysis | 10
Diff-mednet: differential convolution and median-enhanced attention multiscale fusion for infrared small target detection | 10
You watch once more: a more effective CNN architecture for video spatio-temporal action localization | 10
Bag of states: a non-sequential approach to video-based engagement measurement | 10
DA²Net: a dual attention-aware network for robust crowd counting | 10
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering | 10
Multi-level fine-grained center calibration network for unsupervised person re-identification | 10
DFGAnet: a dual-branch multimodal fusion network based on graph and attention for emotion recognition in conversation | 10
Multi-object tracking in the low-light with two-stage association and denoising based on image feature enhancement | 10
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping | 10
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition | 10
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector | 10
MGSAN: multimodal graph self-attention network for skeleton-based action recognition | 9
ST-GRU: spatiotemporal gated recurrent unit for video prediction | 9
User quality of experience estimation using social network analysis | 9
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model | 9
3D human pose estimation with multi-hypotheses gated transformer | 9
CBLC-SOOD: contrastive background and label correction for semi-supervised oriented object detection | 9
EfficientFace: an efficient deep network with feature enhancement for accurate face detection | 9
Dual-stream progressive neural network based on cross fusion in image manipulation localization | 9
A Three-stage multimodal emotion recognition network based on text low-rank fusion | 9
Local discriminative graph convolutional networks for text classification | 9
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video | 9
Unsupervised single-image dehazing via self-guided inverse-retinex GAN | 9
Dual-visual collaborative enhanced transformer for image captioning | 9
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation | 9
Hfffap-net: unsupervised fundus image enhancement with high-frequency feature fusion and artifact processing | 9
Face attribute recognition via end-to-end weakly supervised regional location | 9
Compact twice fusion network for edge detection | 9
GloFP-MSF: monocular scene flow estimation with global feature perception | 9
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion | 9
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments | 8
Adversarial training in logit space against tiny perturbations | 8
Special issue on data-driven personalisation of television content | 8
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation | 8
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering | 8
Prior-based bi-encoder transformer for underwater image enhancement | 8
Teaching authentic sign language through multiple representation learning | 8
Generating generalized zero-shot learning based on dual-path feature enhancement | 8
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models | 8
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment | 8
Multi-granular dynamic interaction network for multimodal sarcasm detection | 8
Multimodal large language model enhancement network for multimodal sentiment analysis | 8
Hybrid embedding for multimodal few-frame action recognition | 8
Indirect visual–semantic alignment for generalized zero-shot recognition | 8
Estimating visibility via differential regression network | 8
Recognition of miner action and violation behavior based on the ANODE-GCN model | 8
Lightweight dual-path octave generative adversarial networks for few-shot image generation | 8
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention | 8
Selecting generated synthetic features using clustering algorithm for generalized zero-shot learning | 8
Mpv-pcqa: multimodal no-reference point cloud quality assessment via point cloud and captured dynamic video | 7
IOPCNet: inner and outer point classification based low overlap rate local-to-global point cloud registration | 7
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning | 7
CAFIN: cross-attention based face image repair network | 7
KECAN: knowledge-enhanced cross-modal alignment network for ophthalmic report generation | 7
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution | 7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection | 7
CAPTCHA farm detection and user authentication via mouse-trajectory similarity measurement | 7
Link prediction in social networks using hyper-motif representation on hypergraph | 7
Role of deep learning models and analytics in industrial multimedia environment | 7
An efficient federated learning method based on enhanced classification-GAN for medical image classification | 7
Msfusenet: a multi-stage information fusion network for multi-modal skin lesion diagnosis | 7
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation | 7
Map modeling for full body gesture using flex sensor and machine learning algorithms | 7
Propagating prior information with transformer for robust visual object tracking | 7
Fine-tuning CLIP for difference-guided composed image retrieval | 7
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN | 7
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition | 7
Opfusion: a deep blind image super resolution network using generative diffusion models and neural operator learning | 7
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction | 7
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming | 7
Exploring multi-dimensional interests for session-based recommendation | 7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module | 7
Composite makeup transfer model based on generative adversarial networks | 7
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes | 7
Channel modulus normalization for CNN image classification | 7
Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles | 7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS | 7
YOLO-ERF: lightweight object detector for UAV aerial images | 7
Deep unfolding low-rank network for image denoising | 7
Advanced techniques in digital media processing for special effects enhancement in film and television post-production | 7
Dual attention transformer with adaptive frequency enhancement for real-world Chinese–English scene text image super-resolution | 7
Music genre classification based on auditory image, spectral and acoustic features | 7
LCFormer: linear complexity transformer for efficient image super-resolution | 7
Prometheus: an efficient federated collaborative learning framework for coevolution of edge-cloud heterogeneous models | 6
Irregular feature enhancer for low-dose CT denoising | 6
Locally controllable network based on visual–linguistic relation alignment for text-to-image generation | 6
Context-aware feature complementary screening network for mass segmentation in whole mammograms | 6
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval | 6
Personalized time-sync comment generation based on a multimodal transformer | 6
Exemplar-guided low-light image enhancement | 6
Image and audio caps: automated captioning of background sounds and images using deep learning | 6
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring | 6
A self-supervised enhancement method for real world low-light images using Retinex and camera response function | 6
Multi-branch feature fusion and refinement network for salient object detection | 6
Topic-guided multi-domain fake news detection | 6
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter | 6
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus | 6
Personalized music recommendation algorithm based on machine learning | 6
A cross-view geo-localization method guided by relation-aware global attention | 6
A multi-scale feature fusion spatial–channel attention model for background subtraction | 6
Spatial attention-guided deformable fusion network for salient object detection | 6
Segmentation and recognition of filed sweet pepper based on improved self-attention convolutional neural networks | 6
TIPDF-DWSF: a task-oriented two-stage optimization framework for diffusion model LoRA fine-tuning | 6
A prompt-based dual-layer cross-modal distillation learning method for aspect-based sentiment analysis | 6
Prior tissue knowledge-driven contrastive learning for brain CT report generation | 6
CLDE-Net: crowd localization and density estimation based on CNN and transformer network | 6
Dynamical semantic enhancement network for continuous sign language recognition | 6
Adp-clf: adaptive dual-perception contrastive learning for gastrointestinal endoscopic image classification | 6
Exploring granularity-associated invariance features for text-to-image person re-identification | 6
A multi-scale no-reference video quality assessment method based on transformer | 6
A two-stage forgery detection and localization framework based on feature classification and similarity metric | 6
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition | 6
Learning effective embedding for automated COVID-19 prediction from chest X-ray images | 6
Layer-wise enhanced transformer with multi-modal fusion for image caption | 6
Deep learning in multimedia healthcare applications: a review | 6
Multiscale geometric window transformer for orthodontic teeth point cloud registration | 6
SR-DAYOLOv8: cross-domain adaptive object detection based on super-resolution domain classifier | 6
LET-Net: locally enhanced transformer network for medical image segmentation | 6
Mmy-net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis | 6
Sat-DehazeGAN: an efficient dehazing model in water-sky background for river-sea transport | 6
Collaborative point cloud geometry compression for both human vision and machine vision | 6
A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation | 6