OOIR: Observatory of International Research

Papers

(The median citation count of Multimedia Systems is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)

Article	Citations
Pseudo-global strategy-based visual comfort assessment considering attention mechanism	171
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy	123
Face and voice cross-modal association with learning convex feature embedding	93
DiffRA: universal restorative adversarial attack based on diffusion model	84
SFFN-YOLO for small object detection in aerial images	74
Improving text-image cross-modal retrieval with contrastive loss	64
TreeSegNet: multi-scale query-based instance segmentation with frequency-aware and gated feature enhancement	63
GVA: guided visual attention approach for automatic image caption generation	55
Dual-branch spectral–spatial feature extraction network for multispectral image compression	54
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network	52
FedMAB: adaptive multimodal federated learning with multi-armed bandits	51
A visual question answering model based on image captioning	47
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement	45
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet	42
User authentication method based on keystroke dynamics and mouse dynamics using HDA	41
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks	40
The segmented UEC Food-100 dataset with benchmark experiment on food detection	38
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer	35
Model-based portrait video compression with spatial constraint and adaptive pose processing	32
JAMD-Net: image splicing forgery detection based on JPEG compression artifacts and multi-dilated channel refinement fusion	31
Segmentation-aware image super-resolution with generative adversarial networks	31
CHCoT-MSLU: a coupled hierarchical chain-of-thought prompt learning model for multi-intent spoken language understanding	30
Real emotion seeker: recalibrating annotation for facial expression recognition	30
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach	30
Towards domain adaptation underwater image enhancement and restoration	29
SFRA: spatial fusion regression augmentation network for facial landmark detection	27
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization	27
Fast latent-feature augmentation for cross-domain face forgery detection	26
360° video quality assessment based on saliency-guided viewport extraction	26
Atacr-net: adaptive temporal alignment and contrastive refinement network for skeleton-based action recognition	26
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath	26
Semi-supervised adversarial training via disentangled contrastive learning	25
Hierarchical feature multi-contrastive learning for skin cancer classification	25
Saliency guided deep unfolding network for compressive sensing	25
Sketch-guided neural style transfer	25
A comparative study of color quantization methods using various image quality assessment indices	24
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors	23
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network	23
A variational causal inference-based method for recognizing object state changes in videos	23
On-line monitoring of structural performance of scraper conveyor driven by digital twin	23
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image	23
Mamba-driven context-aware tracking with dual prompts	22
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption	22
Dual convolutional neural network with attention for image blind denoising	22
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model	22
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery	21
GCGV: a dual-branch hybrid network integrating graph attention, CNNs, and vision transformers for enhanced hyperspectral image classification	21
BENet: bi-directional enhanced network for image captioning	21
LEA-depth: a lightweight self-supervised monocular depth estimation with attention fusion and edge-aware distillation	21
Optimizing codebook training through control chart analysis	20
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation	20
Game and reference: efficient policy making for epidemic prevention and control	19
Fast bilateral filter with spatial subsampling	19
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder	19
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation	19
SoftBinReduce: data reduction for color quantization through soft binning	19
A verifiable variable threshold visual image secret sharing scheme	19
Multi-view region proposal network predictive learning for tracking	18
Exploiting local detail in single image super-resolution via hypergraph convolution	18
Inter-class distance enhanced prototypical network for few-shot text classification	18
Big-LITTLE-Net: a dual-branch network for small UAV detection	17
DAFMixerSR: a lightweight fusion-enhanced adaptive perception network for image super-resolution	17
Enhanced target recognition and localization using binocular vision and infrared thermal imaging	17
Quantifying Factual Divergence in Generative Models: SHAP-LIME Based Hallucination Score for LLMs	17
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining	17
GL-MambaNet: Mamba-based global and local feature fusion for image dehazing	17
RefinerHash: a new hashing-based re-ranking technique for image retrieval	17
An automatic music generation method based on RSCLN_Transformer network	16
VLM-driven fine-grained semantic regularization for low-light image enhancement	16
Badinterpreter: Backdoor attack on LLM-based interpretable recommendation	16
Incrementaldreamer: scene-level 3D generation with incremental optimization	16
Transferable diffusion transformer for low-light image enhancement	16
DMFTNet: dense multimodal fusion transfer network for free-space detection	15
Similarity-guided contrastive learning for deep multi-view clustering	15
A multi-label classification method combined with texture enhancement for deepfake face detection	15
Speech-driven talking face video generation	15
Weakly supervised anomaly detection with multi-level contextual modeling	15
Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping	15
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies	15
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction	14
CR-DM: A novel craniofacial reconstruction framework based on diffusion model	14
A survey of multimodal federated learning: background, applications, and perspectives	14
Vulnerability Positioner (VulP): enhancing code vulnerability localization with CodeBERT	14
CCM-Net: image splicing localization network based on context-aware and cross-domain multi-scale fusion	14
Design and evaluation of a serious game in virtual reality to increase empathy towards students with phonological dyslexia	14
Fgef-net: frequency-guided and enhanced fusion dehazing network for visibility enhancement in maritime traffic surveillance	14
PDSRN: a progressive distillation network for generalizable single image super-resolution	13
EDCM-EA: event prediction based on event development context mining considering event arguments	13
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification	13
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach	13
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography	13
Object detection of mural images based on improved YOLOv8	13
Workpiece tracking based on improved SiamFC++ and virtual dataset	13
Semantic segmentation network for remote sensing images based on category-aware cross-fusion	13
Learning shared features from specific and ambiguous descriptions for text-based person search	12
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence	12
A plug-and-play image enhancement model for end-to-end object detection in low-light condition	12
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory	12
Graph contrastive learning for recommendation with generative data augmentation	12
3D human pose estimation method based on multi-constrained dilated convolutions	12
Joint α-β-divergences reconstruction and non-convex sparse regularization for image clustering	12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network	12
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism	12
Occluded scene text detection via context-awareness from sketch-level image representations	12
LPR: learning point-level temporal action localization through re-training	12
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks	12
Rethinking RGB-D salient object detection	12
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network	12
MCLSC-Fusion: a multi-scale cross-modality long-short connection fusion network for infrared and visible images	12
Multimodal-enhanced hierarchical attention network for video captioning	12
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation	11
Multi-domain feature enhanced adaptive fusion network for multi-modal fake news detection	11
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution	11
Gicnet: global information capture network for visual place recognition	11
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer	11
A comprehensive survey on human pose estimation approaches	11
AI-driven Braille character recognition using partitioned spatial modeling and sequential learning	11
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency	11
Multi-level fine-grained center calibration network for unsupervised person re-identification	11
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform	11
Scd-yolo: a novel object detection method for efficient road crack detection	11
Depth alignment interaction network for camouflaged object detection	11
Overcomplete-to-sparse representation learning for few-shot class-incremental learning	11
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation	11
Multi-object tracking in the low-light with two-stage association and denoising based on image feature enhancement	10
Bag of states: a non-sequential approach to video-based engagement measurement	10
Swiftavatar: real-time human reconstruction via semantic graph deformation and surface awareness	10
Polarity-aware attention network for image sentiment analysis	10
Tb-mmrd: transformer-based multi-modal election rumor detection with agreement-aware gating and semantic fusion	10
Facial action unit detection with emotion consistency: a cross-modal learning approach	10
Diff-mednet: differential convolution and median-enhanced attention multiscale fusion for infrared small target detection	10
3D model watermarking using surface integrals of generated random vector fields	10
Synthetic shadows: the interplay of forensic detection and anti-forensic techniques in GAN-generated images	10
A hybrid spatial and spectral mamba network for hyperspectral image super-resolution	10
You watch once more: a more effective CNN architecture for video spatio-temporal action localization	10
Pointlgfn: local–global fusion network for point cloud classification	10
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning	10
Detecting offensive language on instagram with a combined approach of the Gray Wolf algorithm and deep learning networks	10
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering	10
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering	10
CloudCap3D: enhancing 3D in-scene descriptions via point cloud integration and efficient text filtering	10
FedVC-ADDiM: a federated learning framework for diagnosis of alzheimer disease using deep learning	10
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray	10
Deepfake detection of occluded images using a patch-based approach	10
DFGAnet: a dual-branch multimodal fusion network based on graph and attention for emotion recognition in conversation	10
MGSAN: multimodal graph self-attention network for skeleton-based action recognition	9
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping	9
Remote sensing image cloud removal based on multi-scale spatial information perception	9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution	9
CBLC-SOOD: contrastive background and label correction for semi-supervised oriented object detection	9
Dual-visual collaborative enhanced transformer for image captioning	9
Task-adaptive parameter optimization for medical image classification transfer learning	9
Compact twice fusion network for edge detection	9
ST-GRU: spatiotemporal gated recurrent unit for video prediction	9
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment	9
Non-convex fractional-order TV model for image inpainting	9
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model	9
Dual-stream progressive neural network based on cross fusion in image manipulation localization	9
Unsupervised single-image dehazing via self-guided inverse-retinex GAN	9
MobileViNeXt: a lightweight fusion model for ship-radiated noise recognition	9
Same-clothes person re-identification with dual-stream network	9
Text-centered cross-sample fusion network for multimodal sentiment analysis	9
GloFP-MSF: monocular scene flow estimation with global feature perception	9
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector	9
A robust federated aggregation algorithm for multimodal data in smart grid scenarios	9
Fine-grained behavior interaction-aware network for efficient multi-person motion forecasting	9
3D human pose estimation with multi-hypotheses gated transformer	9
Prior-based bi-encoder transformer for underwater image enhancement	8
Adversarial training in logit space against tiny perturbations	8
Scene text image super-resolution algorithm based on directional feature modeling	8
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model	8
EfficientFace: an efficient deep network with feature enhancement for accurate face detection	8
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video	8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation	8
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming	8
Dual attention transformer with adaptive frequency enhancement for real-world Chinese–English scene text image super-resolution	8
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction	8
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering	8
Deep unfolding low-rank network for image denoising	8
Multi-granular dynamic interaction network for multimodal sarcasm detection	8
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models	8
A Three-stage multimodal emotion recognition network based on text low-rank fusion	8
Local discriminative graph convolutional networks for text classification	8
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering	8
Hfffap-net: unsupervised fundus image enhancement with high-frequency feature fusion and artifact processing	8
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments	8
Teaching authentic sign language through multiple representation learning	8
Hybrid embedding for multimodal few-frame action recognition	8
Indirect visual–semantic alignment for generalized zero-shot recognition	8
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention	8
Lightweight dual-path octave generative adversarial networks for few-shot image generation	8
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution	8
Face attribute recognition via end-to-end weakly supervised regional location	8
Special issue on data-driven personalisation of television content	8
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion	8
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition	8
Multi-document localization method based on bottom-up architecture	8
Generating generalized zero-shot learning based on dual-path feature enhancement	8
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN	8
Multimodal large language model enhancement network for multimodal sentiment analysis	8
Selecting generated synthetic features using clustering algorithm for generalized zero-shot learning	8
KECAN: knowledge-enhanced cross-modal alignment network for ophthalmic report generation	8
Blind super-resolution based on matrix-variable optimization for video images	8
Propagating prior information with transformer for robust visual object tracking	7
TIPDF-DWSF: a task-oriented two-stage optimization framework for diffusion model LoRA fine-tuning	7
LET-Net: locally enhanced transformer network for medical image segmentation	7
Fine-tuning CLIP for difference-guided composed image retrieval	7
Link prediction in social networks using hyper-motif representation on hypergraph	7
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment	7
Recognition of miner action and violation behavior based on the ANODE-GCN model	7
Hierarchical segmentation-guided diffusion framework for high-fidelity sonar image generation	7
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning	7
Layer-wise enhanced transformer with multi-modal fusion for image caption	7
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition	7
A prompt-based dual-layer cross-modal distillation learning method for aspect-based sentiment analysis	7
X2Fashion: temporally consistent fashion video generation guided by image, pose and text	7
USGA: unified intra- and cross-scale features with global–local aggregation for long-term tracking	7
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation	7
YOLO-ERF: lightweight object detector for UAV aerial images	7
CAPTCHA farm detection and user authentication via mouse-trajectory similarity measurement	7
Advanced techniques in digital media processing for special effects enhancement in film and television post-production	7
Estimating visibility via differential regression network	7
Opfusion: a deep blind image super resolution network using generative diffusion models and neural operator learning	7
Composite makeup transfer model based on generative adversarial networks	7
Personalized time-sync comment generation based on a multimodal transformer	7
An efficient federated learning method based on enhanced classification-GAN for medical image classification	7
Context-aware feature complementary screening network for mass segmentation in whole mammograms	7
CAFIN: cross-attention based face image repair network	7
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation	7
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition	7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module	7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection	7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS	7
Map modeling for full body gesture using flex sensor and machine learning algorithms	7
Role of deep learning models and analytics in industrial multimedia environment	7
Rescue decision via Earthquake Disaster Knowledge Graph reasoning	7
A sub-grouping-based resource allocation method for layered video’s multicast broadcast service (MBS) over the 5G cellular network	6
Learning effective embedding for automated COVID-19 prediction from chest X-ray images	6
Personalized music recommendation algorithm based on machine learning	6
A self-supervised enhancement method for real world low-light images using Retinex and camera response function	6
A multi-scale feature fusion spatial–channel attention model for background subtraction	6
Food nutrition estimation with RGB-D fusion module and bidirectional feature pyramid network	6
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter	6
Prometheus: an efficient federated collaborative learning framework for coevolution of edge-cloud heterogeneous models	6
Dynamical semantic enhancement network for continuous sign language recognition	6
CLDE-Net: crowd localization and density estimation based on CNN and transformer network	6
A multi-scale no-reference video quality assessment method based on transformer	6
Adp-clf: adaptive dual-perception contrastive learning for gastrointestinal endoscopic image classification	6
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus	6
Prior tissue knowledge-driven contrastive learning for brain CT report generation	6