OOIR: Observatory of International Research

Papers

(The TQCC of Multimedia Systems is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-07-01 to 2025-07-01.)
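
The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not OOIR's actual code: the function names `parse_rows` and `select_papers` are hypothetical, the threshold is treated as inclusive because the table contains papers with exactly 4 citations, and the last sample row is an invented entry added to show a paper being filtered out.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (article title, citation count)


def parse_rows(text: str) -> List[Row]:
    """Parse tab-separated 'Article<TAB>Citations' lines into (title, count) pairs."""
    rows: List[Row] = []
    for line in text.splitlines():
        # Skip blank lines and the header row.
        if not line.strip() or line.startswith("Article\t"):
            continue
        # Split on the last tab so that titles keep any internal punctuation.
        title, _, count = line.rpartition("\t")
        rows.append((title, int(count)))
    return rows


def select_papers(rows: List[Row], threshold: int, max_papers: int = 250) -> List[Row]:
    """Keep papers at or above the threshold, most cited first, capped at max_papers."""
    kept = [row for row in rows if row[1] >= threshold]
    # sorted() is stable, so papers with equal counts keep their input order.
    kept = sorted(kept, key=lambda row: row[1], reverse=True)
    return kept[:max_papers]


# Two rows taken from the table, plus one hypothetical row below the threshold.
SAMPLE = (
    "Article\tCitations\n"
    "LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network\t96\n"
    "Image aesthetics assessment using composite features from transformer and CNN\t4\n"
    "A hypothetical paper below the threshold\t3\n"
)

if __name__ == "__main__":
    for title, count in select_papers(parse_rows(SAMPLE), threshold=4):
        print(f"{count}\t{title}")
```

Splitting on the last tab rather than the first keeps the parser robust if a title ever contains a tab of its own.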

Article	Citations
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network	96
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks	92
Pseudo-global strategy-based visual comfort assessment considering attention mechanism	82
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network	77
A visual question answering model based on image captioning	62
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy	60
Correction: STASiamRPN: visual tracking based on spatiotemporal and attention	56
Dual convolutional neural network with attention for image blind denoising	55
The segmented UEC Food-100 dataset with benchmark experiment on food detection	52
Improving text-image cross-modal retrieval with contrastive loss	43
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach	43
User authentication method based on keystroke dynamics and mouse dynamics using HDA	42
360° video quality assessment based on saliency-guided viewport extraction	42
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image	40
Model-based portrait video compression with spatial constraint and adaptive pose processing	40
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement	39
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model	37
Towards domain adaptation underwater image enhancement and restoration	37
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet	37
Recent advancement in haze removal approaches	36
SFRA: spatial fusion regression augmentation network for facial landmark detection	32
Segmentation-aware image super-resolution with generative adversarial networks	31
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors	29
Dual-branch spectral–spatial feature extraction network for multispectral image compression	27
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization	27
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer	25
Point cloud inpainting with normal-based feature matching	24
BENet: bi-directional enhanced network for image captioning	23
Real emotion seeker: recalibrating annotation for facial expression recognition	23
A comparative study of color quantization methods using various image quality assessment indices	22
GVA: guided visual attention approach for automatic image caption generation	22
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery	22
Game and reference: efficient policy making for epidemic prevention and control	22
A deep learning-based framework for detecting COVID-19 patients using chest X-rays	21
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation	21
Weakly supervised anomaly detection with multi-level contextual modeling	21
SoftBinReduce: data reduction for color quantization through soft binning	20
An automatic music generation method based on RSCLN_Transformer network	20
Inter-class distance enhanced prototypical network for few-shot text classification	20
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation	20
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder	19
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction	19
CR-DM: A novel craniofacial reconstruction framework based on diffusion model	18
Optimizing codebook training through control chart analysis	18
Multi-view region proposal network predictive learning for tracking	17
Overcoming the practical restrictions in H.266/VVC-based video communication systems by a PI bit rate controller	17
Improved SSD using deep multi-scale attention spatial–temporal features for action recognition	17
DMFTNet: dense multimodal fusion transfer network for free-space detection	17
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies	16
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining	16
RefinerHash: a new hashing-based re-ranking technique for image retrieval	15
Exploiting local detail in single image super-resolution via hypergraph convolution	15
A verifiable variable threshold visual image secret sharing scheme	15
Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning	15
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform	14
Fast bilateral filter with spatial subsampling	14
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach	14
Double-scale similarity with rich features for cross-modal retrieval	14
Workpiece tracking based on improved SiamFC++ and virtual dataset	14
A survey of multimodal federated learning: background, applications, and perspectives	14
Scd-yolo: a novel object detection method for efficient road crack detection	13
Occluded scene text detection via context-awareness from sketch-level image representations	13
A plug-and-play image enhancement model for end-to-end object detection in low-light condition	13
Unsupervised cross-database micro-expression recognition based on distribution adaptation	13
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network	13
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence	13
EDCM-EA: event prediction based on event development context mining considering event arguments	13
Multimodal-enhanced hierarchical attention network for video captioning	12
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory	12
RMVAE: one-class classification via divergence regularization and maximization mutual information	12
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation	12
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism	12
A CNN-based scheme for COVID-19 detection with emergency services provisions using an optimal path planning	12
LPR: learning point-level temporal action localization through re-training	12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network	12
Object detection of mural images based on improved YOLOv8	11
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification	11
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution	11
Learning shared features from specific and ambiguous descriptions for text-based person search	11
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography	11
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks	11
Graph contrastive learning for recommendation with generative data augmentation	11
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer	11
Non-convex fractional-order TV model for image inpainting	10
3D human pose estimation method based on multi-constrained dilated convolutions	10
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency	10
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning	10
Overcomplete-to-sparse representation learning for few-shot class-incremental learning	10
HandO: a hybrid 3D hand–object reconstruction model for unknown objects	10
Wireless multipath video transmission: when IoT video applications meet networking—a survey	10
Depth alignment interaction network for camouflaged object detection	10
Remote sensing image cloud removal based on multi-scale spatial information perception	10
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering	10
A comprehensive survey on human pose estimation approaches	10
Developing novel video coding model using modified dual-tree wavelet-based multi-resolution technique	10
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation	10
3D model watermarking using surface integrals of generated random vector fields	9
You watch once more: a more effective CNN architecture for video spatio-temporal action localization	9
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering	9
Practical 3D human skeleton tracking based on multi-view and multi-Kinect fusion	9
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model	9
EfficientFace: an efficient deep network with feature enhancement for accurate face detection	9
Bag of states: a non-sequential approach to video-based engagement measurement	9
Asymmetric exponential loss function for crack segmentation	9
Deepfake detection of occluded images using a patch-based approach	9
Text-centered cross-sample fusion network for multimodal sentiment analysis	9
GloFP-MSF: monocular scene flow estimation with global feature perception	9
Image quality measurement-based comparative analysis of illumination compensation methods for face image normalization	9
Polarity-aware attention network for image sentiment analysis	9
Facial action unit detection with emotion consistency: a cross-modal learning approach	9
Multi-level fine-grained center calibration network for unsupervised person re-identification	9
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray	9
Gicnet: global information capture network for visual place recognition	9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution	9
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment	9
Local discriminative graph convolutional networks for text classification	8
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model	8
ST-GRU: spatiotemporal gated recurrent unit for video prediction	8
3D human pose estimation with multi-hypotheses gated transformer	8
Face attribute recognition via end-to-end weakly supervised regional location	8
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping	8
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion	8
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition	8
Compact twice fusion network for edge detection	8
DwiMark: a multiscale robust deep watermarking framework for diffusion-weighted imaging images	8
Dual-visual collaborative enhanced transformer for image captioning	8
Same-clothes person re-identification with dual-stream network	8
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector	8
MGSAN: multimodal graph self-attention network for skeleton-based action recognition	8
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering	8
DA²Net: a dual attention-aware network for robust crowd counting	8
User quality of experience estimation using social network analysis	8
Image lossless encoding and encryption method of EBCOT Tier1 based on 4D hyperchaos	8
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video	8
Unsupervised single-image dehazing via self-guided inverse-retinex GAN	8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation	8
A Three-stage multimodal emotion recognition network based on text low-rank fusion	8
Estimating visibility via differential regression network	7
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN	7
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments	7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS	7
Multi-granular dynamic interaction network for multimodal sarcasm detection	7
Adversarial training in logit space against tiny perturbations	7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module	7
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction	7
Generating generalized zero-shot learning based on dual-path feature enhancement	7
An improved algorithm of video quality assessment by danmaku analysis	7
Prior-based bi-encoder transformer for underwater image enhancement	7
Recognition of miner action and violation behavior based on the ANODE-GCN model	7
Special issue on data-driven personalisation of television content	7
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models	7
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming	7
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering	7
Indirect visual–semantic alignment for generalized zero-shot recognition	7
Lightweight dual-path octave generative adversarial networks for few-shot image generation	7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection	7
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution	6
YOLO-ERF: lightweight object detector for UAV aerial images	6
Composite makeup transfer model based on generative adversarial networks	6
Role of deep learning models and analytics in industrial multimedia environment	6
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition	6
Irregular feature enhancer for low-dose CT denoising	6
Music genre classification based on auditory image, spectral and acoustic features	6
Propagating prior information with transformer for robust visual object tracking	6
Hybrid embedding for multimodal few-frame action recognition	6
Unsupervised adversarial image retrieval	6
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment	6
Multiscale geometric window transformer for orthodontic teeth point cloud registration	6
Spatial attention-guided deformable fusion network for salient object detection	6
Map modeling for full body gesture using flex sensor and machine learning algorithms	6
An efficient federated learning method based on enhanced classification-GAN for medical image classification	6
Link prediction in social networks using hyper-motif representation on hypergraph	6
Channel modulus normalization for CNN image classification	6
Automatic segmentation of melanoma skin cancer using transfer learning and fine-tuning	6
A cross-view geo-localization method guided by relation-aware global attention	6
Personalized time-sync comment generation based on a multimodal transformer	6
Exploring multi-dimensional interests for session-based recommendation	6
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition	6
A novel SPLIT-SIM approach for efficient image retrieval	6
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring	6
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning	6
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention	6
CAFIN: cross-attention based face image repair network	6
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation	6
A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution	6
Exploring granularity-associated invariance features for text-to-image person re-identification	5
Food nutrition estimation with RGB-D fusion module and bidirectional feature pyramid network	5
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter	5
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus	5
Prior tissue knowledge-driven contrastive learning for brain CT report generation	5
LCFormer: linear complexity transformer for efficient image super-resolution	5
Locally controllable network based on visual–linguistic relation alignment for text-to-image generation	5
Rescue decision via Earthquake Disaster Knowledge Graph reasoning	5
ITrans: generative image inpainting with transformers	5
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes	5
Editorial note for few-shot learning for intelligent multimedia systems	5
Full reference image quality assessment based on dual-space multi-feature fusion	5
A two-stage forgery detection and localization framework based on feature classification and similarity metric	5
Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection	5
A multi-scale feature fusion spatial–channel attention model for background subtraction	5
Prometheus: an efficient federated collaborative learning framework for coevolution of edge-cloud heterogeneous models	5
A weakly supervised pavement crack segmentation based on adversarial learning and transformers	5
A multi-scale no-reference video quality assessment method based on transformer	5
Image and audio caps: automated captioning of background sounds and images using deep learning	5
Layer-wise enhanced transformer with multi-modal fusion for image caption	5
IOPCNet: inner and outer point classification based low overlap rate local-to-global point cloud registration	5
LET-Net: locally enhanced transformer network for medical image segmentation	5
Msfusenet: a multi-stage information fusion network for multi-modal skin lesion diagnosis	5
Sat-DehazeGAN: an efficient dehazing model in water-sky background for river-sea transport	5
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval	5
Mixed multi-scale residual attention networks for single image super-resolution reconstruction	5
Topic-guided multi-domain fake news detection	5
Hierarchical bi-directional conceptual interaction for text-video retrieval	5
A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation	5
CLDE-Net: crowd localization and density estimation based on CNN and transformer network	5
SR-DAYOLOv8: cross-domain adaptive object detection based on super-resolution domain classifier	5
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation	5
Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles	5
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion	5
Exemplar-guided low-light image enhancement	5
Coordinate-aligned multi-camera collaboration for active multi-object tracking	4
A distribution-aware 2D multi-person pose estimation method with attention mechanisms	4
BotICC: enhancing social bot detection through implicit connection computation	4
PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion	4
Semantic embedding: scene image classification using scene-specific objects	4
WPELip: enhance lip reading with word-prior information	4
Numerical computation based few-shot learning for intelligent sea surface temperature prediction	4
TFA-CNN: an efficient method for dealing with crowding and noise problems in crowd counting	4
Multi-window Transformer parallel fusion feature pyramid network for pedestrian orientation detection	4
Robust Grassmann manifold convex hull collaborative representation learning and its kernel extension for image set analysis	4
A comprehensive survey of image and video forgery techniques: variants, challenges, and future directions	4
Learning multi-scale features automatically from food and ingredients	4
Design and implementation of a real-time face recognition system based on artificial intelligence techniques	4
UMPA: Unified multi-modal prompt with adapter for vision-language models	4
Breast density measurement methods on mammograms: a review	4
Vehicle lane change behavior recognition based on multi-scale three-stream 3D ResNets	4
MCLEMCD: multimodal collaborative learning encoder for enhanced music classification from dances	4
A low-overhead compressed sensing-driven multi-party secret image sharing scheme	4
VoxSeP: semi-positive voxels assist self-supervised 3D medical segmentation	4
Blind quality evaluation for tone-mapped images by exploiting statistical characteristics and deep perceptual features	4
Artistic image adversarial attack via style perturbation	4
MLKD-CLIP: Multi-layer Feature Knowledge Distillation of CLIP for Open-vocabulary Action Recognition	4
View adjustment: helping users improve photographic composition	4
MT-ASM: a multi-task attention strengthening model for fine-grained object recognition	4
Identification of haploid and diploid maize seeds using hybrid transformer model	4
MSCFF-Net: multi-scale context feature fusion network for polyp segmentation	4
Global semantic space feature fusion for multi-view clustering	4
Microblog sentiment analysis via user representative relationship under multi-interaction hybrid neural networks	4
PAGN: perturbation adaption generation network for point cloud adversarial defense	4
Image aesthetics assessment using composite features from transformer and CNN	4