OOIR: Observatory of International Research

Papers

(The median citation count of Computer Vision and Image Understanding is 2. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
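
The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions, not OOIR's actual implementation: the function name `select_above_median`, the tuple layout, and the sample records are hypothetical, and it assumes the median is computed over the same publication window that the table covers.

```python
from datetime import date
from statistics import median


def select_above_median(papers, start, end, limit=250):
    """Return papers in [start, end] whose citation count exceeds the median.

    `papers` is a list of (title, citations, published) tuples.
    Assumption: the median is taken over the papers inside the window.
    """
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []
    threshold = median(citations for _, citations, _ in in_window)
    # "Above that threshold" is read here as strictly greater than the median.
    above = [p for p in in_window if p[1] > threshold]
    above.sort(key=lambda p: p[1], reverse=True)
    return above[:limit]


# Hypothetical records, not taken from the table below.
sample = [
    ("Paper A", 10, date(2023, 1, 1)),
    ("Paper B", 2, date(2024, 5, 1)),
    ("Paper C", 0, date(2025, 3, 1)),
    ("Paper D", 5, date(2021, 1, 1)),  # outside the window, ignored
    ("Paper E", 3, date(2022, 6, 1)),
]
selected = select_above_median(sample, date(2022, 6, 1), date(2026, 6, 1))
print([title for title, _, _ in selected])
```

The strict comparison and the descending sort mirror how the table is presented: rows appear in order of decreasing citation count and the list is capped at 250 entries.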

Article	Citations
Luminance prior guided Low-Light 4C catenary image enhancement	391
Editorial Board	129
Editorial Board	117
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model	116
Editorial Board	114
Exploring using jigsaw puzzles for out-of-distribution detection	98
Extending function mixture network for improved spectral super-resolution	89
MATTE: Multi-task multi-scale attention	66
Editorial Board	60
Editorial Board	53
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications	52
Lightweight feature point detection network with channel enhancement	50
Efficient cross-information fusion decoder for semantic segmentation	50
Emerging image generation with flexible control of perceived difficulty	50
Modality adaptation via feature difference learning for depth human parsing	49
REST: A resolution preserving network for photorealistic style transfer via semantic distillation	44
Siamese self-supervised learning for fine-grained visual classification	44
Spatial Sensitive Grad-CAM++: Towards High-Quality Visual Explanations for Object Detectors via Weighted Combination of Gradient Maps	43
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving	42
JEMA: Joint Embedding of Multimodal and multi-view Alignment in human-centric embedding space for manufacturing	41
SNRD-Net: SNR-aware dual enhancement network for low-light images	40
Convolutional neural network framework for deepfake detection: A diffusion-based approach	40
Deducing health cues from biometric data	38
Feature reconstruction and metric based network for few-shot object detection	38
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics	38
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation	37
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation	36
NaviFormer: Multimodal scene segmentation for assistive navigation	35
QB-MOTR: A simple query bootstrapping end-to-end multi-object tracking method with transformer	35
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection	35
RelFormer: Advancing contextual relations for transformer-based dense captioning	34
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network	33
Adaptive CNN filter pruning using global importance metric	30
PConvSRGAN: Real-world super-resolution reconstruction with pure convolutional networks	30
A lightweight and robust framework for small object detection in UAV imagery	28
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt	28
Iterative Caption Generation with Heuristic Guidance for enhancing knowledge-based visual question answering	27
Improved Short-term Dense Bottleneck network for efficient scene analysis	27
3D object feature extraction and classification using 3D MF-DFA	27
CCNeXt: An effective self-supervised stereo depth estimation approach	26
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding	26
View-aligned pixel-level feature aggregation for 3D shape classification	25
Editorial Board	25
SIERRA: A robust bilateral feature upsampler for dense prediction	25
Editorial Board	25
Implicit and explicit commonsense for multi-sentence video captioning	24
GaitBranch: A multi-branch refinement model combined with frame-channel attention mechanism for gait recognition	24
Hierarchical contrastive distillation: Bridging multi-level semantics for enhanced knowledge transfer	24
Reverse Stable Diffusion: What prompt was used to generate this image?	23
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking	23
Towards efficient image and video style transfer via distillation and learnable feature transformation	23
SDC-Net: A novel selective dilated convolution network for medical images segmentation	23
Are Candidate Models Really Needed for Active Learning?	23
Attribute-guided Relevance Propagation for interpreting image classifier based on Deep Neural Networks	22
A multi-view-CNN framework for deep representation learning in image classification	22
Lightning fast video anomaly detection via multi-scale adversarial distillation	22
Pseudo initialization based Few-Shot Class Incremental Learning	21
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal	21
When super-resolution meets camouflaged object detection: A comparison study	20
Editorial Board	20
Dynamic deep multi-label image data augmentation based on self-paced learning	20
Learning spectral transform for 3D human motion prediction	20
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors	20
Editorial Board	20
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction	20
LARKED: A lightweight and reliable keypoint detection method for feature matching	19
Unsupervised real image super-resolution via knowledge distillation network	19
M3A: A multimodal misinformation dataset for media authenticity analysis	19
UniMultNet: Action recognition method based on multi-scale feature fusion and video-text constraint guidance	19
Extensions in channel and class dimensions for attention-based knowledge distillation	19
BARD: A Basketball Action Recognition Dataset for multi-label classification	19
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis	18
Uncertainty estimation using boundary prediction for medical image super-resolution	18
An efficient direct solution of the perspective-three-point problem	18
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification	18
Enhancing feature representation in Siamese networks for object tracking with ranking-based loss	18
Continuous fake media detection: Adapting deepfake detectors to new generative techniques	17
Ensemble learning-based method for maritime background subtraction in open sea environments	17
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution	17
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection	17
Real-time distributed video analytics for privacy-aware person search	17
A dynamic hybrid network with attention and mamba for image captioning	16
Global key knowledge distillation framework	16
A robust kinship verification scheme using face age transformation	16
CTM: Cross-time temporal module for fine-grained action recognition	15
Sketch-based 3D shape retrieval via teacher–student learning	15
MOSAIC: A multi-view 2.5D organ slice selector with cross-attentional reasoning for anatomically-aware CT localization in medical organ segmentation	15
Casting a BAIT for offline and online source-free domain adaptation	15
Statistical-driven adaptive data augmentation for single-domain generalized object detection	15
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction	15
Editorial Board	15
Multi-view cognition with path search for one-shot part labeling	15
Indoor UAV navigation using event cameras and intermediate frame reconstruction	14
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video	14
Few-shot Medical Image Segmentation via Boundary-extended Prototypes and Momentum Inference	14
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data	14
Attention-induced semantic and boundary interaction network for camouflaged object detection	14
BasicTAD: An astounding RGB-Only baseline for temporal action detection	14
Scribble-based complementary graph reasoning network for weakly supervised salient object detection	14
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment	14
Editorial Board	14
SPSC-Net: Shared parallel space-channel attention mechanism transformer network for cell sequence image segmentation	14
Transformed ROIs for capturing visual transformations in videos	14
Deep parametric Retinex decomposition model for low-light image enhancement	14
The shading isophotes: Model and methods for Lambertian planes and a point light	14
TCLR: Temporal contrastive learning for video representation	14
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC	13
Space–time recurrent memory network	13
A LLM-guided hybrid Mamba-Transformer architecture for part-to-whole motion synthesis	13
Learning representational invariances for data-efficient action recognition	13
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives	13
3D Pose Nowcasting: Forecast the future to improve the present	13
Combinational sign language recognition	13
XLITE-Unet: Extremely Light and Efficient Deep learning architecture with selective atrous and axial depthwise convolution for image segmentation	13
Rethink arbitrary style transfer with transformer and contrastive learning	13
α-EGAN: α-Energy distance GAN with an early stopping rule	13
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit	13
High-speed autonomous flight and obstacle avoidance for quadrotors in unknown dynamic environments based on imitation learning	13
Multiscale Spatio-Temporal Fusion Network for video dehazing	13
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild	13
EADA: Efficient adaptive data augmentation	13
Tensor robust PCA with nonconvex and nonlocal regularization	12
Real-time fusion of stereo vision and hyperspectral imaging for objective decision support during surgery	12
OFCA-Net: An explainable optical flow-based framework for face forgery detection	12
Extending class activation mapping using Gaussian receptive field	12
Semantic manipulation through the lens of Geometric Algebra	12
Biometric technology roadmapping for personalized augmentative and alternative communication	12
Towards robust 3D human reconstruction with uncertainty-aware low-rank adaptation	12
DiTalker: A unified DiT-based framework for high-quality and style-controllable portrait animation	12
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation	11
To make yourself invisible with Adversarial Semantic Contours	11
FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition	11
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training	11
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking	11
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection	11
Deep learning-based estimation of whole-body kinematics from multi-view images	11
Comprehensive regional guidance for attention map semantics in text-to-image diffusion models	11
Edge-aware graph reasoning network for image manipulation localization	11
An effective CNN and Transformer fusion network for camouflaged object detection	11
DM-Align: Leveraging the power of natural language instructions to make changes to images	11
Accurate depth image generation via overfit training of point cloud registration using local frame sets	11
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples	11
Distributed multi-target tracking and active perception with mobile camera networks	11
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements	11
Survey on fast dense video segmentation techniques	11
Dual cross-enhancement network for highly accurate dichotomous image segmentation	11
HFINet: Hybrid Feature Integration for enhancing collaborative camouflaged object detection	11
Feature-aligned distillation for dense object detection via refined semantic guidance and distribution consistency	11
Local Consistency Guidance: Personalized Stylization Method of Face Video	11
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection	11
LightSOD: Towards lightweight and efficient network for salient object detection	11
EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution	11
GradPaint: Gradient-guided inpainting with diffusion models	11
GAN inversion via cross-domain feature fusion and invertibility decomposition	10
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval	10
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection	10
Bi-granularity balance learning for long-tailed image classification	10
Discriminative object tracking by domain contrast	10
AWADA: Foreground-focused adversarial learning for cross-domain object detection	10
LocoGAN — Locally convolutional GAN	10
FAM: Improving columnar vision transformer with feature attention mechanism	10
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation	10
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network	10
Adaptive semantic guidance network for video captioning	10
Certifiable algorithms for the two-view planar triangulation problem	10
Exploring black-box adversarial attacks on Interpretable Deep Learning Systems	10
Phase-based video motion magnification with handheld cameras	10
Periocular biometrics and its relevance to partially masked faces: A survey	10
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation	10
BiPG-FER: Bi-intelligence probabilistic graph for facial expression inference drived by action units	10
Context perturbation: A Consistent alignment approach for Domain Adaptive Semantic Segmentation	10
Editorial Board	10
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection	10
Semantically accurate super-resolution Generative Adversarial Networks	10
Editorial Board	10
Editorial Board	9
Object re-identification via spatial–temporal fusion networks and causal identity matching	9
Underwater image quality evaluation via deep meta-learning: Dataset and objective method	9
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective	9
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA	9
View consistency aware holistic triangulation for 3D human pose estimation	9
Human skeletons and change detection for efficient violence detection in surveillance videos	9
OVGrasp: Open-Vocabulary Intent Detection for Grasping Assistance using ExoGlove	9
S2DNet: A self-supervised deraining network using monocular videos	9
Constituent Attention for Vision Transformers	9
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain	9
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow	9
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution	9
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition	9
Bidirectional brain image translation using transfer learning from generic pre-trained models	9
MAL-Net: Multiscale Attention Link Network for accurate eye center detection	9
Lightweight cross-modal transformer for RGB-D salient object detection	9
Editorial Board	9
Editorial Board	9
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification	9
Evaluating the effect of image quantity on Gaussian Splatting: A statistical perspective	9
An efficient three-stage network via Multi-Scale Orthogonal Complementary Transformer for low-light image enhancement	9
Exploring joint embedding predictive architectures for pretraining convolutional neural networks	9
On the coherency of quantitative evaluation of visual explanations	9
A closer look at branch classifiers of multi-exit architectures	9
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition	9
Learning key lines for multi-object tracking	8
A survey on class-agnostic counting: Advancements from reference-based to open-world text-guided approaches	8
Sparse graph matching network for temporal language localization in videos	8
Channel-aware feature mining network for Visible–Infrared Person Re-identification	8
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming	8
SASFNet: Soft-edge awareness and spatial-attention feedback deep network for blind image deblurring	8
AnomalySD: One-for-all few-shot anomaly detection via pre-trained diffusion models	8
Disentangled generation network for enlarged license plate recognition and a unified dataset	8
CMGNet: Collaborative multi-modal graph network for video captioning	8
Editorial Board	8
Time-archival camera virtualization for sports and visual performances	8
Multimodal vs. unimodal approaches to uncertainty in 3D image segmentation under distribution shifts	8
Progressive multi-scale fusion network for RGB-D salient object detection	8
Leaf cultivar identification via prototype-enhanced learning	8
Modality mixer exploiting complementary information for multi-modal action recognition	8
TEMSA: Text enhanced modal representation learning for multimodal sentiment analysis	8
Adaptive feature denoising based deep convolutional network for single image super-resolution	8
Self-supervised vision transformers for semantic segmentation	8
Multi-person 3D pose estimation from a single image captured by a fisheye camera	8
Opti-CAM: Optimizing saliency maps for interpretability	8
Once Upon a Goal: Towards orientation-based shot metrics in football	8
Text-Aided Domain Adaptation for CLIP-like models and application to challenging domain shifts	8
Continual learning on 3D point clouds with random compressed rehearsal	8
: Localized text prompt refinement for zero-shot referring image segmentation	8
Blur aware metric depth estimation with multi-focus plenoptic cameras	8
Cascading attention enhancement network for RGB-D indoor scene segmentation	8
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters	8
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition	8
Made-In: An immersive human-in-the-loop analytics platform for enhancing creative processes in fashion	8
Incorporating degradation estimation in light field spatial super-resolution	8
Multimodal transformer–diffusion framework for large-scale reconstruction of soccer tracking data	8
Certifiable planar relative pose estimation with gravity prior	8
Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation	8
Bypass network for semantics driven image paragraph captioning	8
Slope-Track: Multiple Object Tracking on Ski Slopes	7
Editorial Board	7
TOODIB: Task-aligned one-stage object detection with interactions between branches	7
Adaptive bias learning via gradient-based reweighting and constrained pruning for robust Visual Question Answering	7
CLIP-driven fine-grained mining for text-based person search	7
Font transformer for few-shot font generation	7
Discriminative semantic transitive consistency for cross-modal learning	7
Spatial constraint for efficient semi-supervised video object segmentation	7
IP-CAM: Class activation mapping based on importance weights and principal-component weights for better and simpler visual explanations	7
Style transfer with diffusion models for synthetic-to-real domain adaptation	7
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation	7
Editorial Board	7
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes	7
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution	7
A vector quantized masked autoencoder for audiovisual speech emotion recognition	7