Computer Vision and Image Understanding

Papers
(The median citation count of Computer Vision and Image Understanding is 2. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
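The selection rule described in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_papers`, the tuple layout, and the sample records are hypothetical, the median is assumed to be taken over the papers inside the date window, the window bounds are assumed to be inclusive, and "above" is read as strictly greater than the median. The source does not specify any of these details.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, limit=250):
    """Return papers in [start, end] whose citations exceed the median.

    papers: list of (title, citations, published) tuples (hypothetical layout).
    The result is sorted by citation count, highest first, and capped at `limit`.
    """
    # Keep only papers published inside the window (bounds assumed inclusive).
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []
    # Assumption: the threshold is the median over the windowed papers.
    threshold = median(c for _, c, _ in in_window)
    # Assumption: "above that threshold" means strictly greater.
    above = [p for p in in_window if p[1] > threshold]
    above.sort(key=lambda p: p[1], reverse=True)
    return above[:limit]


# Made-up sample records, not taken from the journal's data.
sample = [
    ("Paper A", 10, date(2023, 1, 1)),
    ("Paper B", 2, date(2023, 6, 1)),
    ("Paper C", 0, date(2024, 1, 1)),
    ("Paper D", 5, date(2021, 1, 1)),  # outside the window
    ("Paper E", 3, date(2025, 1, 1)),
    ("Paper F", 1, date(2025, 6, 1)),
]
selected = select_papers(sample, date(2022, 5, 1), date(2026, 5, 1))
titles = [title for title, _, _ in selected]
```

With the sample above, the windowed citation counts are 10, 2, 0, 3 and 1, so the median is 2 and only the two papers with more than 2 citations are kept.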
Article | Citations
Luminance prior guided Low-Light 4C catenary image enhancement | 381
Editorial Board | 126
Efficient cross-information fusion decoder for semantic segmentation | 115
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 114
Deducing health cues from biometric data | 111
Editorial Board | 94
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 88
Editorial Board | 62
Exploring using jigsaw puzzles for out-of-distribution detection | 54
Extending function mixture network for improved spectral super-resolution | 52
Editorial Board | 50
Editorial Board | 50
MATTE: Multi-task multi-scale attention | 50
Feature reconstruction and metric based network for few-shot object detection | 48
Convolutional neural network framework for deepfake detection: A diffusion-based approach | 46
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 44
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 42
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 41
SNRD-Net: SNR-aware dual enhancement network for low-light images | 40
Spatial Sensitive Grad-CAM++: Towards High-Quality Visual Explanations for Object Detectors via Weighted Combination of Gradient Maps | 39
Lightweight feature point detection network with channel enhancement | 38
Emerging image generation with flexible control of perceived difficulty | 38
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 37
Modality adaptation via feature difference learning for depth human parsing | 36
QB-MOTR: A simple query bootstrapping end-to-end multi-object tracking method with transformer | 36
Siamese self-supervised learning for fine-grained visual classification | 35
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 35
REST: A resolution preserving network for photorealistic style transfer via semantic distillation | 35
Adaptive CNN filter pruning using global importance metric | 34
RelFormer: Advancing contextual relations for transformer-based dense captioning | 34
PConvSRGAN: Real-world super-resolution reconstruction with pure convolutional networks | 33
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 32
3D object feature extraction and classification using 3D MF-DFA | 30
Editorial Board | 29
Editorial Board | 28
SIERRA: A robust bilateral feature upsampler for dense prediction | 27
CCNeXt: An effective self-supervised stereo depth estimation approach | 26
View-aligned pixel-level feature aggregation for 3D shape classification | 26
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 26
A lightweight and robust framework for small object detection in UAV imagery | 25
Implicit and explicit commonsense for multi-sentence video captioning | 25
Hierarchical contrastive distillation: Bridging multi-level semantics for enhanced knowledge transfer | 25
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking | 25
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 24
SDC-Net: A novel selective dilated convolution network for medical images segmentation | 23
Towards efficient image and video style transfer via distillation and learnable feature transformation | 23
Reverse Stable Diffusion: What prompt was used to generate this image? | 23
Attribute-guided Relevance Propagation for interpreting image classifier based on Deep Neural Networks | 22
Improved Short-term Dense Bottleneck network for efficient scene analysis | 22
GaitBranch: A multi-branch refinement model combined with frame-channel attention mechanism for gait recognition | 22
Iterative Caption Generation with Heuristic Guidance for enhancing knowledge-based visual question answering | 22
Pseudo initialization based Few-Shot Class Incremental Learning | 21
UniMultNet: Action recognition method based on multi-scale feature fusion and video-text constraint guidance | 21
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification | 21
When super-resolution meets camouflaged object detection: A comparison study | 21
An efficient direct solution of the perspective-three-point problem | 20
Editorial Board | 20
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors | 20
Editorial Board | 20
Unsupervised real image super-resolution via knowledge distillation network | 20
Learning spectral transform for 3D human motion prediction | 20
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction | 19
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis | 19
Dynamic deep multi-label image data augmentation based on self-paced learning | 19
Uncertainty estimation using boundary prediction for medical image super-resolution | 19
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal | 19
Lightning fast video anomaly detection via multi-scale adversarial distillation | 18
BARD: A Basketball Action Recognition Dataset for multi-label classification | 18
LARKED:A lightweight and reliable keypoint detection method for feature matching | 18
Enhancing feature representation in siamese networks for object tracking with ranking-based loss | 18
Extensions in channel and class dimensions for attention-based knowledge distillation | 18
Real-time distributed video analytics for privacy-aware person search | 17
M3 | 17
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution | 17
A multi-view-CNN framework for deep representation learning in image classification | 17
Ensemble learning-based method for maritime background subtraction in open sea environments | 17
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 17
Continuous fake media detection: Adapting deepfake detectors to new generative techniques | 17
Few-shot Medical Image Segmentation via Boundary-extended Prototypes and Momentum Inference | 16
Global key knowledge distillation framework | 16
Scribble-based complementary graph reasoning network for weakly supervised salient object detection | 15
CTM: Cross-time temporal module for fine-grained action recognition | 15
A robust kinship verification scheme using face age transformation | 15
Sketch-based 3D shape retrieval via teacher–student learning | 15
Casting a BAIT for offline and online source-free domain adaptation | 15
MOSAIC: A multi-view 2.5D organ slice selector with cross-attentional reasoning for anatomically-aware CT localization in medical organ segmentation | 15
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction | 15
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video | 15
Multi-view cognition with path search for one-shot part labeling | 15
A dynamic hybrid network with attention and mamba for image captioning | 15
Statistical-driven adaptive data augmentation for single-domain generalized object detection | 14
SPSC-Net: Shared parallel space-channel attention mechanism transformer network for cell sequence image segmentation | 14
TCLR: Temporal contrastive learning for video representation | 14
Editorial Board | 14
Indoor UAV navigation using event cameras and intermediate frame reconstruction | 14
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment | 14
Deep parametric Retinex decomposition model for low-light image enhancement | 14
Transformed ROIs for capturing visual transformations in videos | 14
The shading isophotes: Model and methods for Lambertian planes and a point light | 13
Real-time fusion of stereo vision and hyperspectral imaging for objective decision support during surgery | 13
Combinational sign language recognition | 13
XLITE-Unet: Extremely Light and Efficient Deep learning architecture with selective atrous and axial depthwise convolution for image segmentation | 13
3D Pose Nowcasting: Forecast the future to improve the present | 13
Editorial Board | 13
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit | 13
Extending class activation mapping using Gaussian receptive field | 13
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild | 13
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 13
Attention-induced semantic and boundary interaction network for camouflaged object detection | 13
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data | 13
α-EGAN: | 13
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC | 13
Multiscale Spatio-Temporal Fusion Network for video dehazing | 12
Rethink arbitrary style transfer with transformer and contrastive learning | 12
EADA: Efficient adaptive data augmentation | 12
Learning representational invariances for data-efficient action recognition | 12
Semantic manipulation through the lens of Geometric Algebra | 12
Space–time recurrent memory network | 12
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives | 12
High-speed autonomous flight and obstacle avoidance for quadrotors in unknown dynamic environments based on imitation learning | 12
Accurate depth image generation via overfit training of point cloud registration using local frame sets | 12
DM-Align: Leveraging the power of natural language instructions to make changes to images | 12
A LLM-guided hybrid Mamba-Transformer architecture for part-to-whole motion synthesis | 12
Tensor robust PCA with nonconvex and nonlocal regularization | 12
Biometric technology roadmapping for personalized augmentative and alternative communication | 12
To make yourself invisible with Adversarial Semantic Contours | 12
Deep learning-based estimation of whole-body kinematics from multi-view images | 11
Comprehensive regional guidance for attention map semantics in text-to-image diffusion models | 11
Local Consistency Guidance: Personalized Stylization Method of Face Video | 11
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection | 11
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation | 11
LightSOD: Towards lightweight and efficient network for salient object detection | 11
FAM: Improving columnar vision transformer with feature attention mechanism | 11
Feature-aligned distillation for dense object detection via refined semantic guidance and distribution consistency | 11
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training | 11
Dual cross-enhancement network for highly accurate dichotomous image segmentation | 11
FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition | 11
GAN inversion via cross-domain feature fusion and invertibility decomposition | 11
Distributed multi-target tracking and active perception with mobile camera networks | 11
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements | 11
Edge-aware graph reasoning network for image manipulation localization | 11
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 11
Survey on fast dense video segmentation techniques | 11
An effective CNN and Transformer fusion network for camouflaged object detection | 11
HFINet: Hybrid Feature Integration for enhancing collaborative camouflaged object detection | 11
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples | 11
GradPaint: Gradient-guided inpainting with diffusion models | 10
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 10
Bi-granularity balance learning for long-tailed image classification | 10
Certifiable algorithms for the two-view planar triangulation problem | 10
Editorial Board | 10
Discriminative object tracking by domain contrast | 10
EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution | 10
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 10
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 10
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 10
LocoGAN — Locally convolutional GAN | 10
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 10
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection | 10
On the coherency of quantitative evaluation of visual explanations | 10
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 10
Semantically accurate super-resolution Generative Adversarial Networks | 10
Editorial Board | 10
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 10
Object re-identification via spatial–temporal fusion networks and causal identity matching | 9
Incorporating degradation estimation in light field spatial super-resolution | 9
Exploring joint embedding predictive architectures for pretraining convolutional neural networks | 9
Editorial Board | 9
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 9
Editorial Board | 9
Editorial Board | 9
Periocular biometrics and its relevance to partially masked faces: A survey | 9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 9
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 9
Certifiable planar relative pose estimation with gravity prior | 9
Adaptive semantic guidance network for video captioning | 9
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 9
An efficient three-stage network via Multi-Scale Orthogonal Complementary Transformer for low-light image enhancement | 9
Constituent Attention for Vision Transformers | 9
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 9
Context perturbation: A Consistent alignment approach for Domain Adaptive Semantic Segmentation | 9
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 9
Underwater image quality evaluation via deep meta-learning: Dataset and objective method | 9
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 9
View consistency aware holistic triangulation for 3D human pose estimation | 9
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition | 9
BiPG-FER: Bi-intelligence probabilistic graph for facial expression inference drived by action units | 9
S2DNet: A self-supervised deraining network using monocular videos | 9
Exploring black-box adversarial attacks on Interpretable Deep Learning Systems | 9
Bidirectional brain image translation using transfer learning from generic pre-trained models | 9
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection | 9
Evaluating the effect of image quantity on Gaussian Splatting: A statistical perspective | 9
Adaptive feature denoising based deep convolutional network for single image super-resolution | 8
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 8
Human skeletons and change detection for efficient violence detection in surveillance videos | 8
Progressive multi-scale fusion network for RGB-D salient object detection | 8
Leaf cultivar identification via prototype-enhanced learning | 8
Learning key lines for multi-object tracking | 8
Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation | 8
Editorial Board | 8
Made-In: An immersive human-in-the-loop analytics platform for enhancing creative processes in fashion | 8
TEMSA:Text enhanced modal representation learning for multimodal sentiment analysis | 8
Self-supervised vision transformers for semantic segmentation | 8
SASFNet: Soft-edge awareness and spatial-attention feedback deep network for blind image deblurring | 8
Disentangled generation network for enlarged license plate recognition and a unified dataset | 8
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition | 8
: Localized text prompt refinement for zero-shot referring image segmentation | 8
Blur aware metric depth estimation with multi-focus plenoptic cameras | 8
Cascading attention enhancement network for RGB-D indoor scene segmentation | 8
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing | 8
MAL-Net: Multiscale Attention Link Network for accurate eye center detection | 8
Modality mixer exploiting complementary information for multi-modal action recognition | 8
Opti-CAM: Optimizing saliency maps for interpretability | 8
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 8
Multimodal transformer–diffusion framework for large-scale reconstruction of soccer tracking data | 8
AnomalySD: One-for-all few-shot anomaly detection via pre-trained diffusion models | 8
OVGrasp: Open-Vocabulary Intent Detection for Grasping Assistance using ExoGlove | 8
Once Upon a Goal: Towards orientation-based shot metrics in football | 8
Channel-aware feature mining network for Visible–Infrared Person Re-identification | 8
Sparse graph matching network for temporal language localization in videos | 8
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity | 8
Bypass network for semantics driven image paragraph captioning | 8
Lightweight cross-modal transformer for RGB-D salient object detection | 8
A closer look at branch classifiers of multi-exit architectures | 8
CMGNet: Collaborative multi-modal graph network for video captioning | 8
Time-archival camera virtualization for sports and visual performances | 7
Continual learning on 3D point clouds with random compressed rehearsal | 7
Style transfer with diffusion models for synthetic-to-real domain adaptation | 7
Discriminative semantic transitive consistency for cross-modal learning | 7
Local to global purification strategy to realize collaborative camouflaged object detection | 7
Improving rare relation inferring for scene graph generation using bipartite graph network | 7
RSTC: Residual Swin Transformer Cascade to approximate Taylor expansion for image denoising | 7
CLIP-driven fine-grained mining for text-based person search | 7
Text-Aided Domain Adaptation for CLIP-like models and application to challenging domain shifts | 7
A survey on class-agnostic counting: Advancements from reference-based to open-world text-guided approaches | 7
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes | 7
Editorial Board | 7
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution | 7
A vector quantized masked autoencoder for audiovisual speech emotion recognition | 7
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 7
Spatial constraint for efficient semi-supervised video object segmentation | 7
A real-time image super-resolution model based on U-shaped deep feature extraction module | 7
Multimodal vs. unimodal approaches to uncertainty in 3D image segmentation under distribution shifts | 7
Invisible backdoor attack with attention and steganography | 7
STARS: Semantics-Aware Text-guided Aerial Image Refinement and Synthesis | 7
AuxFlow: Anchor-grounded homography estimation through flow-guided auxiliary points for Soccer field registration and player localization | 7
MuRE: Multi-Relationship Encoder for 3D human pose estimation | 7
Slope-Track: Multiple Object Tracking on Ski Slopes | 7
Editorial Board | 7
FDPAdapter : Adapting segment anything in challenging vision tasks via frequency-domain priors | 7