Computer Vision and Image Understanding

Papers
(The median citation count of Computer Vision and Image Understanding is 2. The table below lists the papers whose CrossRef citation counts are above that threshold, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
Luminance prior guided Low-Light 4C catenary image enhancement | 320
Editorial Board | 252
Efficient cross-information fusion decoder for semantic segmentation | 123
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 113
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 100
Editorial Board | 100
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 93
Editorial Board | 85
Exploring using jigsaw puzzles for out-of-distribution detection | 74
Extending function mixture network for improved spectral super-resolution | 54
MATTE: Multi-task multi-scale attention | 52
Editorial Board | 47
Editorial Board | 45
Modality adaptation via feature difference learning for depth human parsing | 40
Feature reconstruction and metric based network for few-shot object detection | 38
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 38
Emerging image generation with flexible control of perceived difficulty | 37
Lightweight feature point detection network with channel enhancement | 37
Convolutional neural network framework for deepfake detection: A diffusion-based approach | 35
Deducing health cues from biometric data | 35
REST: A resolution preserving network for photorealistic style transfer via semantic distillation | 35
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 35
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 34
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 34
Siamese self-supervised learning for fine-grained visual classification | 33
PConvSRGAN: Real-world super-resolution reconstruction with pure convolutional networks | 32
RelFormer: Advancing contextual relations for transformer-based dense captioning | 32
Editorial Board | 31
GaitBranch: A multi-branch refinement model combined with frame-channel attention mechanism for gait recognition | 31
Improved Short-term Dense Bottleneck network for efficient scene analysis | 30
Editorial Board | 30
Robust detection of dehazed images via dual-stream CNNs with adaptive feature fusion | 29
SIERRA: A robust bilateral feature upsampler for dense prediction | 28
Iterative Caption Generation with Heuristic Guidance for enhancing knowledge-based visual question answering | 28
3D object feature extraction and classification using 3D MF-DFA | 28
View-aligned pixel-level feature aggregation for 3D shape classification | 27
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 27
CCNeXt: An effective self-supervised stereo depth estimation approach | 26
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 25
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking | 25
Implicit and explicit commonsense for multi-sentence video captioning | 25
Reverse Stable Diffusion: What prompt was used to generate this image? | 24
SDC-Net: A novel selective dilated convolution network for medical images segmentation | 24
Adaptive CNN filter pruning using global importance metric | 24
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 23
Attribute-guided Relevance Propagation for interpreting image classifier based on Deep Neural Networks | 22
Towards efficient image and video style transfer via distillation and learnable feature transformation | 22
Lightning fast video anomaly detection via multi-scale adversarial distillation | 21
Continuous fake media detection: Adapting deepfake detectors to new generative techniques | 21
When super-resolution meets camouflaged object detection: A comparison study | 21
Extensions in channel and class dimensions for attention-based knowledge distillation | 21
Pseudo initialization based Few-Shot Class Incremental Learning | 21
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification | 20
Online real-time pedestrian tracking from medium altitude aerial footage with camera motion cancellation | 20
Unsupervised real image super-resolution via knowledge distillation network | 19
An efficient direct solution of the perspective-three-point problem | 19
Learning spectral transform for 3D human motion prediction | 19
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal | 19
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors | 19
Editorial Board | 18
Editorial Board | 18
Dissected 3D CNNs: Temporal skip connections for efficient online video processing | 18
M3 | 17
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis | 17
Enhanced discriminative graph convolutional network with adaptive temporal modelling for skeleton-based action recognition | 17
Dynamic deep multi-label image data augmentation based on self-paced learning | 17
UniMultNet: Action recognition method based on multi-scale feature fusion and video-text constraint guidance | 17
Hallucinating uncertain motion and future for static image action recognition | 17
Uncertainty estimation using boundary prediction for medical image super-resolution | 16
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction | 16
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 15
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction | 15
A multi-view-CNN framework for deep representation learning in image classification | 15
CTM: Cross-time temporal module for fine-grained action recognition | 15
Global key knowledge distillation framework | 14
A robust kinship verification scheme using face age transformation | 14
Few-shot Medical Image Segmentation via Boundary-extended Prototypes and Momentum Inference | 14
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution | 14
Scribble-based complementary graph reasoning network for weakly supervised salient object detection | 14
Real-time distributed video analytics for privacy-aware person search | 14
SPSC-Net: Shared parallel space-channel attention mechanism transformer network for cell sequence image segmentation | 14
Transformed ROIs for capturing visual transformations in videos | 14
Sketch-based 3D shape retrieval via teacher–student learning | 14
MOSAIC: A multi-view 2.5D organ slice selector with cross-attentional reasoning for anatomically-aware CT localization in medical organ segmentation | 14
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data | 13
Editorial Board | 13
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 13
Multi-view cognition with path search for one-shot part labeling | 13
Attention-induced semantic and boundary interaction network for camouflaged object detection | 13
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment | 13
The shading isophotes: Model and methods for Lambertian planes and a point light | 13
Learning representational invariances for data-efficient action recognition | 13
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video | 13
A dynamic hybrid network with attention and mamba for image captioning | 13
Casting a BAIT for offline and online source-free domain adaptation | 13
Space–time recurrent memory network | 13
α-EGAN: | 13
Deep parametric Retinex decomposition model for low-light image enhancement | 13
Ensemble learning-based method for maritime background subtraction in open sea environments | 13
TCLR: Temporal contrastive learning for video representation | 13
Extending class activation mapping using Gaussian receptive field | 12
3D Pose Nowcasting: Forecast the future to improve the present | 12
Tensor robust PCA with nonconvex and nonlocal regularization | 12
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit | 12
Multiscale Spatio-Temporal Fusion Network for video dehazing | 12
A LLM-guided hybrid Mamba-Transformer architecture for part-to-whole motion synthesis | 12
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild | 12
Rethink arbitrary style transfer with transformer and contrastive learning | 12
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives | 12
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC | 12
XLITE-Unet: Extremely Light and Efficient Deep learning architecture with selective atrous and axial depthwise convolution for image segmentation | 12
Combinational sign language recognition | 12
Real-time fusion of stereo vision and hyperspectral imaging for objective decision support during surgery | 12
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 11
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection | 11
Feature-aligned distillation for dense object detection via refined semantic guidance and distribution consistency | 11
Semantic manipulation through the lens of Geometric Algebra | 11
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training | 11
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection | 11
EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution | 11
Comprehensive regional guidance for attention map semantics in text-to-image diffusion models | 11
Local Consistency Guidance: Personalized Stylization Method of Face Video | 11
Accurate depth image generation via overfit training of point cloud registration using local frame sets | 11
To make yourself invisible with Adversarial Semantic Contours | 11
Edge-aware graph reasoning network for image manipulation localization | 11
Dual cross-enhancement network for highly accurate dichotomous image segmentation | 11
FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition | 11
Facial landmark points detection using knowledge distillation-based neural networks | 11
Distributed multi-target tracking and active perception with mobile camera networks | 10
GradPaint: Gradient-guided inpainting with diffusion models | 10
Survey on fast dense video segmentation techniques | 10
Deep learning-based estimation of whole-body kinematics from multi-view images | 10
DM-Align: Leveraging the power of natural language instructions to make changes to images | 10
LightSOD: Towards lightweight and efficient network for salient object detection | 10
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation | 10
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements | 10
An effective CNN and Transformer fusion network for camouflaged object detection | 10
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples | 10
Periocular biometrics and its relevance to partially masked faces: A survey | 9
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 9
Editorial Board | 9
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 9
View consistency aware holistic triangulation for 3D human pose estimation | 9
FAM: Improving columnar vision transformer with feature attention mechanism | 9
Underwater image quality evaluation via deep meta-learning: Dataset and objective method | 9
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection | 9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 9
Bidirectional brain image translation using transfer learning from generic pre-trained models | 9
Bi-granularity balance learning for long-tailed image classification | 9
LocoGAN — Locally convolutional GAN | 9
Semantically accurate super-resolution Generative Adversarial Networks | 9
Discriminative object tracking by domain contrast | 9
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 9
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 9
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 9
Evaluating the effect of image quantity on Gaussian Splatting: A statistical perspective | 9
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 9
Editorial Board | 9
Context perturbation: A Consistent alignment approach for Domain Adaptive Semantic Segmentation | 9
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 9
Certifiable algorithms for the two-view planar triangulation problem | 9
Adaptive semantic guidance network for video captioning | 9
On the coherency of quantitative evaluation of visual explanations | 9
Exploring black-box adversarial attacks on Interpretable Deep Learning Systems | 9
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 8
Editorial Board | 8
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 8
Adaptive feature denoising based deep convolutional network for single image super-resolution | 8
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 8
CMGNet: Collaborative multi-modal graph network for video captioning | 8
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 8
Incorporating degradation estimation in light field spatial super-resolution | 8
MAL-Net: Multiscale Attention Link Network for accurate eye center detection | 8
Certifiable planar relative pose estimation with gravity prior | 8
Disentangled generation network for enlarged license plate recognition and a unified dataset | 8
S2DNet: A self-supervised deraining network using monocular videos | 8
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 8
Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation | 8
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 8
Self-supervised vision transformers for semantic segmentation | 8
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 8
SASFNet: Soft-edge awareness and spatial-attention feedback deep network for blind image deblurring | 8
Constituent Attention for Vision Transformers | 8
Editorial Board | 8
Editorial Board | 8
Exploring joint embedding predictive architectures for pretraining convolutional neural networks | 8
Lightweight cross-modal transformer for RGB-D salient object detection | 8
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition | 8
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 8
Opti-CAM: Optimizing saliency maps for interpretability | 8
A closer look at branch classifiers of multi-exit architectures | 8
Made-In: An immersive human-in-the-loop analytics platform for enhancing creative processes in fashion | 8
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing | 7
Editorial Board | 7
Human skeletons and change detection for efficient violence detection in surveillance videos | 7
Editorial Board | 7
TEMSA: Text enhanced modal representation learning for multimodal sentiment analysis | 7
Blur aware metric depth estimation with multi-focus plenoptic cameras | 7
Modality mixer exploiting complementary information for multi-modal action recognition | 7
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes | 7
A vector quantized masked autoencoder for audiovisual speech emotion recognition | 7
Editorial Board | 7
Plug-and-Play video super-resolution using edge-preserving filtering | 7
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution | 7
: Localized text prompt refinement for zero-shot referring image segmentation | 7
IP-CAM: Class activation mapping based on importance weights and principal-component weights for better and simpler visual explanations | 7
Channel-aware feature mining network for Visible–Infrared Person Re-identification | 7
Invisible backdoor attack with attention and steganography | 7
Continual learning on 3D point clouds with random compressed rehearsal | 7
Leaf cultivar identification via prototype-enhanced learning | 7
Multimodal transformer–diffusion framework for large-scale reconstruction of soccer tracking data | 7
Conditioning diffusion models via attributes and semantic masks for face generation | 7
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 7
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity | 7
Discriminative semantic transitive consistency for cross-modal learning | 7
Learning key lines for multi-object tracking | 7
Improving rare relation inferring for scene graph generation using bipartite graph network | 7
Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation | 7
Spatial constraint for efficient semi-supervised video object segmentation | 7
Cascading attention enhancement network for RGB-D indoor scene segmentation | 7
Parametric kernels for artifact mitigation in patch-based image aggregation using generative models | 7
Multimodal vs. unimodal approaches to uncertainty in 3D image segmentation under distribution shifts | 7
Adaptive bias learning via gradient-based reweighting and constrained pruning for robust Visual Question Answering | 7
Bypass network for semantics driven image paragraph captioning | 7
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition | 7
Progressive multi-scale fusion network for RGB-D salient object detection | 7
Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos | 7
BacklitNet: A dataset and network for backlit image enhancement | 7
Addressing multiple salient object detection via dual-space long-range dependencies | 7
Editorial Board | 7
2.5D visual relationship detection | 7
Visual object tracking: A survey | 7
Sparse graph matching network for temporal language localization in videos | 7
Font transformer for few-shot font generation | 7
Efficient dual attention SlowFast networks for video action recognition | 6
LAM-YOLO: Drones-based small object detection on lighting-occlusion attention mechanism YOLO | 6
Scene-cGAN: A GAN for underwater restoration and scene depth estimation | 6
A lightweight convolutional neural network-based feature extractor for visible images | 6
A survey on bias in visual datasets | 6
A configurable global context reconstruction hybrid detector for enhanced small object detection in UAV aerial imagery | 6
MAEDAY: MAE for few- and zero-shot AnomalY-Detection | 6
Multimodality-guided Visual-Caption Semantic Enhancement | 6
DiffExplainer: Towards cross-modal global explanations with diffusion models | 6
Recognizing facial expressions based on pyramid multi-head grid and spatial attention network | 6
Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts | 6
Attention-based multimodal image matching | 6
CAFNet: Context aligned fusion for depth completion | 6
Enhancing action recognition by leveraging the hierarchical structure of actions and textual context | 6
UUD-Fusion: An unsupervised universal image fusion approach via generative diffusion model | 6
Intrinsic image decomposition using physics-based cues and CNNs | 6
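
The selection rule stated at the top of this page (keep papers above the median citation count, published within the date window, at most 250 of them) can be sketched in Python as follows. This is only an illustration, not the site's actual code. The function name `select_papers`, the `(title, citations, published)` tuple shape, the half-open date window, and computing the median over the papers inside the window are all assumptions made for the example.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, max_papers=250):
    """Return papers strictly above the median citation count, highest first.

    `papers` is a list of (title, citations, published) tuples; this shape
    is hypothetical and chosen only for the sketch.
    """
    # Assumption: the window is half-open, start <= published < end.
    in_window = [p for p in papers if start <= p[2] < end]
    if not in_window:
        return []

    # Assumption: the median is taken over the papers inside the window.
    threshold = median(citations for _, citations, _ in in_window)

    # Keep only papers strictly above the threshold, most cited first.
    above = [p for p in in_window if p[1] > threshold]
    above.sort(key=lambda p: p[1], reverse=True)

    # Cap the listing at max_papers entries.
    return above[:max_papers]
```

The cap would explain why the lowest count shown in the table is 6 even though the stated threshold is 2: the table has exactly 250 rows, so the listing stops before it reaches papers with fewer citations.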