Computer Vision and Image Understanding

Papers
(The median citation count of Computer Vision and Image Understanding is 2. The table below lists the papers with more citations than that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
Article | Citations
Editorial Board | 273
Editorial Board | 257
Editorial Board | 217
Editorial Board | 140
Editorial Board | 101
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 100
Exploring using jigsaw puzzles for out-of-distribution detection | 83
Feature reconstruction and metric based network for few-shot object detection | 82
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 80
Luminance prior guided Low-Light 4C catenary image enhancement | 68
Siamese self-supervised learning for fine-grained visual classification | 68
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 65
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 48
Deducing health cues from biometric data | 45
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 39
Emerging image generation with flexible control of perceived difficulty | 39
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 38
Extending function mixture network for improved spectral super-resolution | 34
MATTE: Multi-task multi-scale attention | 34
Modality adaptation via feature difference learning for depth human parsing | 33
Lightweight feature point detection network with channel enhancement | 32
Convolutional neural network framework for deepfake detection: A diffusion-based approach | 31
Efficient cross-information fusion decoder for semantic segmentation | 30
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 29
Implicit and explicit commonsense for multi-sentence video captioning | 27
View-aligned pixel-level feature aggregation for 3D shape classification | 27
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 27
Editorial Board | 27
3D object feature extraction and classification using 3D MF-DFA | 27
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 27
Adaptive CNN filter pruning using global importance metric | 27
Editorial Board | 27
Towards efficient image and video style transfer via distillation and learnable feature transformation | 25
RelFormer: Advancing contextual relations for transformer-based dense captioning | 25
SIERRA: A robust bilateral feature upsampler for dense prediction | 25
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 25
Robust detection of dehazed images via dual-stream CNNs with adaptive feature fusion | 25
Reverse Stable Diffusion: What prompt was used to generate this image? | 24
Editorial Board | 22
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking | 22
Improved Short-term Dense Bottleneck network for efficient scene analysis | 22
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 22
Editorial Board | 21
Unsupervised real image super-resolution via knowledge distillation network | 21
Uncertainty estimation using boundary prediction for medical image super-resolution | 21
Subspace reconstruction based correlation filter for object tracking | 20
Hallucinating uncertain motion and future for static image action recognition | 20
Learning spectral transform for 3D human motion prediction | 20
Online real-time pedestrian tracking from medium altitude aerial footage with camera motion cancellation | 19
Attribute-guided Relevance Propagation for interpreting image classifier based on Deep Neural Networks | 19
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors | 19
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis | 19
Continuous fake media detection: Adapting deepfake detectors to new generative techniques | 19
Enhanced discriminative graph convolutional network with adaptive temporal modelling for skeleton-based action recognition | 18
When super-resolution meets camouflaged object detection: A comparison study | 18
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal | 18
Pseudo initialization based Few-Shot Class Incremental Learning | 18
Lightning fast video anomaly detection via multi-scale adversarial distillation | 17
Dissected 3D CNNs: Temporal skip connections for efficient online video processing | 17
M3 | 17
A multi-view-CNN framework for deep representation learning in image classification | 16
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification | 16
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 16
Extensions in channel and class dimensions for attention-based knowledge distillation | 16
Transformed ROIs for capturing visual transformations in videos | 15
Scribble-based complementary graph reasoning network for weakly supervised salient object detection | 15
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video | 15
Real-time distributed video analytics for privacy-aware person search | 15
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction | 15
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction | 15
Multi-view cognition with path search for one-shot part labeling | 15
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment | 14
Ensemble learning-based method for maritime background subtraction in open sea environments | 14
CTM: Cross-time temporal module for fine-grained action recognition | 13
TCLR: Temporal contrastive learning for video representation | 13
Sketch-based 3D shape retrieval via teacher–student learning | 13
Global key knowledge distillation framework | 13
Casting a BAIT for offline and online source-free domain adaptation | 13
A robust kinship verification scheme using face age transformation | 13
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution | 13
Deep parametric Retinex decomposition model for low-light image enhancement | 13
Combinational sign language recognition | 12
Editorial Board | 12
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild | 12
3D Pose Nowcasting: Forecast the future to improve the present | 12
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 12
Rethink arbitrary style transfer with transformer and contrastive learning | 12
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data | 12
The shading isophotes: Model and methods for Lambertian planes and a point light | 12
Attention-induced semantic and boundary interaction network for camouflaged object detection | 12
Tensor robust PCA with nonconvex and nonlocal regularization | 12
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit | 11
Extending class activation mapping using Gaussian receptive field | 11
Space–time recurrent memory network | 11
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives | 11
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC | 11
Learning representational invariances for data-efficient action recognition | 11
FAM: Improving columnar vision transformer with feature attention mechanism | 10
Survey on fast dense video segmentation techniques | 10
GradPaint: Gradient-guided inpainting with diffusion models | 10
α-EGAN: | 10
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 10
Deep learning-based estimation of whole-body kinematics from multi-view images | 10
Monocular 3D multi-person pose estimation via predicting factorized correction factors | 10
Dual cross-enhancement network for highly accurate dichotomous image segmentation | 10
Accurate depth image generation via overfit training of point cloud registration using local frame sets | 10
Facial landmark points detection using knowledge distillation-based neural networks | 10
LightSOD: Towards lightweight and efficient network for salient object detection | 10
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training | 10
DM-Align: Leveraging the power of natural language instructions to make changes to images | 10
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection | 10
Semantic manipulation through the lens of Geometric Algebra | 10
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples | 10
Local Consistency Guidance: Personalized Stylization Method of Face Video | 10
To make yourself invisible with Adversarial Semantic Contours | 10
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection | 9
Editorial Board | 9
Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation | 9
Discriminative object tracking by domain contrast | 9
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 9
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 9
Editorial Board | 9
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 9
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 9
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 9
Distributed multi-target tracking and active perception with mobile camera networks | 9
Editorial Board | 9
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements | 9
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 9
LocoGAN — Locally convolutional GAN | 9
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 8
Underwater image quality evaluation via deep meta-learning: Dataset and objective method | 8
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 8
Editorial Board | 8
Bidirectional brain image translation using transfer learning from generic pre-trained models | 8
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 8
Adaptive semantic guidance network for video captioning | 8
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 8
On the coherency of quantitative evaluation of visual explanations | 8
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 8
Editorial Board | 8
Periocular biometrics and its relevance to partially masked faces: A survey | 8
Certifiable algorithms for the two-view planar triangulation problem | 8
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 8
View consistency aware holistic triangulation for 3D human pose estimation | 8
Editorial Board | 8
A distribution independence based method for 3D face shape decomposition | 8
Semantically accurate super-resolution Generative Adversarial Networks | 8
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 8
MAIN: Multi-Attention Instance Network for video segmentation | 7
CMGNet: Collaborative multi-modal graph network for video captioning | 7
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 7
A closer look at branch classifiers of multi-exit architectures | 7
Opti-CAM: Optimizing saliency maps for interpretability | 7
SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation | 7
Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks | 7
Editorial Board | 7
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing | 7
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition | 7
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 7
Certifiable planar relative pose estimation with gravity prior | 7
Self-supervised vision transformers for semantic segmentation | 7
Constituent Attention for Vision Transformers | 7
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition | 7
Incorporating degradation estimation in light field spatial super-resolution | 7
Modality mixer exploiting complementary information for multi-modal action recognition | 7
Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation | 7
2.5D visual relationship detection | 7
MAL-Net: Multiscale Attention Link Network for accurate eye center detection | 7
Plug-and-Play video super-resolution using edge-preserving filtering | 7
Lightweight cross-modal transformer for RGB-D salient object detection | 7
Adaptive feature denoising based deep convolutional network for single image super-resolution | 7
TEMSA:Text enhanced modal representation learning for multimodal sentiment analysis | 7
Disentangled generation network for enlarged license plate recognition and a unified dataset | 7
Diversified text-to-image generation via deep mutual information estimation | 7
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 7
Bypass network for semantics driven image paragraph captioning | 7
Leaf cultivar identification via prototype-enhanced learning | 7
Human skeletons and change detection for efficient violence detection in surveillance videos | 7
Invisible backdoor attack with attention and steganography | 6
Visual object tracking: A survey | 6
Editorial Board | 6
Style transfer with diffusion models for synthetic-to-real domain adaptation | 6
Re-scoring using image-language similarity for few-shot object detection | 6
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes | 6
Discriminative semantic transitive consistency for cross-modal learning | 6
Editorial Board | 6
Addressing multiple salient object detection via dual-space long-range dependencies | 6
Blur aware metric depth estimation with multi-focus plenoptic cameras | 6
Learning key lines for multi-object tracking | 6
Conditioning diffusion models via attributes and semantic masks for face generation | 6
BacklitNet: A dataset and network for backlit image enhancement | 6
Weakly supervised fine-grained image classification via two-level attention activation model | 6
RSTC: Residual Swin Transformer Cascade to approximate Taylor expansion for image denoising | 6
A vector quantized masked autoencoder for audiovisual speech emotion recognition | 6
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 6
Editorial Board | 6
A linear method for camera pair self-calibration | 6
Improving rare relation inferring for scene graph generation using bipartite graph network | 6
Progressive multi-scale fusion network for RGB-D salient object detection | 6
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity | 6
Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos | 6
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution | 6
Font transformer for few-shot font generation | 6
ParticleAugment: Sampling-based data augmentation | 6
Local to global purification strategy to realize collaborative camouflaged object detection | 6
Few-shot action recognition with implicit temporal alignment and pair similarity optimization | 6
AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction | 6
Detecting abnormality with separated foreground and background: Mutual Generative Adversarial Networks for video abnormal event detection | 6
CUFD: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition | 6
Spatial constraint for efficient semi-supervised video object segmentation | 6
Editorial Board | 6
Continual learning on 3D point clouds with random compressed rehearsal | 6
Sparse graph matching network for temporal language localization in videos | 6
Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization | 5
Adversarial Neon Beam: A light-based physical attack to DNNs | 5
Automatic detection and localization of thighbone fractures in X-ray based on improved deep learning method | 5
MT-DSNet: Mix-mask teacher–student strategies and dual dynamic selection plug-in module for fine-grained image recognition | 5
Robust visual question answering via semantic cross modal augmentation | 5
Uncertainty guided test-time training for face forgery detection | 5
Single and multiple illuminant estimation using convex functions | 5
Multimodality-guided Visual-Caption Semantic Enhancement | 5
LandmarkBreaker: A proactive method to obstruct DeepFakes via disrupting facial landmark extraction | 5
Learning to teach and learn for semi-supervised few-shot image classification | 5
Local optimization cropping and boundary enhancement for end-to-end weakly-supervised segmentation network | 5
Towards adversarial robustness verification of no-reference image- and video-quality metrics | 5
Quaternion-based dynamic mode decomposition for background modeling in color videos | 5
Estimating 3D body mesh without SMPL annotations via alternating successive convex approximation | 5
A simple but effective vision transformer framework for visible–infrared person re-identification | 5
Weakly supervised learning of multi-object 3D scene decompositions using deep shape priors | 5
RocNet: Recursive octree network for efficient 3D processing | 5
Self-knowledge distillation via dropout | 5
Intrinsic image decomposition using physics-based cues and CNNs | 5
Cascade transformers with dynamic attention for video question answering | 5
Region-aware image-based human action retrieval with transformers | 5
LOFReg: An outlier-based regulariser for deep metric learning | 5
An egocentric video and eye-tracking dataset for visual search in convenience stores | 5
Head pose estimation with uncertainty and an application to dyadic interaction detection | 5
Deconfounded hierarchical multi-granularity classification | 5
Multispectral interaction convolutional neural network for pedestrian detection | 5
Progressive Recurrent Network for shadow removal | 5
Camouflaged object detection via Neighbor Connection and Hierarchical Information Transfer | 5
A novel image inpainting method based on a modified Lengyel–Epstein model | 5
LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR | 5
CAFNet: Context aligned fusion for depth completion | 5
Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts | 5
Progressive scene text erasing with self-supervision | 5
Cleanness-navigated-contamination network: A unified framework for recovering regional degradation | 5
Siamese Graph Attention Networks for robust visual object tracking | 5
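
The selection rule described at the top of this list (papers published within the date window, cited more often than the journal's median, capped at 250 entries and ordered by citation count) can be sketched roughly as follows. This is only an illustrative sketch, not the site's actual code: the function name `select_papers`, the tuple layout, and the sample records are hypothetical, and it assumes the median is taken over the papers inside the same date window, which the description does not state.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, limit=250):
    """Return papers cited more often than the median, most cited first.

    `papers` is a list of (title, citations, published) tuples; this
    layout is a hypothetical one chosen for the sketch.
    """
    # Keep only papers published inside the date window (inclusive).
    window = [p for p in papers if start <= p[2] <= end]
    if not window:
        return []
    # Assumption: the threshold is the median over the windowed papers.
    threshold = median(citations for _, citations, _ in window)
    # Strictly above the threshold, as in "above that threshold".
    above = [p for p in window if p[1] > threshold]
    # Sort by citation count, descending (stable for ties), then cap.
    above.sort(key=lambda p: p[1], reverse=True)
    return above[:limit]


# Made-up sample records, for illustration only.
sample = [
    ("Paper A", 10, date(2022, 5, 1)),
    ("Paper B", 2, date(2023, 1, 15)),
    ("Paper C", 5, date(2024, 3, 9)),
    ("Paper D", 0, date(2024, 11, 2)),
    ("Paper E", 1, date(2025, 2, 20)),
    ("Paper F", 100, date(2019, 7, 7)),  # outside the window
]

selected = select_papers(sample, date(2021, 8, 1), date(2025, 8, 1))
for title, citations, _ in selected:
    print(f"{title} | {citations}")
```

With a median of 2 over the five in-window sample records, only the papers with more than 2 citations remain, which matches the way the list above stops at entries with 5 citations once the 250-paper cap is reached.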