International Journal of Computer Vision

Papers
(The median citation count of International Journal of Computer Vision is 4. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-11-01 to 2025-11-01.)
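The selection rule described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name `select_above_median`, the `(title, citations)` tuple layout, and the sample data are hypothetical, and "above that threshold" is assumed to mean strictly greater than the median.

```python
from statistics import median


def select_above_median(papers, max_rows=250):
    """Return (threshold, rows) for a list of (title, citations) tuples.

    Assumption: a paper is listed only if its citation count is strictly
    greater than the median, and at most max_rows papers are shown.
    """
    threshold = median(count for _, count in papers)
    above = [p for p in papers if p[1] > threshold]
    # Most-cited first; Python's sort is stable, so ties keep input order.
    above.sort(key=lambda p: p[1], reverse=True)
    return threshold, above[:max_rows]


# Hypothetical sample data, not taken from the table below.
sample = [("A", 10), ("B", 4), ("C", 1), ("D", 7), ("E", 0)]
threshold, rows = select_above_median(sample)
print(threshold, rows)
```

With a cap of 250 rows, the lowest count shown can be well above the median, which is consistent with the table ending at 13 citations.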
Article | Citations
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1878
Instance-Aware Scene Layout Forecasting | 590
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 428
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 331
Guest Editorial: Special Issue on Open-World Visual Recognition | 262
Image Synthesis Under Limited Data: A Survey and Taxonomy | 249
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 243
A Minimal Solution for Image-Based Sphere Estimation | 236
Learning Text-to-Video Retrieval from Image Captioning | 205
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 174
Delving Deeper into Anti-Aliasing in ConvNets | 170
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 158
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 154
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 144
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 143
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 132
Correction: Multi-source-free Domain Adaptive Object Detection | 131
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 123
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 122
MoDA: Modeling Deformable 3D Objects from Casual Videos | 119
Learning with Enriched Inductive Biases for Vision-Language Models | 118
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 107
Are Vision Transformers Robust to Spurious Correlations? | 106
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 104
EAN: Event Adaptive Network for Enhanced Action Recognition | 99
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 99
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 95
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 95
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 93
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 93
Deep Image Deblurring: A Survey | 78
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 78
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 76
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 75
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 74
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 72
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 71
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 70
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 69
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 67
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 66
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 65
Correction: Consistent Prompt Tuning for Generalized Category Discovery | 64
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 64
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 64
Feature Hallucination for Self-supervised Action Recognition | 64
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 63
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 61
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 61
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 59
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 59
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 57
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 56
Learning Cooperative Neural Modules for Stylized Image Captioning | 53
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 53
VideoQA in the Era of LLMs: An Empirical Study | 52
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 51
Noise-Resistant Multimodal Transformer for Emotion Recognition | 50
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 50
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 50
A Realism Metric for Generated LiDAR Point Clouds | 50
Learning Latent Part-Whole Hierarchies for Point Clouds | 49
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 49
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 48
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 48
Learning to Prompt for Vision-Language Models | 47
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 47
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 46
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 46
Image Matting and 3D Reconstruction in One Loop | 46
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 44
A Generalized Contour Vibration Model for Building Extraction | 44
Improving Domain Adaptation Through Class Aware Frequency Transformation | 43
Modeling Scattering Effect for Under-Display Camera Image Restoration | 43
Understanding Synonymous Referring Expressions via Contrastive Features | 41
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 41
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 41
Beyond Learned Metadata-Based Raw Image Reconstruction | 41
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 40
Control Color: Multimodal Diffusion-Based Interactive Image Colorization | 40
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 39
Globally Correlation-Aware Hard Negative Generation | 38
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 38
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 37
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 37
Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 36
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 36
Advances in 3D Neural Stylization: A Survey | 36
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 35
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 35
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 34
Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | 34
Generative Adversarial Network Applications in Industry 4.0: A Review | 34
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 33
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | 33
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 33
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 33
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 32
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 32
Structured Binary Neural Networks for Image Recognition | 32
A Region-Based Randers Geodesic Approach for Image Segmentation | 32
Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion | 31
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 31
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 31
Investigating Self-Supervised Methods for Label-Efficient Learning | 31
PartCom: Part Composition Learning for 3D Open-Set Recognition | 31
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 30
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 29
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 29
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 29
Distribution-Aware Margin Calibration for Semantic Segmentation in Images | 29
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 29
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 29
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 28
An Optimal Transport View of Class-Imbalanced Visual Recognition | 28
TokenPacker: Efficient Visual Projector for Multimodal LLM | 28
Singularity Analysis for the Perspective-Four and Five-Line Problems | 28
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 28
Semantic Edge Detection with Diverse Deep Supervision | 27
Beyond Image Prior: Embedding Noise Prior into Latent Space of Conditional Denoising Transformer | 27
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 27
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 26
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 26
Blur Invariants for Image Recognition | 26
LEO: Generative Latent Image Animator for Human Video Synthesis | 26
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 26
Active Perception for Visual-Language Navigation | 26
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 25
Out-of-Distribution Detection with Virtual Outlier Smoothing | 25
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 25
Anti-Bandit for Neural Architecture Search | 25
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 25
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 25
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 24
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 24
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 24
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 24
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 24
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | 24
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 24
SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery | 23
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 23
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 23
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 23
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 23
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 23
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 23
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 23
Correction: Variational Rectification Inference for Learning with Noisy Labels | 23
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 23
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 23
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 23
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 23
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 23
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 22
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 22
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 22
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 22
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 22
Leveraging Blur Information for Plenoptic Camera Calibration | 21
AutoScale: Learning to Scale for Crowd Counting | 21
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 21
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 21
Image-Based Virtual Try-On: A Survey | 21
Segment Anything in 3D with Radiance Fields | 21
Sentimental Visual Captioning using Multimodal Transformer | 21
Defending Against Adversarial Examples Via Modeling Adversarial Noise | 21
Generalized Relative Pose and Scale from Affine Correspondences | 21
Learning 3D Semantic Scene Graphs with Instance Embeddings | 20
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 20
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 20
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 20
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 20
A Deeper Analysis of Volumetric Relightable Faces | 20
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 20
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 20
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 20
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 20
Incremental Model Enhancement via Memory-based Contrastive Learning | 19
Task Bias in Contrastive Vision-Language Models | 19
Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models | 19
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 19
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 19
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 19
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 19
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 19
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 19
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 19
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 18
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 18
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 18
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 18
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 18
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 18
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 18
Visual Object Tracking in First Person Vision | 17
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 17
Attribute-Centric Compositional Text-to-Image Generation | 17
Learning Sequence Representations by Non-local Recurrent Neural Memory | 17
Guest Editorial: Special Issue on Biometrics Security and Privacy | 17
Correction: Open-Vocabulary Text-Driven Human Image Generation | 17
Self-supervised Scalable Deep Compressed Sensing | 16
Transformer for Object Re-identification: A Survey | 16
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 16
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 16
Mitigating Knowledge Discrepancies among Multiple Datasets for Task-agnostic Unified Face Alignment | 16
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 16
Bayes-CAL: Robust Cross-Modal Alignment by Bayesian Approach for Few-Shot OoD Generalization | 16
Diagnosing Human-Object Interaction Detectors | 16
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 16
Animal-CLIP: A Dual-Prompt Enhanced Vision-Language Model for Animal Action Recognition | 16
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 16
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 16
Action2video: Generating Videos of Human 3D Actions | 16
VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation | 16
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 15
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 15
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 15
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 15
FusionBooster: A Unified Image Fusion Boosting Paradigm | 15
Audio-Visual Segmentation with Semantics | 15
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 15
Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection | 15
Position-Guided Point Cloud Panoptic Segmentation Transformer | 15
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 15
Deep Learning-Based Image and Video Inpainting: A Survey | 15
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 14
Modality Confusion Learning: A Versatile Framework for Visible-Infrared Re-identification | 14
Real-Time Neural Radiance Talking Portrait Synthesis via Audio-Spatial Decomposition | 14
Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation | 14
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 14
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 14
Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement | 14
Warping the Residuals for Image Editing with StyleGAN | 14
Compositional Prompting for Anti-Forgetting in Domain Incremental Learning | 14
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages | 14
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 14
What Do Visual Models Look At? Dilated Attention for Targeted Transferable Attacks | 14
Towards Frame Rate Agnostic Multi-object Tracking | 14
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 14
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | 14
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 14
Open-Vocabulary Text-Driven Human Image Generation | 14
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 14
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | 13
Learning Portrait Drawing with Unsupervised Parts | 13
Unknown Support Prototype Set for Open Set Recognition | 13
Single Pixel Spectral Color Constancy | 13
Generalized Gradient Flow Based Saliency for Pruning Deep Convolutional Neural Networks | 13
Discriminatively Matched Part Tokens for Pointly Supervised Instance Segmentation | 13