International Journal of Computer Vision

Papers
(The H4-Index of the International Journal of Computer Vision is 58. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
| Article | Citations |
| --- | --- |
| Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 2507 |
| Guest Editorial: Special Issue on Open-World Visual Recognition | 822 |
| Correction: Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | 407 |
| OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 382 |
| MoDA: Modeling Deformable 3D Objects from Casual Videos | 357 |
| Correction: Multi-source-free Domain Adaptive Object Detection | 339 |
| Learning with Enriched Inductive Biases for Vision-Language Models | 305 |
| Conditional Temporal Variational AutoEncoder for Action Video Prediction | 217 |
| Instance-dependent Label Distribution Estimation for Learning with Label Noise | 201 |
| View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 193 |
| Invert Your Prompt: Editing-Aware Diffusion Inversion | 179 |
| BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 172 |
| FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 162 |
| From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 157 |
| PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 153 |
| Image Synthesis Under Limited Data: A Survey and Taxonomy | 153 |
| Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 150 |
| Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 146 |
| Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 139 |
| Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature | 137 |
| Large-Scale Pre-Trained Models Empowering Phrase Generalization in Temporal Sentence Localization | 128 |
| Weakly Supervised Salient Object Detection with Text Supervision | 127 |
| Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 126 |
| GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 126 |
| Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 126 |
| RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 123 |
| EAN: Event Adaptive Network for Enhanced Action Recognition | 118 |
| Robust Averaging using Adaptive Annealing | 117 |
| Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs | 110 |
| Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting | 107 |
| AutoIT: Automated Image Tagging with Random Perturbation | 107 |
| UniAttack: Unified Physical-Digital Face Attack Detection | 106 |
| Are Vision Transformers Robust to Spurious Correlations? | 100 |
| Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 100 |
| Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 99 |
| A Minimal Solution for Image-Based Sphere Estimation | 98 |
| Delving Deeper into Anti-Aliasing in ConvNets | 90 |
| SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 87 |
| Deep Image Deblurring: A Survey | 84 |
| Learning Text-to-Video Retrieval from Image Captioning | 84 |
| Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 80 |
| Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 78 |
| H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 76 |
| Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 76 |
| NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 75 |
| Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 74 |
| Noise-Resistant Multimodal Transformer for Emotion Recognition | 74 |
| UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 72 |
| Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 68 |
| Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 68 |
| UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 67 |
| Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 66 |
| Learning Latent Part-Whole Hierarchies for Point Clouds | 65 |
| Learning Cooperative Neural Modules for Stylized Image Captioning | 64 |
| FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 63 |
| Feature Hallucination for Self-supervised Action Recognition | 62 |
| Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 61 |
| Correction: Consistent Prompt Tuning for Generalized Category Discovery | 58 |
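
The page does not spell out how the H4-Index is computed. Assuming it is the ordinary h-index restricted to papers published in the four-year window stated above (an assumption, not something the page confirms), the calculation can be sketched as follows. The function name `h_index` and the sample counts are hypothetical and are not taken from the table.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for illustration only (not data from this page).
# Under the assumed definition, only papers from the four-year window are passed in.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Under this reading, the reported value of 58 is consistent with the table: it lists 58 papers, and the least-cited of them has 58 citations.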