International Journal of Computer Vision

Papers
(The TQCC of the International Journal of Computer Vision is 11. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. Only papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01, are covered.)
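The selection rule described in the note can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the function name `select_papers`, the tuple layout, and the inclusive treatment of the date window are assumptions, and the sample records are made up.

```python
from datetime import date

# Values taken from the note above; the names themselves are hypothetical.
TQCC = 11
MAX_PAPERS = 250
WINDOW_START = date(2021, 9, 1)
WINDOW_END = date(2025, 9, 1)


def select_papers(papers):
    """Return (title, citations, published) tuples that pass the filter.

    Keeps papers with citations strictly above TQCC and a publication date
    inside the window (assumed inclusive), sorted by citations in
    descending order and capped at MAX_PAPERS entries.
    """
    eligible = [
        p for p in papers
        if p[1] > TQCC and WINDOW_START <= p[2] <= WINDOW_END
    ]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return eligible[:MAX_PAPERS]


# Made-up records, for illustration only.
sample = [
    ("Paper A", 11, date(2023, 1, 15)),   # at the threshold, not above it
    ("Paper B", 12, date(2022, 6, 1)),    # kept
    ("Paper C", 300, date(2020, 5, 1)),   # outside the date window
    ("Paper D", 40, date(2024, 3, 9)),    # kept
]
selected = select_papers(sample)
```

With the sample above, only "Paper D" and "Paper B" remain, in that order.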
Article | Citations
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1612
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 1175
Guest Editorial: Special Issue on Open-World Visual Recognition | 1171
Instance-Aware Scene Layout Forecasting | 489
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 400
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 293
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 249
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 236
Correction: Multi-source-free Domain Adaptive Object Detection | 210
Learning Text-to-Video Retrieval from Image Captioning | 208
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 187
MoDA: Modeling Deformable 3D Objects from Casual Videos | 161
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 160
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 148
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 142
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 140
Are Vision Transformers Robust to Spurious Correlations? | 137
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 129
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 125
Learning with Enriched Inductive Biases for Vision-Language Models | 119
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 117
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 113
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 111
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 105
Image Synthesis Under Limited Data: A Survey and Taxonomy | 103
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 102
Delving Deeper into Anti-Aliasing in ConvNets | 98
EAN: Event Adaptive Network for Enhanced Action Recognition | 95
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 94
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 92
Deep Image Deblurring: A Survey | 92
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 85
A Minimal Solution for Image-Based Sphere Estimation | 85
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 84
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 81
VideoQA in the Era of LLMs: An Empirical Study | 72
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 70
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 68
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 67
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 65
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 63
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 61
Feature Hallucination for Self-supervised Action Recognition | 61
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 61
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 60
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 60
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 59
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 56
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 54
A Realism Metric for Generated LiDAR Point Clouds | 54
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 53
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 53
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 52
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 51
Learning Cooperative Neural Modules for Stylized Image Captioning | 50
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 49
Learning Latent Part-Whole Hierarchies for Point Clouds | 48
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 47
Correction: Consistent Prompt Tuning for Generalized Category Discovery | 47
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 47
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 45
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 44
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 44
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 43
Noise-Resistant Multimodal Transformer for Emotion Recognition | 43
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 43
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 42
Learning to Prompt for Vision-Language Models | 42
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 41
Image Matting and 3D Reconstruction in One Loop | 41
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 41
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 41
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 41
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 39
Understanding Synonymous Referring Expressions via Contrastive Features | 39
Beyond Learned Metadata-Based Raw Image Reconstruction | 39
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 38
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 38
Improving Domain Adaptation Through Class Aware Frequency Transformation | 37
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 37
Modeling Scattering Effect for Under-Display Camera Image Restoration | 37
Globally Correlation-Aware Hard Negative Generation | 37
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 36
A Generalized Contour Vibration Model for Building Extraction | 36
Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 36
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | 35
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 35
Control Color: Multimodal Diffusion-Based Interactive Image Colorization | 34
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild | 33
Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | 33
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 33
Generative Adversarial Network Applications in Industry 4.0: A Review | 32
Advances in 3D Neural Stylization: A Survey | 32
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 31
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 30
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 30
Structured Binary Neural Networks for Image Recognition | 30
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 30
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 30
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 30
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 29
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 29
Investigating Self-Supervised Methods for Label-Efficient Learning | 28
Assignment Flow for Order-Constrained OCT Segmentation | 28
PartCom: Part Composition Learning for 3D Open-Set Recognition | 28
Active Perception for Visual-Language Navigation | 28
TokenPacker: Efficient Visual Projector for Multimodal LLM | 28
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 28
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 27
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 27
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 27
Distribution-Aware Margin Calibration for Semantic Segmentation in Images | 26
LEO: Generative Latent Image Animator for Human Video Synthesis | 26
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 26
An Optimal Transport View of Class-Imbalanced Visual Recognition | 26
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 26
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 26
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 26
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 26
Beyond Image Prior: Embedding Noise Prior into Latent Space of Conditional Denoising Transformer | 25
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 25
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 25
Singularity Analysis for the Perspective-Four and Five-Line Problems | 25
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 25
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 25
Semantic Edge Detection with Diverse Deep Supervision | 25
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 25
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 24
Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion | 24
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 24
A Region-Based Randers Geodesic Approach for Image Segmentation | 24
Blur Invariants for Image Recognition | 24
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 23
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 23
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 23
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 23
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 23
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 23
Out-of-Distribution Detection with Virtual Outlier Smoothing | 23
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 22
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 22
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 22
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 22
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 22
Anti-Bandit for Neural Architecture Search | 22
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 22
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 21
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 21
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 21
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 21
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations | 21
SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery | 21
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 21
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 21
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 20
A Deeper Analysis of Volumetric Relightable Faces | 20
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 20
Correction: Variational Rectification Inference for Learning with Noisy Labels | 20
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 19
Leveraging Blur Information for Plenoptic Camera Calibration | 19
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 19
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 19
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 19
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 19
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 19
Defending Against Adversarial Examples Via Modeling Adversarial Noise | 19
Sentimental Visual Captioning using Multimodal Transformer | 19
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 19
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 18
Learning 3D Semantic Scene Graphs with Instance Embeddings | 18
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 18
Generalized Relative Pose and Scale from Affine Correspondences | 18
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 18
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 18
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 18
Segment Anything in 3D with Radiance Fields | 18
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 18
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 17
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 17
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 17
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 17
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 17
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 17
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 17
Task Bias in Contrastive Vision-Language Models | 17
AutoScale: Learning to Scale for Crowd Counting | 17
Image-Based Virtual Try-On: A Survey | 17
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 16
Incremental Model Enhancement via Memory-based Contrastive Learning | 16
Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models | 16
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 16
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 16
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 16
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 16
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 16
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 16
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 16
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 16
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 16
Visual Object Tracking in First Person Vision | 16
Learning Sequence Representations by Non-local Recurrent Neural Memory | 15
FusionBooster: A Unified Image Fusion Boosting Paradigm | 15
Correction: Open-Vocabulary Text-Driven Human Image Generation | 15
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 15
Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection | 14
Attribute-Centric Compositional Text-to-Image Generation | 14
Diagnosing Human-Object Interaction Detectors | 14
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 14
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 14
Deep Learning-Based Image and Video Inpainting: A Survey | 14
Audio-Visual Segmentation with Semantics | 14
Transformer for Object Re-identification: A Survey | 14
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 14
Guest Editorial: Special Issue on Biometrics Security and Privacy | 14
Animal-CLIP: A Dual-Prompt Enhanced Vision-Language Model for Animal Action Recognition | 14
Action2video: Generating Videos of Human 3D Actions | 14
Position-Guided Point Cloud Panoptic Segmentation Transformer | 14
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 14
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 14
Self-supervised Scalable Deep Compressed Sensing | 14
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 14
Bayes-CAL: Robust Cross-Modal Alignment by Bayesian Approach for Few-Shot OoD Generalization | 14
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition | 13
SoftPool++: An Encoder–Decoder Network for Point Cloud Completion | 13
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 13
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 13
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages | 13
Towards Frame Rate Agnostic Multi-object Tracking | 13
Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search | 13
Compositional Prompting for Anti-Forgetting in Domain Incremental Learning | 13
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 13
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 13
Mitigating Knowledge Discrepancies among Multiple Datasets for Task-agnostic Unified Face Alignment | 13
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 13
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 13
Open-Vocabulary Text-Driven Human Image Generation | 13
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 13
Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation | 13
VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation | 13
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 13
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 13
Warping the Residuals for Image Editing with StyleGAN | 13
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 12
Learning Portrait Drawing with Unsupervised Parts | 12
What Do Visual Models Look At? Dilated Attention for Targeted Transferable Attacks | 12
Real-Time Neural Radiance Talking Portrait Synthesis via Audio-Spatial Decomposition | 12
Single Pixel Spectral Color Constancy | 12
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 12
Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging | 12
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | 12