International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 11. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
Article | Citations
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1422
A Minimal Solution for Image-Based Sphere Estimation | 1140
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 1128
Guest Editorial: Special Issue on Open-World Visual Recognition | 418
Instance-Aware Scene Layout Forecasting | 388
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 267
Image Synthesis Under Limited Data: A Survey and Taxonomy | 225
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 225
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 198
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 188
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 158
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 148
Correction: Multi-source-free Domain Adaptive Object Detection | 145
Learning Text-to-Video Retrieval from Image Captioning | 135
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 134
MoDA: Modeling Deformable 3D Objects from Casual Videos | 125
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 114
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 112
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 111
Learning with Enriched Inductive Biases for Vision-Language Models | 110
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 109
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 109
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 105
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 98
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 95
EAN: Event Adaptive Network for Enhanced Action Recognition | 93
Are Vision Transformers Robust to Spurious Correlations? | 92
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 90
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 90
Deep Image Deblurring: A Survey | 90
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 90
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 84
Delving Deeper into Anti-Aliasing in ConvNets | 80
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 79
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 76
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 69
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 66
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 64
VideoQA in the Era of LLMs: An Empirical Study | 63
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 62
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 62
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 60
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 57
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 57
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 54
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 53
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 53
A Realism Metric for Generated LiDAR Point Clouds | 50
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 47
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 46
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 46
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 46
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 45
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 45
Noise-Resistant Multimodal Transformer for Emotion Recognition | 44
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 43
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 43
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 43
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 42
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 42
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 42
Learning Latent Part-Whole Hierarchies for Point Clouds | 42
Learning Cooperative Neural Modules for Stylized Image Captioning | 41
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 41
Correction: Consistent Prompt Tuning for Generalized Category Discovery | 41
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 40
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 39
Learning to Prompt for Vision-Language Models | 39
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 39
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 38
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 38
Image Matting and 3D Reconstruction in One Loop | 37
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 37
Advances in 3D Neural Stylization: A Survey | 37
Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 36
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 36
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild | 36
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 36
Beyond Learned Metadata-Based Raw Image Reconstruction | 35
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 35
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 34
Understanding Synonymous Referring Expressions via Contrastive Features | 34
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 34
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 33
Globally Correlation-Aware Hard Negative Generation | 32
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 32
Deep Maximum a Posterior Estimator for Video Denoising | 31
Modeling Scattering Effect for Under-Display Camera Image Restoration | 31
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 30
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | 30
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 30
Improving Domain Adaptation Through Class Aware Frequency Transformation | 30
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 30
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 29
A Generalized Contour Vibration Model for Building Extraction | 29
Structured Binary Neural Networks for Image Recognition | 28
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 28
Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | 28
Generative Adversarial Network Applications in Industry 4.0: A Review | 28
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 28
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 27
PartCom: Part Composition Learning for 3D Open-Set Recognition | 27
Active Perception for Visual-Language Navigation | 27
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 27
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 27
A Region-Based Randers Geodesic Approach for Image Segmentation | 27
Assignment Flow for Order-Constrained OCT Segmentation | 27
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 26
TokenPacker: Efficient Visual Projector for Multimodal LLM | 26
Investigating Self-Supervised Methods for Label-Efficient Learning | 26
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 26
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 26
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 26
LEO: Generative Latent Image Animator for Human Video Synthesis | 26
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 25
Singularity Analysis for the Perspective-Four and Five-Line Problems | 25
An Optimal Transport View of Class-Imbalanced Visual Recognition | 25
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 25
Distribution-Aware Margin Calibration for Semantic Segmentation in Images | 25
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 24
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 24
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 23
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 23
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 23
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 23
Deep Learning Geometry Compression Artifacts Removal for Video-Based Point Cloud Compression | 22
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 22
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 22
Semantic Edge Detection with Diverse Deep Supervision | 22
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 22
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 22
Blur Invariants for Image Recognition | 22
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 22
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 22
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 21
Out-of-Distribution Detection with Virtual Outlier Smoothing | 21
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 21
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 21
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 21
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 21
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 21
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 21
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 21
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 21
Guest Editorial: Special Issue: Computer Vision and Pattern Recognition (DAGM GCPR 2019) | 21
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 20
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 20
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 20
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 20
Correction: Variational Rectification Inference for Learning with Noisy Labels | 19
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 19
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations | 19
Anti-Bandit for Neural Architecture Search | 19
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 19
A Deeper Analysis of Volumetric Relightable Faces | 18
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 18
Image-Based Virtual Try-On: A Survey | 18
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 18
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 18
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 18
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 18
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 18
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 18
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 18
Leveraging Blur Information for Plenoptic Camera Calibration | 17
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 17
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 17
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 17
Defending Against Adversarial Examples Via Modeling Adversarial Noise | 17
Sentimental Visual Captioning using Multimodal Transformer | 17
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 17
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 16
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 16
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 16
Segment Anything in 3D with Radiance Fields | 16
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 16
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 16
Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models | 16
AutoScale: Learning to Scale for Crowd Counting | 16
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 16
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 16
Generalized Relative Pose and Scale from Affine Correspondences | 16
Visual Object Tracking in First Person Vision | 16
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 16
Task Bias in Contrastive Vision-Language Models | 16
Learning 3D Semantic Scene Graphs with Instance Embeddings | 16
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 16
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 16
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 16
Learning Sequence Representations by Non-local Recurrent Neural Memory | 15
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 15
Incremental Model Enhancement via Memory-based Contrastive Learning | 15
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 15
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 15
Correction: Open-Vocabulary Text-Driven Human Image Generation | 15
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 15
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 15
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 15
FusionBooster: A Unified Image Fusion Boosting Paradigm | 15
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 15
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 15
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 15
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 14
Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection | 14
Animal-CLIP: A Dual-Prompt Enhanced Vision-Language Model for Animal Action Recognition | 14
Action2video: Generating Videos of Human 3D Actions | 14
Bayes-CAL: Robust Cross-Modal Alignment by Bayesian Approach for Few-Shot OoD Generalization | 14
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 14
Self-supervised Scalable Deep Compressed Sensing | 14
Diagnosing Human-Object Interaction Detectors | 14
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 14
Position-Guided Point Cloud Panoptic Segmentation Transformer | 14
Transformer for Object Re-identification: A Survey | 14
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 14
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 14
SoftPool++: An Encoder–Decoder Network for Point Cloud Completion | 13
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 13
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | 13
Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search | 13
Audio-Visual Segmentation with Semantics | 13
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 13
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 13
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 13
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 13
Predicting Visual Political Bias Using Webly Supervised Data and an Auxiliary Task | 13
Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding | 13
Deep Learning-Based Image and Video Inpainting: A Survey | 13
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 13
Guest Editorial: Special Issue on Biometrics Security and Privacy | 13
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 13
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages | 13
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 13
Warping the Residuals for Image Editing with StyleGAN | 13
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 13
Attribute-Centric Compositional Text-to-Image Generation | 13
Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation | 13
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 12
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 12
Real-Time Neural Radiance Talking Portrait Synthesis via Audio-Spatial Decomposition | 12
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | 12
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 12
Universal Representations: A Unified Look at Multiple Task and Domain Learning | 12
Open-Vocabulary Text-Driven Human Image Generation | 12
DLOW: Domain Flow and Applications | 12
Fast Ultra High-Definition Video Deblurring via Multi-scale Separable Network | 12
Dual Graph Networks for Pose Estimation in Crowded Scenes | 12
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition | 12
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 12
Compositional Prompting for Anti-Forgetting in Domain Incremental Learning | 12
Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging | 12