International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 11. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-06-01 to 2025-06-01.)
Article | Citations
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1260
A Minimal Solution for Image-Based Sphere Estimation | 1082
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 1063
Guest Editorial: Special Issue on Open-World Visual Recognition | 398
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 357
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 351
Instance-Aware Scene Layout Forecasting | 243
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 215
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 210
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 186
Correction: Multi-source-free Domain Adaptive Object Detection | 161
Learning Text-to-Video Retrieval from Image Captioning | 150
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 140
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 134
MoDA: Modeling Deformable 3D Objects from Casual Videos | 128
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 127
Learning with Enriched Inductive Biases for Vision-Language Models | 120
Image Synthesis Under Limited Data: A Survey and Taxonomy | 111
Are Vision Transformers Robust to Spurious Correlations? | 107
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 107
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 105
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 104
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 103
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 97
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 96
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 96
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 88
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 88
EAN: Event Adaptive Network for Enhanced Action Recognition | 88
Delving Deeper into Anti-Aliasing in ConvNets | 87
Deep Image Deblurring: A Survey | 85
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 84
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 83
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 81
Noise-Resistant Multimodal Transformer for Emotion Recognition | 75
Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation | 69
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 66
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 66
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 64
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 64
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 64
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 59
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 58
A Realism Metric for Generated LiDAR Point Clouds | 57
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 57
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 57
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 53
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 51
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 50
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 50
Learning Cooperative Neural Modules for Stylized Image Captioning | 50
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 48
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 47
VideoQA in the Era of LLMs: An Empirical Study | 46
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 43
Correction: Consistent Prompt Tuning for Generalized Category Discovery | 42
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 41
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 41
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 40
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 40
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 38
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 38
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 37
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 37
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 37
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 37
Learning to Prompt for Vision-Language Models | 37
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 37
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 37
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 35
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 35
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 34
Image Matting and 3D Reconstruction in One Loop | 34
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 34
Improving Domain Adaptation Through Class Aware Frequency Transformation | 34
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 34
Beyond Learned Metadata-Based Raw Image Reconstruction | 34
Advances in 3D Neural Stylization: A Survey | 34
Understanding Synonymous Referring Expressions via Contrastive Features | 34
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 33
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 33
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 32
Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 32
Globally Correlation-Aware Hard Negative Generation | 31
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 31
A Generalized Contour Vibration Model for Building Extraction | 30
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 30
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 30
Modeling Scattering Effect for Under-Display Camera Image Restoration | 30
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 29
Generative Adversarial Network Applications in Industry 4.0: A Review | 29
Deep Maximum a Posterior Estimator for Video Denoising | 29
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild | 28
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 28
Structured Binary Neural Networks for Image Recognition | 28
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 27
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 27
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 27
Assignment Flow for Order-Constrained OCT Segmentation | 26
PartCom: Part Composition Learning for 3D Open-Set Recognition | 26
Blur Invariants for Image Recognition | 26
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 26
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 25
An Optimal Transport View of Class-Imbalanced Visual Recognition | 25
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 25
Investigating Self-Supervised Methods for Label-Efficient Learning | 25
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 25
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 25
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 25
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 25
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 24
Distribution-Aware Margin Calibration for Semantic Segmentation in Images | 24
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 24
Semantic Edge Detection with Diverse Deep Supervision | 23
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 23
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 23
Singularity Analysis for the Perspective-Four and Five-Line Problems | 23
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 23
Deep Learning Geometry Compression Artifacts Removal for Video-Based Point Cloud Compression | 23
Active Perception for Visual-Language Navigation | 22
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 22
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 22
A Region-Based Randers Geodesic Approach for Image Segmentation | 22
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 22
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 22
LEO: Generative Latent Image Animator for Human Video Synthesis | 22
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 21
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 21
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 21
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 21
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 21
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 21
Anti-Bandit for Neural Architecture Search | 21
Guest Editorial: Special Issue: Computer Vision and Pattern Recognition (DAGM GCPR 2019) | 21
Out-of-Distribution Detection with Virtual Outlier Smoothing | 21
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 20
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 20
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 20
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 20
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 19
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 19
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 19
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 19
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 19
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 19
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations | 19
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 19
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 18
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 18
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 18
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 18
Correction: Variational Rectification Inference for Learning with Noisy Labels | 18
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 17
Image-Based Virtual Try-On: A Survey | 17
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 17
A Deeper Analysis of Volumetric Relightable Faces | 17
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 17
Learning 3D Semantic Scene Graphs with Instance Embeddings | 17
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 17
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 17
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 17
Segment Anything in 3D with Radiance Fields | 16
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 16
Leveraging Blur Information for Plenoptic Camera Calibration | 16
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 16
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 16
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 16
Defending Against Adversarial Examples Via Modeling Adversarial Noise | 16
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 16
Sentimental Visual Captioning using Multimodal Transformer | 16
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 16
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 16
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 15
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 15
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 15
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 15
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 15
Visual Object Tracking in First Person Vision | 15
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 15
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 15
Incremental Model Enhancement via Memory-based Contrastive Learning | 15
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 15
Task Bias in Contrastive Vision-Language Models | 15
AutoScale: Learning to Scale for Crowd Counting | 15
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 15
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 14
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 14
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 14
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 14
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 14
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 14
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 14
Attribute-Centric Compositional Text-to-Image Generation | 14
Self-supervised Scalable Deep Compressed Sensing | 13
Position-Guided Point Cloud Panoptic Segmentation Transformer | 13
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 13
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 13
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 13
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 13
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 13
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 13
Deep Learning-Based Image and Video Inpainting: A Survey | 13
Correction: Open-Vocabulary Text-Driven Human Image Generation | 13
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 13
Guest Editorial: Special Issue on Biometrics Security and Privacy | 13
Transformer for Object Re-identification: A Survey | 13
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 13
FusionBooster: A Unified Image Fusion Boosting Paradigm | 13
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 13
Diagnosing Human-Object Interaction Detectors | 13
Learning Sequence Representations by Non-local Recurrent Neural Memory | 13
Action2video: Generating Videos of Human 3D Actions | 13
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 13
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 13
Audio-Visual Segmentation with Semantics | 13
Fast Ultra High-Definition Video Deblurring via Multi-scale Separable Network | 12
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 12
Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding | 12
DLOW: Domain Flow and Applications | 12
Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation | 12
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 12
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 12
Towards Frame Rate Agnostic Multi-object Tracking | 12
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | 12
Open-Vocabulary Text-Driven Human Image Generation | 12
Compositional Prompting for Anti-Forgetting in Domain Incremental Learning | 12
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 12
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 12
Predicting Visual Political Bias Using Webly Supervised Data and an Auxiliary Task | 12
Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging | 12
Universal Representations: A Unified Look at Multiple Task and Domain Learning | 12
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 12
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 12
SoftPool++: An Encoder–Decoder Network for Point Cloud Completion | 12
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages | 12
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition | 11
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 11
A Survey on Long-Tailed Visual Recognition | 11
Multi-view Consistent Generative Adversarial Networks for Compositional 3D-Aware Image Synthesis | 11
Deep Hierarchical Learning for 3D Semantic Segmentation | 11
Warping the Residuals for Image Editing with StyleGAN | 11
Dual Graph Networks for Pose Estimation in Crowded Scenes | 11
Unknown Support Prototype Set for Open Set Recognition | 11
Learning Regression and Verification Networks for Robust Long-term Tracking | 11
Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search | 11
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | 11
Instance-Level Moving Object Segmentation from a Single Image with Events | 11
Semi-Supervised Domain Generalization with Stochastic StyleMatch | 11
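
The selection rule described at the top of this page (a citation threshold, a four-year publication window, and a cap of 250 papers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the table: the `Paper` record, its field names, the `select_papers` helper, and the sample entries are all hypothetical, and the threshold and both window endpoints are assumed to be inclusive because the table contains papers with exactly 11 citations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    # Hypothetical record layout; the page does not specify one.
    title: str
    citations: int
    published: date


def select_papers(papers, threshold, start, end, max_papers=250):
    """Keep papers published in [start, end] with citations >= threshold,
    ordered by citation count (highest first) and capped at max_papers."""
    eligible = [
        p for p in papers
        if start <= p.published <= end and p.citations >= threshold
    ]
    # list.sort is stable, so papers with equal counts keep their input order.
    eligible.sort(key=lambda p: p.citations, reverse=True)
    return eligible[:max_papers]


# Made-up sample data, not taken from the table above.
sample = [
    Paper("Hypothetical paper A", 40, date(2022, 3, 1)),
    Paper("Hypothetical paper B", 11, date(2023, 7, 15)),
    Paper("Hypothetical paper C", 10, date(2024, 1, 10)),   # below the threshold
    Paper("Hypothetical paper D", 90, date(2020, 5, 20)),   # outside the window
]

selected = select_papers(
    sample, threshold=11, start=date(2021, 6, 1), end=date(2025, 6, 1)
)
for paper in selected:
    print(f"{paper.title} | {paper.citations}")
```

With the sample data, only papers A and B are retained: C falls below the threshold of 11 and D was published before the window opens.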