International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 14. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. Only publications from the past four years, i.e., from 2022-01-01 to 2026-01-01, are covered.)
Article | Citations
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 2060
Instance-Aware Scene Layout Forecasting | 646
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 450
Guest Editorial: Special Issue on Open-World Visual Recognition | 355
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 291
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 286
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 274
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 259
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 232
Correction: Multi-source-free Domain Adaptive Object Detection | 188
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 181
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 170
Are Vision Transformers Robust to Spurious Correlations? | 159
Image Synthesis Under Limited Data: A Survey and Taxonomy | 158
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 146
MoDA: Modeling Deformable 3D Objects from Casual Videos | 141
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 140
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 140
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 134
A Minimal Solution for Image-Based Sphere Estimation | 128
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 128
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 112
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 111
Learning with Enriched Inductive Biases for Vision-Language Models | 109
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 109
Delving Deeper into Anti-Aliasing in ConvNets | 103
EAN: Event Adaptive Network for Enhanced Action Recognition | 102
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 100
Learning Text-to-Video Retrieval from Image Captioning | 99
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 99
Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature | 98
Deep Image Deblurring: A Survey | 97
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 92
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 88
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 86
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 85
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 84
Feature Hallucination for Self-supervised Action Recognition | 83
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 81
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 81
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 76
A Realism Metric for Generated LiDAR Point Clouds | 74
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 74
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 72
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 71
UIL-AQA: Uncertainty-Aware Clip-Level Interpretable Action Quality Assessment | 70
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 69
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 68
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 67
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 67
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 65
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 64
Correction: Consistent Prompt Tuning for Generalized Category Discovery | 63
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 62
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 60
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 59
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 57
Learning Cooperative Neural Modules for Stylized Image Captioning | 56
Noise-Resistant Multimodal Transformer for Emotion Recognition | 55
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 55
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 54
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 53
Learning Latent Part-Whole Hierarchies for Point Clouds | 53
VideoQA in the Era of LLMs: An Empirical Study | 53
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 53
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 51
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 51
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 49
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 49
Learning to Prompt for Vision-Language Models | 49
Image Matting and 3D Reconstruction in One Loop | 48
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 47
Modeling Scattering Effect for Under-Display Camera Image Restoration | 46
A Generalized Contour Vibration Model for Building Extraction | 46
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 46
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 45
Beyond Learned Metadata-Based Raw Image Reconstruction | 45
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 44
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 43
Control Color: Multimodal Diffusion-Based Interactive Image Colorization | 42
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 41
Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | 40
Understanding Synonymous Referring Expressions via Contrastive Features | 40
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 40
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 40
Globally Correlation-Aware Hard Negative Generation | 39
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 38
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 37
Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 37
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 37
Generative Adversarial Network Applications in Industry 4.0: A Review | 37
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 36
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 36
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 36
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | 36
Focal Modulation for Image Restoration | 36
Improving Domain Adaptation Through Class Aware Frequency Transformation | 36
Advances in 3D Neural Stylization: A Survey | 35
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 35
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 35
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 34
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 33
Structured Binary Neural Networks for Image Recognition | 33
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 33
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 33
Investigating Self-Supervised Methods for Label-Efficient Learning | 33
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 32
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 32
Singularity Analysis for the Perspective-Four and Five-Line Problems | 31
An Optimal Transport View of Class-Imbalanced Visual Recognition | 31
Beyond Image Prior: Embedding Noise Prior into Latent Space of Conditional Denoising Transformer | 31
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 30
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 30
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 30
Active Perception for Visual-Language Navigation | 30
Uncertainty-Aware and Decoupled Distillation for Semantic Segmentation | 30
Blur Invariants for Image Recognition | 30
Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion | 29
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 29
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 29
PartCom: Part Composition Learning for 3D Open-Set Recognition | 29
TokenPacker: Efficient Visual Projector for Multimodal LLM | 29
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 29
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 29
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 29
A Region-Based Randers Geodesic Approach for Image Segmentation | 29
LEO: Generative Latent Image Animator for Human Video Synthesis | 29
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 28
Out-of-Distribution Detection with Virtual Outlier Smoothing | 28
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 28
Anti-Bandit for Neural Architecture Search | 28
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 28
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 28
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 27
SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery | 27
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 27
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 27
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | 27
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 27
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 26
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 26
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 26
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 26
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 26
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 25
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 25
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 25
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 24
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 24
Subspace Training Mitigates Gradient Noise Vulnerability | 24
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 24
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 24
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 24
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 24
Guest Editorial: Special Issue on Visual Datasets | 24
Correction: Variational Rectification Inference for Learning with Noisy Labels | 24
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 24
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 24
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 24
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 24
Sentimental Visual Captioning using Multimodal Transformer | 23
Learning 3D Semantic Scene Graphs with Instance Embeddings | 23
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 23
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 23
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 23
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 23
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 22
Generalized Relative Pose and Scale from Affine Correspondences | 22
Leveraging Blur Information for Plenoptic Camera Calibration | 22
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 22
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 22
Defending Against Adversarial Examples Via Modeling Adversarial Noise | 21
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 21
Segment Anything in 3D with Radiance Fields | 21
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 21
Image-Based Virtual Try-On: A Survey | 21
FourierMIL: Fourier Filtering-based Multiple Instance Learning for Whole Slide Image Analysis | 21
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 21
A Deeper Analysis of Volumetric Relightable Faces | 21
AutoScale: Learning to Scale for Crowd Counting | 20
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 20
Task Bias in Contrastive Vision-Language Models | 20
Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models | 20
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 20
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 20
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 20
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 20
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 20
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 20
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 19
Incremental Model Enhancement via Memory-based Contrastive Learning | 19
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 19
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 19
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 19
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 19
Visual Object Tracking in First Person Vision | 19
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 19
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 19
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 19
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 19
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 19
Correction: Open-Vocabulary Text-Driven Human Image Generation | 18
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 18
Bayes-CAL: Robust Cross-Modal Alignment by Bayesian Approach for Few-Shot OoD Generalization | 18
Learning Sequence Representations by Non-local Recurrent Neural Memory | 18
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 18
Animal-CLIP: A Dual-Prompt Enhanced Vision-Language Model for Animal Action Recognition | 18
Guest Editorial: Special Issue on Biometrics Security and Privacy | 18
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 18
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 17
Mitigating Knowledge Discrepancies among Multiple Datasets for Task-agnostic Unified Face Alignment | 17
Audio-Visual Segmentation with Semantics | 17
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 17
Action2video: Generating Videos of Human 3D Actions | 17
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 17
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 17
VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation | 17
FusionBooster: A Unified Image Fusion Boosting Paradigm | 16
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 16
Deep Learning-Based Image and Video Inpainting: A Survey | 16
Attribute-Centric Compositional Text-to-Image Generation | 16
Transformer for Object Re-identification: A Survey | 16
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 16
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 16
Diagnosing Human-Object Interaction Detectors | 16
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 16
Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection | 16
Self-supervised Scalable Deep Compressed Sensing | 16
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 16
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 16
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 16
Position-Guided Point Cloud Panoptic Segmentation Transformer | 16
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | 15
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | 15
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 15
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 15
Fast Ultra High-Definition Video Deblurring via Multi-scale Separable Network | 15
Warping the Residuals for Image Editing with StyleGAN | 15
A Survey on Long-Tailed Visual Recognition | 15
Open-Vocabulary Text-Driven Human Image Generation | 15
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 15
Real-Time Neural Radiance Talking Portrait Synthesis via Audio-Spatial Decomposition | 15
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 15
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 15
Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs | 15
SoftPool++: An Encoder–Decoder Network for Point Cloud Completion | 15
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 15
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition | 15
Towards Frame Rate Agnostic Multi-object Tracking | 15
Towards High-Resolution Specular Highlight Detection | 14