International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 10. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Learning with Enriched Inductive Biases for Vision-Language Models | 1107
Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation | 1013
Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective | 994
Image Synthesis Under Limited Data: A Survey and Taxonomy | 360
Dual-Space Video Person Re-identification | 318
LMD: Light-Weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds | 294
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 212
Robust Deep Object Tracking against Adversarial Attacks | 203
Lidar Panoptic Segmentation in an Open World | 197
Breaking the Limits of Reliable Prediction via Generated Data | 193
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 177
RMS-FlowNet++: Efficient and Robust Multi-scale Scene Flow Estimation for Large-Scale Point Clouds | 167
Temporal Transductive Inference for Few-Shot Video Object Segmentation | 147
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 138
Editor’s Note: Special Issue on BMVC 2021 | 127
Computer Vision and Pattern Recognition 2020 | 120
Deep CockTail Networks | 118
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 115
Editor’s Note: Special Issue on Computer Vision and Cultural Heritage Preservation | 102
Towards Boosting Out-of-Distribution Detection from a Spatial Feature Importance Perspective | 100
Guest Editorial: Special Issue on Open-World Visual Recognition | 98
An Efficient Model for a Camera Behind a Parallel Refractive Slab | 96
Guest Editorial: Special Issue on Performance Evaluation in Computer Vision | 95
Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network | 91
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 90
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 90
LDTrack: Dynamic People Tracking by Service Robots Using Diffusion Models | 89
HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning | 87
LR-ASD: Lightweight and Robust Network for Active Speaker Detection | 85
End-to-End Video Text Spotting with Transformer | 82
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction | 80
Correction: Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences Using Transformer Networks | 78
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 78
Renormalization for Initialization of Rolling Shutter Visual-Inertial Odometry | 77
CMSNet: Deep Color and Monochrome Stereo | 72
Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks | 71
Importance First: Generating Scene Graph of Human Interest | 71
4D Temporally Coherent Multi-Person Semantic Reconstruction and Segmentation | 69
Building 3D Generative Models from Minimal Data | 69
Cross-Domain Gated Learning for Domain Generalization | 63
Dual-Attention-Guided Network for Ghost-Free High Dynamic Range Imaging | 63
Adapting Across Domains via Target-Oriented Transferable Semantic Augmentation Under Prototype Constraint | 60
A Cutting-Plane Method for Sublabel-Accurate Relaxation of Problems with Product Label Spaces | 59
Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation | 58
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval | 56
Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search | 55
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 55
RELAX: Representation Learning Explainability | 54
Correction to: Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation | 54
CSDG-FAS: Closed-Space Domain Generalization for Face Anti-spoofing | 50
Spectral Shape Recovery and Analysis Via Data-driven Connections | 50
FD-GAN: Generalizable and Robust Forgery Detection via Generative Adversarial Networks | 49
Delving into Inter-Image Invariance for Unsupervised Visual Representations | 49
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 47
Fast and Accurate 3D Registration from Line Intersection Constraints | 46
Disentangling Geometric Deformation Spaces in Generative Latent Shape Models | 46
Task Bias in Contrastive Vision-Language Models | 45
Learning Dynamic Prototypes for Visual Pattern Debiasing | 44
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 43
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 43
A Survey of Methods for Automated Quality Control Based on Images | 42
Automatic Modelling for Interactive Action Assessment | 42
Mitigating Demographic Bias in Facial Datasets with Style-Based Multi-attribute Transfer | 42
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis | 41
Deep Corner | 40
Super Vision Transformer | 39
WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection | 39
Semantic-Aware Visual Decomposition for Image Coding | 38
A Minimal Solution for Image-Based Sphere Estimation | 38
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 37
Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning | 35
ToTem NRSfM: Object-Wise Non-rigid Structure-from-Motion with a Topological Template | 35
Incremental Model Enhancement via Memory-based Contrastive Learning | 34
Combating Label Noise with a General Surrogate Model for Sample Selection | 34
Instance-Aware Scene Layout Forecasting | 34
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 33
CAE-GReaT: Convolutional-Auxiliary Efficient Graph Reasoning Transformer for Dense Image Predictions | 33
M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection | 32
Efficient High-Quality Vectorized Modeling of Large-Scale Scenes | 32
PL$_1$P: Point-Line Minimal Problems under Partial Visibility in Three Views | 32
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | 31
InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation | 31
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | 31
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 30
MoDA: Modeling Deformable 3D Objects from Casual Videos | 30
Dynamical Deep Generative Latent Modeling of 3D Skeletal Motion | 29
Toward Accurate and Robust Pedestrian Detection via Variational Inference | 29
DOVE: Learning Deformable 3D Objects by Watching Videos | 29
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 29
Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation | 28
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 28
Recurrent Graph Neural Networks for Video Instance Segmentation | 28
Are Vision Transformers Robust to Spurious Correlations? | 27
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 27
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 27
CG-FAS: Cross-label Generative Augmentation for Face Anti-Spoofing | 26
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 26
Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations | 26
Learning to Detect Novel Species with SAM in the Wild | 26
Deep Attention Learning for Pre-operative Lymph Node Metastasis Prediction in Pancreatic Cancer via Multi-object Relationship Modeling | 26
Local Compressed Video Stream Learning for Generic Event Boundary Detection | 26
Polysemy Deciphering Network for Robust Human–Object Interaction Detection | 26
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 25
Re-ID-leak: Membership Inference Attacks Against Person Re-identification | 25
Language-Guided Hierarchical Fine-Grained Image Forgery Detection and Localization | 25
On Finite Difference Jacobian Computation in Deformable Image Registration | 24
From Individual to Whole: Reducing Intra-class Variance by Feature Aggregation | 24
The Isowarp: The Template-Based Visual Geometry of Isometric Surfaces | 24
Network Adjustment: Channel and Block Search Guided by Resource Utilization Ratio | 24
UrbanEvolver: Function-Aware Urban Layout Regeneration | 23
Self-supervised Secondary Landmark Detection via 3D Representation Learning | 23
A Survey on Adaptive Cameras | 23
Inferring Attention Shifts for Salient Instance Ranking | 23
An Empirical Study on Multi-domain Robust Semantic Segmentation | 22
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 22
Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation | 22
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 22
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 22
Correction: Instant3D: Instant Text-to-3D Generation | 22
PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection | 22
Guest Editorial: Special Issue on Multimodal Learning | 22
Eliminating Temporal Illumination Variations in Whisk-broom Hyperspectral Imaging | 21
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 21
Expressive Image Generation and Editing with Rich Text | 21
Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation | 21
Artificial Intelligence for Dunhuang Cultural Heritage Protection: The Project and the Dataset | 21
Visual Object Tracking in First Person Vision | 21
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 20
Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior | 20
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking | 20
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | 19
FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms | 19
RIConv++: Effective Rotation Invariant Convolutions for 3D Point Clouds Deep Learning | 19
Preconditioned Score-Based Generative Models | 19
SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | 19
Camouflaged Object Detection with Adaptive Partition and Background Retrieval | 19
Deep Unfolding for Snapshot Compressive Imaging | 19
Blind Image Quality Assessment: Exploring Content Fidelity Perceptibility via Quality Adversarial Learning | 19
Focus for Free in Density-Based Counting | 19
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 18
Invertible Rescaling Network and Its Extensions | 18
MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation | 18
ScenarioDiff: Text-to-video Generation with Dynamic Transformations of Scene Conditions | 18
Underwater Camera: Improving Visual Perception Via Adaptive Dark Pixel Prior and Color Correction | 18
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 18
Contextual Object Detection with Multimodal Large Language Models | 18
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 18
Curriculum Learning: A Survey | 17
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch | 17
Pyramid Attention Network for Image Restoration | 17
On the Generalization and Causal Explanation in Self-Supervised Learning | 17
Correction: Multi-source-free Domain Adaptive Object Detection | 17
Regional Adversarial Training for Better Robust Generalization | 17
Unsupervised Scale-Consistent Depth Learning from Video | 17
End-to-End Alternating Optimization for Real-World Blind Super Resolution | 17
Deep Image Deblurring: A Survey | 17
Learning Text-to-Video Retrieval from Image Captioning | 17
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-Wise Pseudo Labeling | 16
Interpreting Face Inference Models Using Hierarchical Network Dissection | 16
Delving Deeper into Anti-Aliasing in ConvNets | 16
Hierarchical Curriculum Learning for No-Reference Image Quality Assessment | 16
APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking | 16
Learning to Adapt to Light | 16
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 16
Attribute-Centric Compositional Text-to-Image Generation | 16
EAN: Event Adaptive Network for Enhanced Action Recognition | 16
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 16
Deep Physics-Guided Unrolling Generalization for Compressed Sensing | 16
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 16
Guest Editorial: Special Issue on Deep Learning for Video Analysis and Compression | 15
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 15
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data | 15
SportsCap: Monocular 3D Human Motion Capture and Fine-Grained Understanding in Challenging Sports Videos | 15
Editor’s Note: Special Issue on 3D Computer Vision | 15
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 15
Guest Editorial: Special Issue on Traditional Computer Vision in the Age of Deep Learning | 15
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 14
Guest Editorial: Special Issue on Advances in Computer Vision and Applications (ACCV 2020) | 14
Blind Image Deblurring with Unknown Kernel Size and Substantial Noise | 14
PointSea: Point Cloud Completion via Self-structure Augmentation | 14
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 14
LaMD: Latent Motion Diffusion for Image-Conditional Video Generation | 14
Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition | 14
In Search of Lost Online Test-Time Adaptation: A Survey | 14
Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation | 14
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 14
Position-Guided Point Cloud Panoptic Segmentation Transformer | 13
Full-Spectrum Out-of-Distribution Detection | 13
FusionBooster: A Unified Image Fusion Boosting Paradigm | 13
DiffuVolume: Diffusion Model for Volume based Stereo Matching | 13
Bridging the Source-to-Target Gap for Cross-Domain Person Re-identification with Intermediate Domains | 13
Correction: Open-Vocabulary Text-Driven Human Image Generation | 13
Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking | 13
3D Shape Analysis Through a Quantum Lens: the Average Mixing Kernel Signature | 13
Self-supervised Shutter Unrolling with Events | 13
3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | 13
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 13
FastTrack: A Highly Efficient and Generic GPU-Based Multi-object Tracking Method with Parallel Kalman Filter | 13
When Multi-Focus Image Fusion Networks Meet Traditional Edge-Preservation Technology | 13
Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification | 13
Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection | 13
Learning to Detect Instance-Level Salient Objects Using Complementary Image Labels | 12
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 12
VideoQA in the Era of LLMs: An Empirical Study | 12
Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset | 12
Learning by Asking Questions for Knowledge-Based Novel Object Recognition | 12
Overcoming the Domain Gap in Neural Action Representations | 12
Pyramid NeRF: Frequency Guided Fast Radiance Field Optimization | 12
Open-Set Adversarial Defense with Clean-Adversarial Mutual Learning | 12
Correction: HCLR-Net: Hybrid Contrastive Learning Regularization with Locally Randomized Perturbation for Underwater Image Enhancement | 12
Memory-Augmented Deep Unfolding Network for Guided Image Super-resolution | 12
Imbalance-Aware Discriminative Clustering for Unsupervised Semantic Segmentation | 12
Augmenting the Softmax with Additional Confidence Scores for Improved Selective Classification with Out-of-Distribution Data | 12
WATB: Wild Animal Tracking Benchmark | 12
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 12
Learning Cooperative Neural Modules for Stylized Image Captioning | 12
Data Augmentation for Low-Level Vision: CutBlur and Mixture-of-Augmentation | 12
Distribution-Sensitive Information Retention for Accurate Binary Neural Network | 12
AnimalTrack: A Benchmark for Multi-Animal Tracking in the Wild | 12
Depth Descent Synchronization in $\mathrm{SO}(D)$ | 12
Action2video: Generating Videos of Human 3D Actions | 12
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 12
Attribute-Image Person Re-identification via Modal-Consistent Metric Learning | 11
Learning Robust Facial Representation From the View of Diversity and Closeness | 11
3D Adversarial Augmentations for Robust Out-of-Domain Predictions | 11
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 11
Learning to Detect Semantic Boundaries with Image-Level Class Labels | 11
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 11
ReliTalk: Relightable Talking Portrait Generation from a Single Video | 11
Matching Compound Prototypes for Few-Shot Action Recognition | 11
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 11
Symmetry-aware Neural Architecture for Embodied Visual Navigation | 11
Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks | 11
HybridPrompt: Domain-Aware Prompting for Cross-Domain Few-Shot Learning | 10
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 10
Compressed Event Sensing (CES) Volumes for Event Cameras | 10
Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation | 10
Learning Adaptive Classifiers Synthesis for Generalized Few-Shot Learning | 10
Bipartite Graph Reasoning GANs for Person Pose and Facial Image Synthesis | 10
L3AM: Linear Adaptive Additive Angular Margin Loss for Video-Based Hand Gesture Authentication | 10
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 10
Descriptor Distillation: A Teacher-Student-Regularized Framework for Learning Local Descriptors | 10
Deep Learning-Based Image and Video Inpainting: A Survey | 10
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 10
UPR-Net: A Unified Pyramid Recurrent Network for Video Frame Interpolation | 10
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 10
Infrared Adversarial Patches with Learnable Shapes and Locations in the Physical World | 10
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 10
One-Shot Generative Domain Adaptation in 3D GANs | 10
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 10
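
The selection rule stated above the table (a citation threshold, a four-year publication window, and a cap of 250 papers) can be sketched roughly as follows. This is only an illustration, not the site's actual procedure: the `Paper` record, the `select_papers` function, and the sample entries are all hypothetical, and the inclusive threshold is an assumption inferred from the table listing papers with exactly 10 citations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    # Hypothetical record layout; the page does not describe its data model.
    title: str
    citations: int
    published: date


def select_papers(papers, threshold, start, end, limit=250):
    """Keep papers published in [start, end] with citations >= threshold,
    sorted by citation count (highest first) and capped at `limit`."""
    eligible = [
        p for p in papers
        if start <= p.published <= end and p.citations >= threshold
    ]
    # sorted() is stable, so papers with equal counts keep their input order.
    ranked = sorted(eligible, key=lambda p: p.citations, reverse=True)
    return ranked[:limit]


# Made-up sample records, purely for illustration.
sample = [
    Paper("Paper A", 25, date(2022, 5, 1)),
    Paper("Paper B", 9, date(2023, 1, 1)),     # below the threshold
    Paper("Paper C", 10, date(2024, 3, 1)),    # exactly at the threshold
    Paper("Paper D", 40, date(2020, 12, 31)),  # outside the window
]

selected = select_papers(
    sample,
    threshold=10,
    start=date(2021, 4, 1),
    end=date(2025, 4, 1),
)
for paper in selected:
    print(f"{paper.title} | {paper.citations}")
```

With these sample records only "Paper A" and "Paper C" survive: one record falls below the threshold and one predates the window.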