Computer Vision and Image Understanding

Papers
(The TQCC of Computer Vision and Image Understanding is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Semantic-preserved point-based human avatar | 243
Rebalanced supervised contrastive learning with prototypes for long-tailed visual recognition | 216
Fake News Detection Based on BERT Multi-domain and Multi-modal Fusion Network | 208
Incremental few-shot instance segmentation via feature enhancement and prototype calibration | 191
A multi-modal explainability approach for human-aware robots in multi-party conversation | 128
Luminance prior guided Low-Light 4C catenary image enhancement | 96
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 83
Target-aware and spatial-spectral discriminant feature joint correlation filters for hyperspectral video object tracking | 69
Seam estimation based on dense matching for parallax-tolerant image stitching | 67
Dual stage semantic information based generative adversarial network for image super-resolution | 66
Multi-Scale Adaptive Skeleton Transformer for action recognition | 61
Exploring using jigsaw puzzles for out-of-distribution detection | 59
PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning | 55
Spatial attention for human-centric visual understanding: An Information Bottleneck method | 47
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 46
Deep video compression based on Long-range Temporal Context Learning | 43
Static graph convolution with learned temporal and channel-wise graph topology generation for skeleton-based action recognition | 41
RFCNet: Enhancing urban segmentation using regularization, fusion, and completion | 32
3D scene generation for zero-shot learning using ChatGPT guided language prompts | 32
An egocentric video and eye-tracking dataset for visual search in convenience stores | 30
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 29
Local optimization cropping and boundary enhancement for end-to-end weakly-supervised segmentation network | 28
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 28
Trimap-guided feature mining and fusion network for natural image matting | 26
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 26
Cost-free adversarial defense: Distance-based optimization for model robustness without adversarial training | 24
LOFReg: An outlier-based regulariser for deep metric learning | 23
Efficient 6-DoF camera pose tracking with circular edges | 23
A novel fast combine-and-conquer object detector based on only one-level feature map | 22
De2Net: Under- | 22
Targeted adversarial attack on classic vision pipelines | 22
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 22
On the coherency of quantitative evaluation of visual explanations | 22
View consistency aware holistic triangulation for 3D human pose estimation | 21
Improved high dynamic range imaging using multi-scale feature flows balanced between task-orientedness and accuracy | 21
FedER: Federated Learning through Experience Replay and privacy-preserving data synthesis | 21
Weakly supervised action segmentation with effective use of attention and self-attention | 20
Siamese self-supervised learning for fine-grained visual classification | 20
GAMA: Geometric analysis based motion-aware architecture for moving object segmentation | 20
Feature fine-tuning and attribute representation transformation for zero-shot learning | 20
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 20
Discriminative object tracking by domain contrast | 19
Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation | 19
Learning to teach and learn for semi-supervised few-shot image classification | 19
Confidence sharing adaptation for out-of-domain human pose and shape estimation | 18
Emerging image generation with flexible control of perceived difficulty | 18
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 18
Feature reconstruction and metric based network for few-shot object detection | 18
Estimating 3D body mesh without SMPL annotations via alternating successive convex approximation | 18
Precondition and effect reasoning for action recognition | 17
Towards explainable deep visual saliency models | 17
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 17
Lightweight feature point detection network with channel enhancement | 17
Efficient multi-stage network with pixel-wise degradation prediction for real-time motion deblurring | 17
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 16
Multi-view clustering with Laplacian rank constraint based on symmetric and nonnegative low-rank representation | 16
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 15
Online object tracking based interactive attention | 15
Editorial Board | 14
MATTE: Multi-task multi-scale attention | 14
Editorial Board | 14
Editorial Board | 14
Editorial Board | 14
Editorial Board | 13
Feature independent Filter Pruning by Successive Layers analysis | 13
Editorial Board | 13
Sparse coding with morphology segmentation and multi-label fusion for hyperspectral image super-resolution | 13
Domain generalized federated learning for Person Re-identification | 13
Certifiable algorithms for the two-view planar triangulation problem | 13
Streaming egocentric action anticipation: An evaluation scheme and approach | 13
Simplifying open-set video domain adaptation with contrastive learning | 13
Editorial Board | 13
Editorial Board | 13
Editorial Board | 12
Editorial Board | 12
Modality adaptation via feature difference learning for depth human parsing | 12
Editorial Board | 12
Grow-push-prune: Aligning deep discriminants for effective structural network compression | 12
Handling new target classes in semantic segmentation with domain adaptation | 12
Facial landmarks localization using cascaded neural networks | 12
Efficient cross-information fusion decoder for semantic segmentation | 11
Deducing health cues from biometric data | 11
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 11
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 11
FusionDiff: A unified image fusion network based on diffusion probabilistic models | 11
3DF-FCOS: Small object detection with 3D features based on FCOS | 10
Structural reasoning for image-based social relation recognition | 10
Editorial Board | 10
Adaptive semantic guidance network for video captioning | 10
Editorial Board | 10
Collaborative three-stream transformers for video captioning | 10
Editorial Board | 10
Multi-patch multi-scale model for motion deblurring with high-frequency information | 10
Image amodal completion: A survey | 10
Full-body virtual try-on using top and bottom garments with wearing style control | 10
Instance-level salient object segmentation | 10
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 9
AdvFAS: A robust face anti-spoofing framework against adversarial examples | 9
Robust real-world point cloud registration by inlier detection | 9
Joint image-instance spatial–temporal attention for few-shot action recognition | 9
CT-VOS: Cutout prediction and tagging for self-supervised video object segmentation | 9
Dual cross perception network with texture and boundary guidance for camouflaged object detection | 9
Frame-level refinement networks for skeleton-based gait recognition | 9
DHBSR: A deep hybrid representation-based network for blind image super resolution | 9
SdcNet for object recognition | 9
Superclass-aware network for few-shot learning | 9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 9
SCA-Net: Spatial and channel attention-based network for 3D point clouds | 8
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 8
Bidirectional brain image translation using transfer learning from generic pre-trained models | 8
SimpleCut: A simple and strong 2D model for multi-person pose estimation | 8
End-to-end pedestrian trajectory prediction via Efficient Multi-modal Predictors | 8
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 8
Extending function mixture network for improved spectral super-resolution | 8
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 8
An unsupervised multi-focus image fusion method via dual-channel convolutional network and discriminator | 7
Anti-jamming heart rate estimation using a spatial–temporal fusion network | 7
Low-budget label query through domain alignment enforcement | 7
Editorial Board | 7
Editorial Board | 7
Knowledge distillation for incremental learning in semantic segmentation | 7
LocoGAN — Locally convolutional GAN | 7
Anchor pruning for object detection | 7
M-adapter: Multi-level image-to-video adaptation for video action recognition | 7
Semantically accurate super-resolution Generative Adversarial Networks | 7
Skeleton Cluster Tracking for robust multi-view multi-person 3D human pose estimation | 7
Large-scale Riemannian meta-optimization via subspace adaptation | 7
Incorporating structural prior for depth regularization in shape from focus | 7
Physics-based shading reconstruction for intrinsic image decomposition | 7
Periocular biometrics and its relevance to partially masked faces: A survey | 7
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 7
Fréchet AutoEncoder Distance: A new approach for evaluation of Generative Adversarial Networks | 7
A fast differential network with adaptive reference sample for gaze estimation | 7
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 7
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 7
A review of 3D human pose estimation algorithms for markerless motion capture | 7
Dynamic Anchor: Density Map Guided Small Object Detector for Tiny Persons | 7
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval | 7
Delving into CLIP latent space for Video Anomaly Recognition | 7
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 7
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 7
FTM: The Face Truth Machine—Hand-crafted features from micro-expressions to support lie detection | 7
Scene adaptive mechanism for action recognition | 7
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 6
Empirical study on using adapters for debiased Visual Question Answering | 6
Deep learning-based single image face depth data enhancement | 6
RelFormer: Advancing contextual relations for transformer-based dense captioning | 6
Improved domain adaptive object detector via adversarial feature learning | 6
As-Global-As-Possible stereo matching with Sparse Depth Measurement Fusion | 6
Nonlocal Gaussian scale mixture modeling for hyperspectral image denoising | 6
Lightweight cross-modal transformer for RGB-D salient object detection | 6
Learning geodesic-aware local features from RGB-D images | 6
Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality | 6
Evaluate and improve the quality of neural style transfer | 6
Bridging the gap between object detection in close-up and high-resolution wide shots | 6
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 6
Building extraction from remote sensing images with deep learning: A survey on vision techniques | 6
CMGNet: Collaborative multi-modal graph network for video captioning | 6
Identity-preserving editing of multiple facial attributes by learning global edit directions and local adjustments | 6
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 6
Incorporating degradation estimation in light field spatial super-resolution | 6
Rolling-Shutter-stereo-aware motion estimation and image correction | 6
Pyramid transformer-based triplet hashing for robust visual place recognition | 6
Corrigendum to “Improved domain adaptive object detector via adversarial feature learning” [Comput. Vis. Image Underst. 230 (2023) 103660] | 6
RS3Lip: Consis | 6
Editorial Board | 6
High frame rate optical flow estimation from event sensors via intensity estimation | 5
3D detection transformer: Set prediction of objects using point clouds | 5
Self-supervised vision transformers for semantic segmentation | 5
Enhanced local multi-windows attention network for lightweight image super-resolution | 5
Editorial Board | 5
Action assessment in rehabilitation: Leveraging machine learning and vision-based analysis | 5
Editorial Board | 5
SlowFastFormer for 3D human pose estimation | 5
A closer look at branch classifiers of multi-exit architectures | 5
LCMA-Net: A light cross-modal attention network for streamer re-identification in live video | 5
Editorial Board | 5
Editorial Board | 5
Multi-focus image fusion approach based on CNP systems in NSCT domain | 5
Generative adversarial network for semi-supervised image captioning | 5
A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification | 5
Found missing semantics: Supplemental prototype network for few-shot semantic segmentation | 5
A distribution independence based method for 3D face shape decomposition | 5
Towards efficient image and video style transfer via distillation and learnable feature transformation | 5
Improved Short-term Dense Bottleneck network for efficient scene analysis | 5
Improving the robustness of adversarial attacks using an affine-invariant gradient estimator | 5
Deep unsupervised shadow detection with curriculum learning and self-training | 5
Memory-efficient multi-scale residual dense network for single image rain removal | 5
Deep-STaR: Classification of image time series based on spatio-temporal representations | 5
MoMa: Skinned motion retargeting using masked pose modeling | 5
Editorial Board | 5
VADS: Visuo-Adaptive DualStrike attack on visual question answer | 5
LKDA-GAN: Cross-modality image synthesis via Generative Adversarial Network aggregating large kernel decomposable attention bottleneck block | 5
Constituent Attention for Vision Transformers | 5
Cutout with patch-loss augmentation for improving generative adversarial networks against instability | 5