Computer Vision and Image Understanding

Papers
(The median citation count of Computer Vision and Image Understanding is 1. The table below lists the papers whose CrossRef citation counts are above that median, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Semantic-preserved point-based human avatar | 243
Rebalanced supervised contrastive learning with prototypes for long-tailed visual recognition | 216
Fake News Detection Based on BERT Multi-domain and Multi-modal Fusion Network | 208
Incremental few-shot instance segmentation via feature enhancement and prototype calibration | 191
A multi-modal explainability approach for human-aware robots in multi-party conversation | 128
Luminance prior guided Low-Light 4C catenary image enhancement | 96
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 83
Target-aware and spatial-spectral discriminant feature joint correlation filters for hyperspectral video object tracking | 69
Seam estimation based on dense matching for parallax-tolerant image stitching | 67
Dual stage semantic information based generative adversarial network for image super-resolution | 66
Multi-Scale Adaptive Skeleton Transformer for action recognition | 61
Exploring using jigsaw puzzles for out-of-distribution detection | 59
PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning | 55
Spatial attention for human-centric visual understanding: An Information Bottleneck method | 47
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 46
Deep video compression based on Long-range Temporal Context Learning | 43
Static graph convolution with learned temporal and channel-wise graph topology generation for skeleton-based action recognition | 41
3D scene generation for zero-shot learning using ChatGPT guided language prompts | 32
RFCNet: Enhancing urban segmentation using regularization, fusion, and completion | 32
An egocentric video and eye-tracking dataset for visual search in convenience stores | 30
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 29
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 28
Local optimization cropping and boundary enhancement for end-to-end weakly-supervised segmentation network | 28
Trimap-guided feature mining and fusion network for natural image matting | 26
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 26
Cost-free adversarial defense: Distance-based optimization for model robustness without adversarial training | 24
LOFReg: An outlier-based regulariser for deep metric learning | 23
Efficient 6-DoF camera pose tracking with circular edges | 23
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 22
On the coherency of quantitative evaluation of visual explanations | 22
A novel fast combine-and-conquer object detector based on only one-level feature map | 22
De2Net: Under- | 22
Targeted adversarial attack on classic vision pipelines | 22
FedER: Federated Learning through Experience Replay and privacy-preserving data synthesis | 21
View consistency aware holistic triangulation for 3D human pose estimation | 21
Improved high dynamic range imaging using multi-scale feature flows balanced between task-orientedness and accuracy | 21
GAMA: Geometric analysis based motion-aware architecture for moving object segmentation | 20
Feature fine-tuning and attribute representation transformation for zero-shot learning | 20
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 20
Weakly supervised action segmentation with effective use of attention and self-attention | 20
Siamese self-supervised learning for fine-grained visual classification | 20
Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation | 19
Learning to teach and learn for semi-supervised few-shot image classification | 19
Discriminative object tracking by domain contrast | 19
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 18
Feature reconstruction and metric based network for few-shot object detection | 18
Estimating 3D body mesh without SMPL annotations via alternating successive convex approximation | 18
Confidence sharing adaptation for out-of-domain human pose and shape estimation | 18
Emerging image generation with flexible control of perceived difficulty | 18
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 17
Lightweight feature point detection network with channel enhancement | 17
Efficient multi-stage network with pixel-wise degradation prediction for real-time motion deblurring | 17
Precondition and effect reasoning for action recognition | 17
Towards explainable deep visual saliency models | 17
Multi-view clustering with Laplacian rank constraint based on symmetric and nonnegative low-rank representation | 16
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 16
Online object tracking based interactive attention | 15
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 15
Editorial Board | 14
Editorial Board | 14
MATTE: Multi-task multi-scale attention | 14
Editorial Board | 14
Editorial Board | 14
Certifiable algorithms for the two-view planar triangulation problem | 13
Streaming egocentric action anticipation: An evaluation scheme and approach | 13
Simplifying open-set video domain adaptation with contrastive learning | 13
Editorial Board | 13
Editorial Board | 13
Editorial Board | 13
Feature independent Filter Pruning by Successive Layers analysis | 13
Editorial Board | 13
Sparse coding with morphology segmentation and multi-label fusion for hyperspectral image super-resolution | 13
Domain generalized federated learning for Person Re-identification | 13
Editorial Board | 12
Grow-push-prune: Aligning deep discriminants for effective structural network compression | 12
Handling new target classes in semantic segmentation with domain adaptation | 12
Facial landmarks localization using cascaded neural networks | 12
Editorial Board | 12
Editorial Board | 12
Modality adaptation via feature difference learning for depth human parsing | 12
Deducing health cues from biometric data | 11
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 11
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 11
FusionDiff: A unified image fusion network based on diffusion probabilistic models | 11
Efficient cross-information fusion decoder for semantic segmentation | 11
Multi-patch multi-scale model for motion deblurring with high-frequency information | 10
Image amodal completion: A survey | 10
Full-body virtual try-on using top and bottom garments with wearing style control | 10
Instance-level salient object segmentation | 10
3DF-FCOS: Small object detection with 3D features based on FCOS | 10
Structural reasoning for image-based social relation recognition | 10
Editorial Board | 10
Adaptive semantic guidance network for video captioning | 10
Editorial Board | 10
Collaborative three-stream transformers for video captioning | 10
Editorial Board | 10
Frame-level refinement networks for skeleton-based gait recognition | 9
DHBSR: A deep hybrid representation-based network for blind image super resolution | 9
SdcNet for object recognition | 9
Superclass-aware network for few-shot learning | 9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 9
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 9
AdvFAS: A robust face anti-spoofing framework against adversarial examples | 9
Robust real-world point cloud registration by inlier detection | 9
Joint image-instance spatial–temporal attention for few-shot action recognition | 9
CT-VOS: Cutout prediction and tagging for self-supervised video object segmentation | 9
Dual cross perception network with texture and boundary guidance for camouflaged object detection | 9
SimpleCut: A simple and strong 2D model for multi-person pose estimation | 8
End-to-end pedestrian trajectory prediction via Efficient Multi-modal Predictors | 8
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 8
Extending function mixture network for improved spectral super-resolution | 8
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 8
SCA-Net: Spatial and channel attention-based network for 3D point clouds | 8
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 8
Bidirectional brain image translation using transfer learning from generic pre-trained models | 8
Periocular biometrics and its relevance to partially masked faces: A survey | 7
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 7
Fréchet AutoEncoder Distance: A new approach for evaluation of Generative Adversarial Networks | 7
A fast differential network with adaptive reference sample for gaze estimation | 7
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 7
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 7
A review of 3D human pose estimation algorithms for markerless motion capture | 7
Dynamic Anchor: Density Map Guided Small Object Detector for Tiny Persons | 7
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval | 7
Delving into CLIP latent space for Video Anomaly Recognition | 7
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 7
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 7
FTM: The Face Truth Machine—Hand-crafted features from micro-expressions to support lie detection | 7
Scene adaptive mechanism for action recognition | 7
An unsupervised multi-focus image fusion method via dual-channel convolutional network and discriminator | 7
Anti-jamming heart rate estimation using a spatial–temporal fusion network | 7
Low-budget label query through domain alignment enforcement | 7
Editorial Board | 7
Editorial Board | 7
Knowledge distillation for incremental learning in semantic segmentation | 7
LocoGAN — Locally convolutional GAN | 7
Anchor pruning for object detection | 7
M-adapter: Multi-level image-to-video adaptation for video action recognition | 7
Semantically accurate super-resolution Generative Adversarial Networks | 7
Skeleton Cluster Tracking for robust multi-view multi-person 3D human pose estimation | 7
Large-scale Riemannian meta-optimization via subspace adaptation | 7
Incorporating structural prior for depth regularization in shape from focus | 7
Physics-based shading reconstruction for intrinsic image decomposition | 7
Building extraction from remote sensing images with deep learning: A survey on vision techniques | 6
CMGNet: Collaborative multi-modal graph network for video captioning | 6
Identity-preserving editing of multiple facial attributes by learning global edit directions and local adjustments | 6
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 6
Incorporating degradation estimation in light field spatial super-resolution | 6
Rolling-Shutter-stereo-aware motion estimation and image correction | 6
Pyramid transformer-based triplet hashing for robust visual place recognition | 6
Corrigendum to “Improved domain adaptive object detector via adversarial feature learning” [Comput. Vis. Image Underst. 230 (2023) 103660] | 6
RS3Lip: Consis | 6
Editorial Board | 6
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 6
Empirical study on using adapters for debiased Visual Question Answering | 6
Deep learning-based single image face depth data enhancement | 6
RelFormer: Advancing contextual relations for transformer-based dense captioning | 6
Improved domain adaptive object detector via adversarial feature learning | 6
As-Global-As-Possible stereo matching with Sparse Depth Measurement Fusion | 6
Nonlocal Gaussian scale mixture modeling for hyperspectral image denoising | 6
Lightweight cross-modal transformer for RGB-D salient object detection | 6
Learning geodesic-aware local features from RGB-D images | 6
Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality | 6
Evaluate and improve the quality of neural style transfer | 6
Bridging the gap between object detection in close-up and high-resolution wide shots | 6
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 6
Generative adversarial network for semi-supervised image captioning | 5
Found missing semantics: Supplemental prototype network for few-shot semantic segmentation | 5
Towards efficient image and video style transfer via distillation and learnable feature transformation | 5
Deep-STaR: Classification of image time series based on spatio-temporal representations | 5
Editorial Board | 5
LKDA-GAN: Cross-modality image synthesis via Generative Adversarial Network aggregating large kernel decomposable attention bottleneck block | 5
Cutout with patch-loss augmentation for improving generative adversarial networks against instability | 5
High frame rate optical flow estimation from event sensors via intensity estimation | 5
3D detection transformer: Set prediction of objects using point clouds | 5
Self-supervised vision transformers for semantic segmentation | 5
MoMa: Skinned motion retargeting using masked pose modeling | 5
VADS: Visuo-Adaptive DualStrike attack on visual question answer | 5
Constituent Attention for Vision Transformers | 5
Editorial Board | 5
Editorial Board | 5
A closer look at branch classifiers of multi-exit architectures | 5
Editorial Board | 5
Editorial Board | 5
Multi-focus image fusion approach based on CNP systems in NSCT domain | 5
Enhanced local multi-windows attention network for lightweight image super-resolution | 5
Action assessment in rehabilitation: Leveraging machine learning and vision-based analysis | 5
SlowFastFormer for 3D human pose estimation | 5
LCMA-Net: A light cross-modal attention network for streamer re-identification in live video | 5
A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification | 5
A distribution independence based method for 3D face shape decomposition | 5
Improved Short-term Dense Bottleneck network for efficient scene analysis | 5
Improving the robustness of adversarial attacks using an affine-invariant gradient estimator | 5
Deep unsupervised shadow detection with curriculum learning and self-training | 5
Memory-efficient multi-scale residual dense network for single image rain removal | 5
CPNet: Continuity Preservation Network for infrared video colorization | 4
On the inductive biases of deep domain adaptation | 4
3D object feature extraction and classification using 3D MF-DFA | 4
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 4
Video frame interpolation via down–up scale generative adversarial networks | 4
Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains | 4
MAIN: Multi-Attention Instance Network for video segmentation | 4
Image retrieval with mixed initiative and multimodal feedback | 4
Class knowledge overlay to visual feature learning for zero-shot image classification | 4
End-to-end weakly-supervised single-stage multiple 3D hand mesh reconstruction from a single RGB image | 4
A new deep CNN for 3D text localization in the wild through shadow removal | 4
Lmser-pix2seq: Learning stable sketch representations for sketch healing | 4
GAFL: Global adaptive filtering layer for computer vision | 4
Enhanced local distribution learning for real image super-resolution | 4
Implicit and explicit commonsense for multi-sentence video captioning | 4
Improving semantic video retrieval models by training with a relevance-aware online mining strategy | 4
Deep structural information fusion for 3D object detection on LiDAR–camera system | 4
LLAFN-Generator: Learnable linear-attention with fast-normalization for large-scale image captioning | 4
SIERRA: A robust bilateral feature upsampler for dense prediction | 4
Single image super-resolution via hybrid resolution NSST prediction | 4
Disentangled generation network for enlarged license plate recognition and a unified dataset | 4
Adaptive CNN filter pruning using global importance metric | 4
Reverse Stable Diffusion: What prompt was used to generate this image? | 4
Unsupervised domain adaptation for semantic segmentation via cross-region alignment | 4
Cross-domain fashion cloth retrieval via novel attention-guided cascade neural network and clothing parsing | 4
The MSR-Video to Text dataset with clean annotations | 4
Low-light image enhancement by deep learning network for improved illumination map | 4
MAL-Net: Multiscale Attention Link Network for accurate eye center detection | 4
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition | 4
Adaptive semantic transfer network for unsupervised 2D image-based 3D model retrieval | 4
TransRPN: Towards the Transferable Adversarial Perturbations using Region Proposal Networks and Beyond | 4
Hierarchical image peeling: A flexible scale-space filtering framework | 4
Indoor Synthetic Data Generation: A Systematic Review | 4
Multivariate prototype representation for domain-generalized incremental learning | 4
AFA-Net: Adaptive Feature Attention Network in image deblurring and super-resolution for improving license plate recognition | 3
M2FINet: Modality-specific and Modality-shared Features Interaction Network for RGB-IR Person Re-Identification | 3
Sparse graph matching network for temporal language localization in videos | 3
Learning single and multi-scene camera pose regression with transformer encoders | 3
MeT: A graph transformer for semantic segmentation of 3D meshes | 3
When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition | 3
Editorial Board | 3
2.5D visual relationship detection | 3
PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement | 3
Enhancing image-based facial expression recognition through muscle activation-based facial feature extraction | 3
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing | 3
Penalizing proposals using classifiers for semi-supervised object detection | 3
Online real-time pedestrian tracking from medium altitude aerial footage with camera motion cancellation | 3
A global generalized maximum coverage-based solution to the non-model-based view planning problem for object reconstruction | 3
Multi-timescale boosting for efficient and improved event camera face pose alignment | 3
Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks | 3
Learning feature contexts by transformer and CNN hybrid deep network for weakly supervised person search | 3
Editorial Board | 3
Interactive Neural Painting | 3
Hallucinating uncertain motion and future for static image action recognition | 3
Editorial Board | 3