Image and Vision Computing

Papers
(The TQCC of Image and Vision Computing is 8. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
Article | Citations
PST-Mamba: Spatio-temporal selective state fusion for effective point cloud video understanding with state space models | 427
ADVC: Adversarial dense video captioning with unsupervised pretraining | 189
CAGS: Open-vocabulary 3D scene understanding with context-aware Gaussian splatting | 185
ABC: Aligning binary centers for single-stage monocular 3D object detection | 181
Alignment and fusion for adaptive domain nighttime semantic segmentation | 121
Few-shot-based video generation via multimodal fusion and Fourier Spliter | 116
Feature decoupling and interaction network for defending against adversarial examples | 98
Modeling content-attribute preference for personalized image esthetics assessment | 87
GLMambaNet: Mamba-based decoder with local detail enhancement for semantic segmentation of remote sensing imagery | 84
Efficient ultra-lightweight convolutional attention network for embedded identity document recognition system | 77
Accurate and efficient salient object detection via position prior attention | 76
Multi-information guided camouflaged object detection | 68
G-TRACE: Grouped temporal recalibration for video object segmentation | 67
Hourglass cascaded recurrent stereo matching network | 66
DMNet: Image dehazing via Dual-Domain Modulation | 59
SRMA-KD: Structured relational multi-scale attention knowledge distillation for effective lightweight cardiac image segmentation | 59
HPD-Depth: High performance decoding network for self-supervised monocular depth estimation | 54
Window normalization: Enhancing point cloud understanding by unifying inconsistent point densities | 53
Background debiased class incremental learning for video action recognition | 53
Privacy-preserving explainable AI enable federated learning-based denoising fingerprint recognition model | 50
RGB-T tracking by modality difference reduction and feature re-selection | 48
MAFUNet: Mamba with adaptive fusion UNet for medical image segmentation | 48
DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | 47
AI-powered trustable and explainable fall detection system using transfer learning | 43
BF3D: Bi-directional fusion 3D detector with semantic sampling and geometric mapping | 43
Lightweight multi-scale global attention enhancement network for image super-resolution | 42
Learning diverse and deep clues for person reidentification | 41
Active domain adaptation for semantic segmentation via dynamically balancing domainness and uncertainty | 40
CODNet: Context-based object detection network for multimodal image captioning and virtual question answering | 38
Synthetic lidar point cloud generation using deep generative models for improved driving scene object recognition | 38
Single stage architecture for improved accuracy real-time object detection on mobile devices | 36
GAN-BodyPose: Real-time 3D human body pose data key point detection and quality assessment assisted by generative adversarial network | 36
Few-shot classification with multisemantic information fusion network | 35
Burst image super-resolution via multi-cross attention encoding and multi-scan state-space decoding | 34
SAGNet: Synergistic Attention-Graph Network For video salient object detection | 34
CMS-net: Edge-aware multimodal MRI feature fusion for brain tumor segmentation | 34
CSG-DOF: A Class Structure-Guided Discriminative Optimization Framework for few-shot object detection | 34
Enhanced residual network for burst image super-resolution using simple base frame guidance | 33
Self-supervised Vision Transformers for 3D pose estimation of novel objects | 32
Two-stream transformer tracking with messengers | 32
Multi-view dynamic facial action unit detection | 31
Depth assisted novel view synthesis using few images | 31
Recent advances in deterministic human motion prediction: A review | 31
FSBI: Deepfake detection with frequency enhanced self-blended images | 31
A Point-2s reinforcement learning biomimetic model for estimating and analyzing human 3D motion posture | 30
ST-VTON: Self-supervised vision transformer for image-based virtual try-on | 30
Editorial to special issue on selected extended works from 9th international conference on computer vision & image processing (CVIP) 2024 | 29
1D kernel distillation network for efficient image super-resolution | 29
Learning accurate monocular 3D voxel representation via bilateral voxel transformer | 29
SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation | 29
Memory-MambaNav: Enhancing object-goal navigation through integration of spatial–temporal scanning with state space models | 28
Deep learning with adaptive convolutions for classification of retinal diseases via optical coherence tomography | 28
Dual subspace clustering for spectral-spatial hyperspectral image clustering | 28
Editorial Board | 27
Utilizing Inherent Bias for Memory Efficient Continual Learning: A Simple and Robust Baseline | 27
MVPCC-Net: Multi-View Based Point Cloud Completion Network for MLS data | 27
Frequency and content dual stream network for image dehazing | 26
Visionary vigilance: Optimized YOLOV8 for fallen person detection with large-scale benchmark dataset | 26
TransMix: Crafting highly transferable adversarial examples to evade face recognition models | 25
Mixup Mask Adaptation: Bridging the gap between input saliency and representations via attention mechanism in feature mixup | 25
Editorial Board | 25
Editorial Board | 25
SADGFeat: Learning local features with layer spatial attention and domain generalization | 25
EMA-GS: Improving sparse point cloud rendering with EMA gradient and anchor upsampling | 24
Face deidentification with controllable privacy protection | 24
Enhancing consistency in virtual try-on: A novel diffusion-based approach | 24
Doctor-in-the-Loop: An explainable, multi-view deep learning framework for predicting pathological response in non-small cell lung cancer | 24
Object tracking based on temporal and spatial context information | 24
PatchMixer: Rethinking network design to boost generalization for 3D point cloud understanding | 24
DynaGuide: A generalizable dynamic guidance framework for zero-shot guided unsupervised semantic segmentation | 23
Mitigating human fall injuries: A novel system utilizing 3D 4-stream convolutional neural networks and image fusion | 23
Landmark-in-facial-component: Towards occlusion-robust facial landmark localization | 23
RFSC-net: Re-parameterization forward semantic compensation network in low-light environments | 22
UHDNet: Unified multimodal fusion harmonization and hierarchical dependency learning for visible-infrared person re-identification | 22
Robust visual tracking via modified Harris hawks optimization | 22
Feature alignment via mutual mapping for few-shot fine-grained visual classification | 22
Deep learning enhanced monocular visual odometry: Advancements in fusion mechanisms and training strategies | 22
Intelligent facial expression recognition and classification using optimal deep transfer learning model | 21
Enhancing brain tumor classification in MRI images: A deep learning-based approach for accurate diagnosis | 21
A survey on dynamic neural networks: From computer vision to multi-modal sensor fusion | 21
CNN and Transformer-based deep learning models for automated white blood cell detection | 21
Unsupervised Object Localization driven by self-supervised foundation models: A comprehensive review | 21
NPVForensics: Learning VA correlations in non-critical phoneme–viseme regions for deepfake detection | 21
STAFFormer: Spatio-temporal adaptive fusion transformer for efficient 3D human pose estimation | 21
Distributed collaborative machine learning in real-world application scenario: A white blood cell subtypes classification case study | 20
PAGML: Precise Alignment Guided Metric Learning for sketch-based 3D shape retrieval | 20
AGSAM-Net: UAV route planning and visual guidance model for bridge surface defect detection | 20
Matte anything: Interactive natural image matting with segment anything model | 20
A spatial-frequency domain multi-branch decoder method for real-time semantic segmentation | 20
A multi-branch dual attention segmentation network for epiphyte drone images | 20
Contrast enhancement of region of interest of backlit image for surveillance systems based on multi-illumination fusion | 20
Dual-branch adaptive attention transformer for occluded person re-identification | 20
SDMNet: Spatially dilated multi-scale network for object detection for drone aerial imagery | 20
Detection of anomaly in surveillance videos using quantum convolutional neural networks | 20
Underwater image restoration based on light attenuation prior and color-contrast adaptive correction | 20
A new multi-picture architecture for learned video deinterlacing and demosaicing with parallel deformable convolution and self-attention blocks | 20
Estimating blood pressure using video-based PPG and deep learning | 19
DFG-HCEN: A distinctive-feature guided and hierarchical channel enhanced network-based infrared and visible image fusion | 19
Deep learning-based efficient diagnosis of periapical diseases with dental X-rays | 19
AHA-track: Aggregating hierarchical awareness features for single | 19
CollaborativeBEV: Collaborative bird eye view for reconstructing crowded environment | 19
ECNet: An edge-guided and cross-image perception network for collaborative camouflaged object detection | 19
Self-knowledge distillation based on knowledge transfer from soft to hard examples | 19
Adaptive scale matching for remote sensing object detection based on aerial images | 19
E-Net for pansharpening: A super-resolution perspective | 19
CRFormer: A cross-region transformer for shadow removal | 19
Source domain prior-assisted segment anything model for single domain generalization in medical image segmentation | 18
Class-discriminative domain generalization for semantic segmentation | 18
FgbCNN: A unified bilinear architecture for learning a fine-grained feature representation in facial expression recognition | 18
PW-NeRF: Progressive wavelet-mask guided neural radiance fields view synthesis | 18
CSUnetr: Cross-scale attention based U-Net transformers for whole-brain segmentation with targeted hippocampal analysis in brain MR images | 18
SAMNet: Adapting segment anything model for accurate light field salient object detection | 18
Phase shift guided dynamic view synthesis from monocular video | 18
Distribution-modulated binary neural network for image classification | 18
PixTention: Dynamic pixel-level adapter using attention maps | 18
A novel facial expression recognition model based on harnessing complementary features in multi-scale network with attention fusion | 18
Anchor-based discriminative dual distribution calibration for transductive zero-shot learning | 18
WPE: Weighted prototype estimation for few-shot learning | 18
Real-time human-centric segmentation for complex video scenes | 17
M2VAD: Multiview multi | 17
Attentive spatial-temporal contrastive learning for self-supervised video representation | 17
TQRFormer: Tubelet query recollection transformer for action detection | 17
Social robot in service of the cognitive therapy of elderly people: Exploring robot acceptance in a real-world scenario | 17
OFACD: An end-to-end change detection network for small UAVs remote sensing with viewpoint differences | 17
RLTNT: An explainable residual learning-based transformer model for kidney disease classification | 17
Adaptive graph reasoning network for object detection | 17
Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy | 17
Real-time gait biometrics for surveillance applications: A review | 17
Multi-axis interactive multidimensional attention network for vehicle re-identification | 16
Stacked graph bone region U-net with bone representation for hand pose estimation and semi-supervised training | 16
Perceiving local relative motion and global correlations for weakly supervised group activity recognition | 16
Online multi-object tracking with δ-GLMB filter based on occlusion and identity switch handling | 16
Cross-level fusion network for two-stage polyp segmentation via integrity learning | 16
Corrigendum to “A novel framework for diverse video generation from a single video using frame-conditioned denoising diffusion probabilistic model and ConvNeXt-V2” [Image and Vision Computing 154 (202 | 16
Semantic-aware for point cloud domain adaptation with self-distillation learning | 16
A comprehensive survey on magnetic resonance image reconstruction | 16
A lightweight shallow convolution neural network for automatic identification of Diabetic Foot Ulcers | 15
Face and body-shape integration model for cloth-changing person re-identification | 15
TABNet: A Triplet Augmentation Self-recovery framework with Boundary-aware Pseudo-labels for scribble-based medical image segmentation | 15
Dynamic semantic prototype perception for text–video retrieval | 15
Editorial Board | 15
BCDPose: Diffusion-based 3D Human Pose Estimation with bone-chain prior knowledge | 15
Enhancing small object tracking with reversible rescaling networks | 15
CVAD-GAN: Constrained video anomaly detection via generative adversarial network | 15
Few-shot class incremental learning via prompt transfer and knowledge distillation | 15
Optimal deep transfer learning based ethnicity recognition on face images | 15
An edge-aware high-resolution framework for camouflaged object detection | 15
H-net: Unsupervised domain adaptation person re-identification network based on hierarchy | 15
Dual-stage network combining transformer and hybrid convolutions for stereo image super-resolution | 15
Bridging efficiency and interpretability: Explainable AI for multi-classification of pulmonary diseases utilizing modified lightweight CNNs | 15
Point-cloud-based hand gesture recognition using principal component analysis and boundary extraction | 15
Data-driven 2D-EWT based diabetic retinopathy identification using hybrid neural network | 15
Editorial Board | 14
PR-DETR: Extracting and utilizing prior knowledge for improved end-to-end object detection | 14
A video anomaly detection and classification method based on cross-modal feature alignment | 14
Enhancing UAV small target detection: A balanced accuracy-efficiency algorithm with tiered feature focus | 14
CoHAtNet: An integrated convolutional-transformer architecture with hybrid self-attention for end-to-end camera localization | 14
Synthetic multi-view clustering with missing relationships and instances | 14
Semantic scene graph generation based on an edge dual scene graph and message passing neural network | 14
BTMTrack: Robust RGB-T tracking via dual-template bridging and temporal-modal candidate elimination | 14
Deep hybrid learning for facial expression binary classifications and predictions | 14
Your image generator is your new private dataset | 14
Twin relaxed least squares regression with classwise mean constraint for image classification | 14
Exploiting spatial and temporal context for online tracking with improved transformer | 14
Guest Editorial: Learning with Manifolds in Computer Vision | 14
HMPFormer: Hierarchical vision transformer with multi-perspective feature learning for precise polyp segmentation | 14
Similarity verification of kinship pairs using metricized emphasis | 14
Self-distillation guided Semantic Knowledge Feedback network for infrared–visible image fusion | 14
Editorial Board | 14
GFFT: Global-local feature fusion transformers for facial expression recognition in the wild | 14
A decision support system for acute lymphoblastic leukemia detection based on explainable artificial intelligence | 13
Weather-degraded image semantic segmentation with multi-task knowledge distillation | 13
UIR-ES: An unsupervised underwater image restoration framework with equivariance and stein unbiased risk estimator | 13
SDE-RAE: CLIP-based realistic image reconstruction and editing network using stochastic differential diffusion | 13
PD-DDPM: Prior-driven diffusion model for single image dehazing | 13
Optimizing multimodal personalized disease prediction accuracy using generated prompts and large language models | 13
Synthetic data sets for person Re-Identification: A critical analysis | 13
External knowledge-assisted Transformer for image captioning | 13
Resource-aware strategies for real-time multi-person pose estimation | 13
Speaker independent VSR: A systematic review and futuristic applications | 13
Qualitative failures of image generation models and their application in detecting deepfakes | 13
Preserving instance-level characteristics for multi-instance generation | 13
Efficient Mamba: Overcoming the visual limitations of Mamba with innovative structures | 13
Black-box reversible adversarial examples with invertible neural network | 13
Multi-object tracking with adaptive measurement noise and information fusion | 13
AI4RDD: Artificial Intelligence and Rare Disease Diagnosis: A proposal to improve the anamnesis process | 13
Flexible multi-objective particle swarm optimization clustering with game theory to address human activity discovery fully unsupervised | 13
Editorial Board | 12
DiPS: Discriminative pseudo-label sampling with self-supervised transformers for weakly supervised object localization | 12
Video object segmentation by multi-scale attention using bidirectional strategy | 12
ECT: Fine-grained edge detection with learned cause tokens | 12
A dual-channel network based on occlusion feature compensation for human pose estimation | 12
Robust ensemble person reidentification via orthogonal fusion with occlusion handling | 12
A dedicated benchmark for contour-based corner detection evaluation | 12
Effective hybrid attention network based on pseudo-color enhancement in ultrasound image segmentation | 12
Fuzzy set-based Bernoulli Random Noise Weighted Loss for unsupervised person re-identification | 12
Cross-modal hybrid architectures for gastrointestinal tract image analysis: A systematic review and futuristic applications | 12
DRM-YOLO: A YOLOv11-based structural optimization method for small object detection in UAV aerial imagery | 12
Parameter efficient finetuning of text-to-image models with trainable self-attention layer | 12
Depth awakens: A depth-perceptual attention fusion network for RGB-D camouflaged object detection | 12
Contrastive learning based facial action unit detection in children with hearing impairment for a socially assistive robot platform | 12
OCUCFormer: An Over-Complete Under-Complete Transformer Network for accelerated MRI reconstruction | 12
Boosting semi-supervised face recognition with raw faces | 12
Efficient masked feature and group attention network for stereo image super-resolution | 12
Editorial Board | 12
Feature extraction and fusion algorithm for infrared visible light images based on residual and generative adversarial network | 11
Transformer-based feature interactor for person re-identification with margin self-punishment loss | 11
Knowledge graph construction in hyperbolic space for automatic image annotation | 11
Image–text feature learning for unsupervised visible–infrared person re-identification | 11
Part-aware distillation and aggregation network for human parsing | 11
Person re-identification: A taxonomic survey and the path ahead | 11
FRoundation: Are foundation models ready for face recognition? | 11
EFDCNet: Encoding fusion and decoding correction network for RGB-D indoor semantic segmentation | 11
ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation | 11
RGB road scene material segmentation | 11
Editorial Board | 11
Attention guided multi-level feature aggregation network for camouflaged object detection | 11
Noisy label facial expression recognition via face-specific label distribution learning | 11
SAKD: Sparse attention knowledge distillation | 11
Federated learning based nonlinear two-stage framework for full-reference image quality assessment: An application for biometric | 11
Leveraging spatial-channel attention in U-Net for enhanced segmentation of martian dust storms | 11
Action-aware anchor-based frame selection strategy for action recognition | 11
Bidirectional causal learning for visual question answering | 11
Unified Volumetric Avatar: Enabling flexible editing and rendering of neural human representations | 11
Learning language to symbol and language to vision mapping for visual grounding | 11
Gait recognition via View-aware Part-wise Attention and Multi-scale Dilated Temporal Extractor | 11
STIFormer: RGB-T tracking via Spatial–Temporal Interaction Transformer | 11
Editorial Board | 11
Combining complementary trackers for enhanced long-term visual object tracking | 11
Editorial Board | 11
Tri-UNetX: Tri-plane UNet with xLSTM for 3D cell segmentation | 11
Mobile-friendly and multi-feature aggregation via transformer for human pose estimation | 11
Drone-NeRF: Efficient NeRF based 3D scene reconstruction for large-scale drone survey | 11
Text-augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification | 10
SAMUNet: Enhancing pillar-based 3D object detection in autonomous driving with Shape-aware Mini-Unet | 10
AES-Net: An adapter and enhanced self-attention guided network for multi-stage glaucoma classification using fundus images | 10
Dense open-set recognition based on training with noisy negative images | 10
SinWaveFusion: Learning a single image diffusion model in wavelet domain | 10
RBGAN: Realistic-generation and balanced-utility GAN for face de-identification | 10
A supervised approach for the detection of AM-FM signals’ interference regions in spectrogram images | 10
On the relevance of patch-based extraction methods for monocular depth estimation | 10
EMNet: Edge-guided multi-level network for salient object detection in low-light images | 10
Three dimensional tracking of rigid objects in motion using 2D optical flows | 10
Universal domain adaptation from multiple black-box sources | 10
GW-net: An efficient grad-CAM consistency neural network with weakening of random erasing features for semi-supervised person re-identification | 10
Modal-aware contrastive learning for hyperspectral and LiDAR classification | 10
CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking | 10
Transferable dual multi-granularity semantic excavating for partially relevant video retrieval | 10
MOT-STM: Maritime Object Tracking: A Spatial-Temporal and Metadata-based approach | 10
Distributed quantum model learning for traffic density estimation | 10
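
The selection rule stated in the note above the table (publication window, citation threshold, cap of 250 papers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual procedure: the `Paper` record, the `select_papers` function, and the sample entries are hypothetical, and "above that threshold" is read here as strictly greater than the TQCC value.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    """Hypothetical record; field names are assumptions for illustration."""
    title: str
    citations: int
    published: date


def select_papers(papers, threshold=8,
                  start=date(2022, 6, 1), end=date(2026, 6, 1),
                  limit=250):
    """Keep papers published in [start, end] with more than `threshold`
    citations, sort by citation count (highest first), and cap at `limit`.
    Assumption: "above the threshold" means strictly greater than it."""
    kept = [p for p in papers
            if start <= p.published <= end and p.citations > threshold]
    kept.sort(key=lambda p: p.citations, reverse=True)
    return kept[:limit]


# Made-up sample data; titles and dates are placeholders, not real papers.
sample = [
    Paper("Paper A", 12, date(2023, 1, 1)),
    Paper("Paper B", 8, date(2023, 1, 1)),   # at the threshold, not above it
    Paper("Paper C", 40, date(2021, 1, 1)),  # outside the date window
    Paper("Paper D", 9, date(2025, 5, 5)),
]
selected = select_papers(sample)
print([p.title for p in selected])
```

Because of the cap, the lowest count that actually appears in a listing can be higher than the threshold itself, which is consistent with the table above ending at 10 citations.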