OOIR: Observatory of International Research

Papers

(The median citation count of IEEE Transactions on Circuits and Systems for Video Technology is 6. The table below lists the papers whose CrossRef citation counts exceed that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
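The selection rule described above (keep papers above the median citation count, within the publication window, capped at 250 rows) can be sketched roughly as follows. This is only an illustration: OOIR's actual pipeline is not documented here, the `Paper` type and `select_above_median` function are hypothetical names, and computing the median over the same four-year window is an assumption.

```python
from datetime import date
from statistics import median
from typing import NamedTuple


class Paper(NamedTuple):
    """Hypothetical record; field names are illustrative, not OOIR's schema."""
    title: str
    citations: int
    published: date


def select_above_median(papers, start, end, max_rows=250):
    """Return papers published in [start, end] whose citation count is
    strictly above the median, sorted by citations in descending order.

    Assumption: the median is taken over the papers inside the window.
    """
    in_window = [p for p in papers if start <= p.published <= end]
    if not in_window:
        return []
    threshold = median(p.citations for p in in_window)
    above = [p for p in in_window if p.citations > threshold]
    above.sort(key=lambda p: p.citations, reverse=True)
    return above[:max_rows]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    sample = [
        Paper("Paper A", 2, date(2023, 1, 1)),
        Paper("Paper B", 10, date(2023, 5, 1)),
        Paper("Paper C", 6, date(2024, 1, 1)),
    ]
    for p in select_above_median(sample, date(2022, 6, 1), date(2026, 6, 1)):
        print(f"{p.title}\t{p.citations}")
```

Using a strict comparison (`>`) matches the wording "above that threshold"; whether ties at the median are included on the real site is not stated.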

Article	Citations
2022 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 32	541
IEEE Transactions on Circuits and Systems for Video Technology Publication Information	474
Table of Contents	351
IEEE Transactions on Circuits and Systems for Video Technology publication information	340
IEEE Transactions on Circuits and Systems for Video Technology publication information	316
Multi-Modal Multi-Grained Embedding Learning for Generalized Zero-Shot Video Classification	263
SARGAN: Spatial Attention-Based Residuals for Facial Expression Manipulation	247
DMRFlow: 4D Radar Scene Flow Estimation With Decoupled Matching and Refinement	238
Joint Learning of Image Deblurring and Depth Estimation Through Adversarial Multi-Task Network	226
Table of Contents	219
IEEE Circuits and Systems Society Information	213
Guest Editorial Introduction to the Special Issue on Label-Efficient Learning on Video Data	204
Harmony: An Eco-Friendly Adaptive Rate Control Scheme for Video-on-Demand in Low Earth Orbit Satellite Internet	203
Synergistic Fusion Network of Microscopic Hyperspectral and RGB Images for Multi-Perspective Segmentation	182
RT3DHVC: A Real-Time Human Holographic Video Conferencing System With a Consumer RGB-D Camera Array	177
Convolutional Neural Networks for Omnidirectional Image Quality Assessment: A Benchmark	175
CRP2-VCS: Contrast-Oriented Region-Based Progressive Probabilistic Visual Cryptography Schemes	170
Stochastic Gradient Perturbation: An Implicit Regularizer for Person Re-Identification	165
Toward Meta-Shape-Based Multi-View 3D Point Cloud Registration: An Evaluation	156
Relative Comparison-Based Consensus Learning for Multi-View Subspace Clustering	155
Representation Robustness and Feature Expansion for Exemplar-Free Class-Incremental Learning	155
SpiReco: Fast and Efficient Recognition of High-Speed Moving Objects With Spike Camera	154
DSC3D: Deformable Sampling Constraints in Stereo 3D Object Detection for Autonomous Driving	152
Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and User Trajectory Information	149
TPCM-SegNet: A Text-Prompted Dual-Path Convolution-Mamba Network for Anomaly Segmentation	148
Filtering and Alternating Calibration: Spatiotemporal Context Alternating Fusion for Event-Based Monocular Depth Estimation	144
FoV Prediction-Based Adaptive Bitrate Streaming With On-Demand Transcoding for 360° Videos	143
LiveMatte: Dynamic Scene Background Restoration and Selective Portrait Patch Enhancement	141
DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation	140
Draw Like an Artist: Complex Scene Generation With Diffusion Model via Composition, Painting, and Retouching	140
Exploring and Exploiting High-Order Spatial–Temporal Dynamics for Long-Term Frame Prediction	137
USVTrack: A Benchmark for Multi-Object Tracking in Complex Water Surface Scenes	134
Projected Generative Adversarial Network for Point Cloud Completion	133
Scene Prior Constrained Self-Paced Learning for Unsupervised Satellite Video Vehicle Detection	128
Few-Shot Temporal Sentence Grounding via Memory-Guided Semantic Learning	128
Instance-Incremental Scene Graph Generation From Real-World Point Clouds via Normalizing Flows	126
Semi-Supervised Crowd Counting via Multi-Task Pseudo-Label Self-Correction Strategy	126
Future Feature-Based Supervised Contrastive Learning for Streaming Perception	125
Semantic-Aware Late-Stage Supervised Contrastive Learning for Fine-Grained Action Recognition	125
Unsupervised Action Segmentation via Multi-Scale Temporal-Interaction Enhancement	120
UAMD-Net: A Unified Adaptive Multimodal Neural Network for Dense Depth Completion	120
Phase-Guided Cross-Frequency Integration Network for ISAR and Optical Image Fusion	119
Scalable and Robust Tensor Ring Decomposition for Large-Scale Data With Missing Data and Outliers	117
Dependability Feature Learning Based on Sample Generation for Unsupervised Text-to-Image Person Re-Identification	117
Graph-Guided Unsupervised Multiview Representation Learning	117
ProMoT: Progressive Prompting of Modality and Temporal Dynamics for RGB-T Tracking	117
Ct-LVI: A Framework Toward Continuous-Time Laser-Visual-Inertial Odometry and Mapping	116
A Format Compliant Framework for HEVC Selective Encryption After Encoding	116
Reconstructing Sparse-view Indoor Scenes in View Space with Global Monocular Prior Alignment	114
CLIP-Based Class Incremental Semantic Segmentation Framework with Generalization-Preserving Knowledge Distillation	113
NDM: Boosting Dataset Distillation via Nested Difficulty Matching	113
Boosting Video Object Segmentation with Discriminative Core Features and Adaptive Position Refinement	111
Efficient Single-Object Tracker Based on Local-Global Feature Fusion	109
Hierarchical Dynamic Programming Module for Human Pose Refinement	109
Multi-Stage Cross-Modality Feature Interaction for RGB-Thermal Multi-Object Tracking	109
VPA: Multi-Modal Virtual Point Augmentation for 3D Object Detection	107
Enhancing Representation Learning With Spatial Transformation and Early Convolution for Reinforcement Learning-Based Small Object Detection	107
MPCF: Multi-Phase Consolidated Fusion for Multi-Modal 3D Object Detection with Pseudo Point Cloud	104
Universal Immunized Cover Construction for Secure Adaptive Steganography across Multiple Domains	103
DS²VP: Dynamically-Selected Spatially Visual Prompting	102
Local Attention Transformer-Based Full-View Finger-Vein Identification	101
Lossless Dynamic Point Cloud Geometry Compression via Rate-Distortion Optimized Motion Estimation	99
Plausible Proxy Mining With Credibility for Unsupervised Person Re-Identification	98
EIFNet: An Explicit and Implicit Feature Fusion Network for Finger Vein Verification	98
Morphology-Guided Muscle Cell Detection & Counting based on Transfer Learning, FFD Augmentation and Density-Aware Loss Optimization	97
FastAL: Fast Evaluation Module for Efficient Dynamic Deep Active Learning Using Broad Learning System	97
Crowd-Powered Photo Enhancement Featuring an Active Learning Based Local Filter	96
Dual Difficulty-Aware Adaptive Pseudo Labeling for Semi-Supervised CNV Segmentation	96
Spectral–Spatial Feature Extraction With Dual Graph Autoencoder for Hyperspectral Image Clustering	95
Block Diagonal Graph Embedded Discriminative Regression for Image Representation	95
PPIFuse: Physical Priors Injected Infrared and Visible Image Fusion	95
DP-Retinex: Dual-Prior Guided Low-Light Image Enhancement With YUV-Domain Reflectance-Illumination Decomposition	94
Uni3DA: Universal 3D Domain Adaptation for Object Recognition	92
Reversible Data Hiding Over Encrypted Images via Preprocessing-Free Matrix Secret Sharing	92
Reliable Entropy-Induced Anchor Learning for Incomplete Multi-View Subspace Clustering	89
Fully Unsupervised Domain-Agnostic Image Retrieval	88
Semantic Boosting via Knowledge Sharing and Feedback for Video Anomaly Detection	88
Learning Spatio-Temporal Sharpness Map for Video Deblurring	88
Robust Image Watermarking With Synchronization Using Template Enhanced-Extracted Network	88
Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition	88
Push-and-Pull: A General Training Framework With Differential Augmentor for Domain Generalized Point Cloud Classification	87
MEF-GD: Multimodal Enhancement and Fusion Network for Garment Designer	86
Iterative Self-Guided Image Filtering	86
Edge and Skeleton Guidance Network for Salient Object Detection in Optical Remote Sensing Images	86
Key Role Guided Transformer for Group Activity Recognition	85
Spatial Attention-Guided Light Field Salient Object Detection Network With Implicit Neural Representation	85
Representing Boundary-Ambiguous Scene Online With Scale-Encoded Cascaded Grids and Radiance Field Deblurring	85
Pro-Tuning: Unified Prompt Tuning for Vision Tasks	84
Active Spatial Positions Based Hierarchical Relation Inference for Group Activity Recognition	84
Deep Convolutional Primal-Dual Network for Image Deblurring	84
Deep Affine Motion Compensation Network for Inter Prediction in VVC	83
Cross-Level Multi-Modal Features Learning With Transformer for RGB-D Object Recognition	83
MCCE-REC: MLLM-Driven Cross-Modal Contrastive Entropy Model for Zero-Shot Referring Expression Comprehension	82
A Clinically Guided Graph Convolutional Network for Assessment of Parkinsonian Pronation-Supination Movements of Hands	82
Equity in Unsupervised Domain Adaptation by Nuclear Norm Maximization	82
Frequency Generation for Real-World Image Super-Resolution	82
TiGDistill-BEV: Multi-View BEV 3D Object Detection via Target Inner-Geometry Learning Distillation	81
Subjective and Objective Quality Assessment of Display Content Videos	81
Learning Depth-Density Priors for Fourier-Based Unpaired Image Restoration	80
Learning Appearance-Motion Synergy via Memory-Guided Event Prediction for Video Anomaly Detection	80
SMART: Semantic Matching Contrastive Learning for Partially View-Aligned Clustering	79
Exploring Explicitly Disentangled Features for Domain Generalization	79
Deep and Low-Rank Quaternion Priors for Color Image Processing	79
ASCFormer: An Adaptive Structure-Aware Cascaded Transformer for 3D Object Detection	79
Adversarial Dual-Student With Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation	78
Towards Video Anomaly Detection in the Real World: A Binarization Embedded Weakly-Supervised Network	78
Highly-Parallel Hardwired Deep Convolutional Neural Network for 1-ms Dual-Hand Tracking	78
Learning Monocular Depth via Cascaded Iterative Refinement in Visual-Echo Scenes	78
Multi-Level Feature Fusion Network for Shadow Removal Detection	77
Lightweight Neural Network for Enhancing Imaging Performance of Under-Display Camera	77
Relation-Aware Multi-Pass Comparison Deconfounded Network for Change Captioning	77
AirSOD: A Lightweight Network for RGB-D Salient Object Detection	77
MultiHuman: Leverage Multimodal Prompts for Controllable Multi-Person Image Synthesizing	77
CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation	77
WordCon: Word-level Typography Control in Visual Text Rendering	76
Negative Class Guided Spatial Consistency Network for Sparsely Supervised Semantic Segmentation of Remote Sensing Images	76
Learning to Capture the Query Distribution for Few-Shot Learning	76
Pose-Guided Transformer for Fine-Grained Action Quality Assessment	76
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture	75
MMI-Det: Exploring Multi-Modal Integration for Visible and Infrared Object Detection	75
Multi-Modal Attribute Prompting for Vision-Language Models	74
VDTR: Video Deblurring With Transformer	74
Dual-Stream Transformer With Distribution Alignment for Visible-Infrared Person Re-Identification	73
UDTCWT-PHFMs Domain Statistical Image Watermarking Using Vector BW-Type R Distribution	73
D³C²-Net: Dual-Domain Deep Convolutional Coding Network for Compressive Sensing	73
Video Understanding With Large Language Models: A Survey	73
IEEE Transactions on Circuits and Systems for Video Technology publication information	72
IEEE Circuits and Systems Society Information	72
IEEE Transactions on Circuits and Systems for Video Technology publication information	72
All-Inclusive Image Enhancement for Degraded Images Exhibiting Low-Frequency Corruption	71
Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering	71
Multi-Scale Explicit Matching and Mutual Subject Teacher Learning for Generalizable Person Re-Identification	71
Recent Advances in Rate Control: From Optimization to Implementation and Beyond	71
BIMM: Brain Inspired Masked Modeling for Video Representation Learning	70
Compensating for the Incomplete With the Complete: An Efficient Scene Text Detector	70
A Novel Deep Learning Framework for Automatic Recognition of Thyroid Gland and Tissues of Neck in Ultrasound Image	70
MixSSC: Forward-Backward Mixture for Vision-Based 3D Semantic Scene Completion	70
Feature Evaluation and Joint Interaction for Audio-Visual Emotion Recognition	70
Monocular Depth Estimation on Adverse Weathers With Curriculum Domain Distribution Alignment	70
Enhancing Robustness of Multi-Object Trackers With Temporal Feature Mix	70
Inter-Scale Similarity Guided Cost Aggregation for Stereo Matching	69
AMTFusion: boosting 3D object detection by adaptive multi-modal temporal fusion and augmentation	69
Texture-Aware Spherical Rotation for High Efficiency Omnidirectional Intra Video Coding	69
Mesh2Animation: Unsupervised Animating for Quadruped 3D Objects	69
TAKD: Target-Aware Knowledge Distillation for Remote Sensing Scene Classification	69
Blind Image Quality Index for Authentic Distortions With Local and Global Deep Feature Aggregation	69
FaceGCN: Structured Priors Inspired Graph Convolutional Networks for Face Restoration With Unknown Degradations	69
Unsupervised Deep Hashing With Fine-Grained Similarity-Preserving Contrastive Learning for Image Retrieval	69
CO ⁺ ₃ : Improved Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning	68
Low-Resolution Object Recognition With Cross-Resolution Relational Contrastive Distillation	68
Medical Data Security in Blockchain: A telemedicine data sharing scheme based on custom OPE and 4D-YG hyperchaotic	68
Task-Specific Loss for Robust Instance Segmentation With Noisy Class Labels	68
Learning Scene-Invariant Distribution for Generalizable Blind Image Quality Assessment	68
A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras	68
Table of Contents	68
Holistic Prototype Attention Network for Few-Shot Video Object Segmentation	67
Learning With Noisy Labels by Semantic and Feature Space Collaboration	67
Adaptive Mixture-of-Experts Distillation for Cross-Satellite Generalizable Incremental Remote Sensing Scene Classification	67
ImagingNet: A New Learnable SAR Imaging Method via Hierarchical U-Shaped Network	67
Folding the Vision: Towards Efficient Global Context Representation on Edge Devices	66
GNNLicense: An Active Intellectual Property Protection Technique for Graph Neural Networks based on Autoencoder	66
Laplacian Pyramid Fusion Network With Hierarchical Guidance for Infrared and Visible Image Fusion	66
Semantic Disentanglement Adversarial Hashing for Cross-Modal Retrieval	66
Transformer-Based Multimodal Emotional Perception for Dynamic Facial Expression Recognition in the Wild	66
Integrating Pseudo-Supervision and Spatial Constraints for Efficient Clustering of Multimodal Remote Sensing Data	66
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion	66
Errata to “Local-Global Temporal Difference Learning for Satellite Video Super-Resolution”	65
Robust Matrix Completion Based on Factorization and Truncated-Quadratic Loss Function	65
FDNet: Frequency Decomposition Network for Learned Image Compression	65
StreetSurfGS: Scalable Urban Street Surface Reconstruction With Planar-Based Gaussian Splatting	65
Table of Contents	65
Target-Aware Tracking With Spatial-Temporal Context Attention	65
OraL: An Observational Learning Paradigm for Unsupervised Hyperspectral Change Detection	64
WeaFU: Weather-Informed Image Blind Restoration via Multi-Weather Distribution Diffusion	64
FDAC: Federated Domain Adaptation via Dual Contrastive Learning	64
Conditional Dual Diffusion for Multimodal Clustering of Optical and SAR Images	63
CNN-Transformer Based Generative Adversarial Network for Copy-Move Source/Target Distinguishment	63
VSOIQE: A Novel Viewport-Based Stitched 360° Omnidirectional Image Quality Evaluator	63
Surveillance Video-and-Language Understanding: From Small to Large Multimodal Models	62
DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes	62
Flow-Edge Guided Unsupervised Video Object Segmentation	62
Self-Supervised Adversarial Video Summarizer With Context Latent Sequence Learning	62
Depth Estimation From a Single Image of Blast Furnace Burden Surface Based on Edge Defocus Tracking	62
Cloth-Imbalanced Gait Recognition via Hallucination	62
Optical Flow Reusing for High-Efficiency Space-Time Video Super Resolution	62
Multi-Prior Driven Network for RGB-D Salient Object Detection	61
A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation	61
G2LP-Net: Global to Local Progressive Video Inpainting Network	61
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation	61
Flow Visualization for Complex Fluid Flows via a Structure-Enhanced Motion Estimator	60
Lightweight and Personalized Single-Eye Emotion Recognition via CNN-SNN Spatiotemporal Learning and Memory-Inferred Event Features	60
SMR: Spatial-Guided Model-Based Regression for 3D Hand Pose and Mesh Reconstruction	60
Multimodal Industrial Anomaly Detection via Geometric Prior	60
DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication	59
FedRSL: Representation Subspace Learning in Model-Heterogeneous Federated Learning	59
Efficiently Exploiting Spatially Variant Knowledge for Video Deblurring	59
Diverse Batch Steganography Using Model-Based Selection and Double-Layered Payload Assignment	59
Touchless Finger Vein and Fingerprint Verification via Exploiting Attention-Based Cross-Domain Fusion	59
Non-Local Guided Neural Fields for 4D CT Reconstruction	58
CAIR-Net: Reliability-Aware Information Routing for Robust Multimodal Object Detection under Modality Degradation	58
Enhancing Vision Transformer with Shift Expansion Linear Attention for Image Classification and Object Tracking	58
Question-Aware Global-Local Video Understanding Network for Audio-Visual Question Answering	58
MDSLCA: Multi-scale Dilated Spatial and Local Channel Attention for LiDAR Point Cloud Semantic Segmentation	58
Enhanced Spatial-Temporal Salience for Cross-View Gait Recognition	57
Efficient Non-Blind Image Deblurring With Discriminative Shrinkage Deep Networks	57
STAF: 3D Human Mesh Recovery From Video With Spatio-Temporal Alignment Fusion	57
Forgery-Aware Adaptive Learning With Vision Transformer for Generalized Face Forgery Detection	57
Explicit Spatial-Angular Pattern Embedding for Heterogeneous Imaging-Oriented Light Field Spatial Super-Resolution	57
One for All: A Unified Generative Framework for Image Emotion Classification	57
Dynamic Particle Filter Framework for Robust Object Tracking	56
A Universal Framework for Improving the Robustness of Coverless Image Steganography Based on Image Restoration	56
Balanced Teacher for Source-Free Object Detection	56
HyPSAM: Hybrid Prompt-Driven Segment Anything Model for RGB-Thermal Salient Object Detection	56
Dynamic Hypergraph Convolutional Network for No-Reference Point Cloud Quality Assessment	55
Explanation-Guided Adversarial Training for Robust and Interpretable Models	55
Table of Contents	55
Meta-Learning Based Domain Prior With Application to Optical-ISAR Image Translation	55
M³CS: Multi-Target Masked Point Modeling With Learnable Codebook and Siamese Decoders	55
MSGA-Net: Progressive Feature Matching via Multi-Layer Sparse Graph Attention	55
VmambaIR: Visual State Space Model for Image Restoration	55
IEEE Transactions on Circuits and Systems for Video Technology publication information	55
Bridging Inter-Task Gap of Continual Self-Supervised Learning With External Data	54
LGTrack: Exploiting Local and Global Properties for Robust Visual Tracking	54
Real Image Denoising via Guided Residual Estimation and Noise Correction	54
DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo	54
Concept-Enhanced Relation Network for Video Visual Relation Inference	54
Diffusion-Based Depth Inpainting for Transparent and Reflective Objects	54
Reference-Guided Large-Scale Face Inpainting With Identity and Texture Control	54
Generative Image Steganography Based on Text-to-Image Multimodal Generative Model	53
A Pixel-Level Segmentation-Synthesis Framework for Dynamic Texture Video Compression	53
Special Issue on Segment Anything for Videos and Beyond	53
MtArtGPT: A Multi-Task Art Generation System With Pre-Trained Transformer	53
MGD-SAM2: Multi-view Guided Detail-enhanced Segment Anything Model 2 for High-Resolution Class-agnostic Segmentation	53
CLSR: Cross-Layer Interaction Pyramid Super-Resolution Network	53
Enhancing Transparent Object Matting Using Predicted Definite Foreground and Background	53
Knowledge-Based Visual Question Generation	53
VideoPure: Diffusion-Based Adversarial Purification for Video Recognition	53
IEEE Circuits and Systems Society Information	52
Table of Contents	52
SmokePose: End-to-End Smoke Keypoint Detection	52
DBVC: An End-to-End 3-D Deep Biomedical Video Coding Framework	52
StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion	51
Flexible Temperature Parallel Distillation for Dense Object Detection: Make Response-Based Knowledge Distillation Great Again	51
Optical Flow-Based Spatiotemporal Sketch for Video Representation: A Novel Framework	51
Think Twice Before Determining: Towards Scene-aware Visual Reasoning for Mirror Detection	51
CRDH: Compatible Reversible Data Hiding With High Capacity and Generalization	51
Semantic-Context Graph Network for Point-Based 3D Object Detection	51
DilatedTAD: Enhancing Adaptability to Actions of Varying Durations for Temporal Action Detection	51
Prototype Decoupled Knowledge Distillation	51
CodingHomo: Bootstrapping Deep Homography With Video Coding	51