OOIR: Observatory of International Research

Papers

(The median citation count of International Journal of Multimedia Information Retrieval is 3. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
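
The selection rule described in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OOIR's actual pipeline: the function name `papers_at_or_above_median`, the `max_rows` parameter, and the toy data are hypothetical, and the threshold is assumed to be inclusive because the table lists papers with exactly 3 citations.

```python
from statistics import median


def papers_at_or_above_median(papers, max_rows=250):
    """Return the median citation count and the papers at or above it.

    `papers` is a list of (title, citations) pairs. The result is sorted
    by citation count, highest first, and capped at `max_rows` entries.
    """
    threshold = median(count for _, count in papers)
    kept = [paper for paper in papers if paper[1] >= threshold]
    kept.sort(key=lambda paper: paper[1], reverse=True)
    return threshold, kept[:max_rows]


# Hypothetical toy data for illustration only; not taken from the table.
sample = [
    ("Paper A", 0),
    ("Paper B", 1),
    ("Paper C", 3),
    ("Paper D", 8),
    ("Paper E", 40),
]

threshold, rows = papers_at_or_above_median(sample)
print(threshold)
for title, count in rows:
    print(f"{title}\t{count}")
```

With an inclusive comparison, papers whose count equals the median stay in the listing, which matches the rows with 3 citations at the end of the table.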

Article	Citations
Video anomaly detection with memory-guided multilevel embedding	95
Multiple object tracking under occlusions based on the stage-wise association strategy with weak cues	76
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias	69
Recent trends in recommender systems: a survey	64
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey	60
Strengthening attention: knowledge distillation via cross-layer feature fusion for image classification	38
Enhancing Facial Beauty Prediction via a Dual-Pathway Hybrid Architecture Integrating Vmamba and ViT	32
Optimized data-cube search for enhanced video summarization via shot boundary detection	32
CSAM: Capsule spatial attention mask network for visual question answering	30
Enhanced YOLOv10 for small object detection with context-aware and adaptive modules	30
VPC-VoxelNet: multi-modal fusion 3D object detection networks based on virtual point clouds	28
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images	26
Multi-objective reinforcement learning for recommender systems: a comprehensive survey of methods, challenges, and future directions	24
Feature-NeuS: Neural Implicit Surface Reconstruction Using Feature Multi-View Consistency Constraint	20
Hierarchical multi-modal fusion with vision transformers for robust action recognition in infrared-visible videos	19
Prototype local–global alignment network for image–text retrieval	19
MMDL: a multi-modal deep learning for video highlight detection in sports	19
Similarity-based face image retrieval using sparsely embedded deep features and binary code learning	18
Multimodal music datasets? Challenges and future goals in music processing	17
Human behavior recognition based on DualBiNet model	17
Visual and semantic ensemble for scene text recognition with gated dual mutual attention	17
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis	15
Multi-scale object detection with feature enhancement for traffic scenes	14
Cross-domain image retrieval: methods and applications	14
Generative adversarial networks for 2D-based CNN pose-invariant face recognition	14
CAMIR: fine-tuning CLIP and multi-head cross-attention mechanism for multimodal image retrieval with sketch and text features	14
A Comprehensive Review of Multimodal Visual Representation Learning: Tracing the Evolution from CNNs to Transformers and Beyond	13
DAF-Net: dense attention feature pyramid network for multiscale object detection	13
State of art and emerging trends on group recommender system: a comprehensive review	13
An emotion-driven, transformer-based network for multimodal fake news detection	13
Ultra fast-inference depth completion with linear attention-based cascaded hourglass network	12
Human action recognition using an optical flow-gated recurrent neural network	12
MFAFD: a few-shot learning method for cascading models with parameter free attention and finite discrete space	12
Weighted semantic feature based self-supervised deep cross-modal hashing	11
Organ segmentation from computed tomography images using the 3D convolutional neural network: a systematic review	11
Concept-based and embedding-based models in lifelog retrieval: an empirical comparison of performance	11
Multi-view learning for camouflaged object detection with PVTv2	11
Study of Alzheimer’s disease brain impairment and methods for its early diagnosis: a comprehensive survey	10
Image enhancement with bi-directional normalization and color attention-guided generative adversarial networks	10
Optical music recognition for homophonic scores with neural networks and synthetic music generation	10
FOF: a fine-grained object detection and feature extraction end-to-end network	10
A Reproducibility Study of Multimodal Embeddings for Recommender Systems	9
Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval	9
A voting-based novel spatio-temporal fusion framework for video saliency using transfer learning mechanism	9
Improving skeleton-based action recognition with interactive object information	8
Style-aware adversarial pairwise ranking for image recommendation systems	8
Stratified Graph Indexing for efficient search in deep descriptor databases	8
MCDINO: Self-supervised learning of masks based on combination of multi-path channel attention and local feature weighting	8
FDAM: full-dimension attention module for deep convolutional neural networks	7
TCKGE: Transformers with contrastive learning for knowledge graph embedding	7
A literature review and perspectives in deepfakes: generation, detection, and applications	7
ETG: the graph convolutional network was enhanced with an EA-transformer for aspect sentiment triplet extraction	7
Who is gambling? Finding cryptocurrency gamblers using multi-modal retrieval methods	7
Few-shot and meta-learning methods for image understanding: a survey	7
An interactive attribute-preserving fashion recommendation with 3D image-based virtual try-on	7
DMFNet: geometric multi-scale pixel-level contrastive learning for video salient object detection	6
Joint multi-scale information and long-range dependence for video captioning	6
Dual-feature collaborative relation-attention networks for visual question answering	6
Deep multimodal learning for time series analysis in social computing: a survey	6
A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content	6
Enhancing action recognition via dynamic cross-frame differential modeling	5
CoCoOpter: Pre-train, prompt, and fine-tune the vision-language model for few-shot image classification	5
Partial multimodal hashing with multi-level semantics and adversarial learning	5
Special Issue on Open-Domain Image Retrieval in the Wild	4
Gender classification from face images using central difference convolutional networks	4
Similar interior coordination image retrieval with multi-view features	4
HF²-Net: hybrid fine-tuning heterogeneous fusion network for visible-infrared person Re-identification	4
Image forgery classification and localization through vision transformers	4
Ornament image retrieval using few-shot learning	4
ANROT-HELANet: adverserially and naturally robust attention-based aggregation network via the hellinger distance for few-shot classification	4
Emotion-aware music tower blocks (EmoMTB): an intelligent audiovisual interface for music discovery and recommendation	4
Sentiment analysis using deep learning techniques: a comprehensive review	4
LG-MLFormer: local and global MLP for image captioning	4
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking	3
Dual-matrix guided reconstruction hashing for unsupervised cross-modal retrieval	3
CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval	3
Multi-aware coreference relation network for visual dialog	3
Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges	3
A survey of multimodal recommender systems: methods, challenges, and future directions	3
A novel method for video shot boundary detection using CNN-LSTM approach	3
Global and local label-constrained alignment for image-text matching	3
Deep multiple aggregation networks for action recognition	3
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes	3
Cross-modal alignment with synthetic caption for text-based person search	3
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network	3
Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts	3
Special issue on cross-modal retrieval and analysis	3
Remote Sensing Image Change Captioning: A Comprehensive Review	3
Enhancing deep learning image classification using data augmentation and genetic algorithm-based optimization	3
Text detection, recognition, and script identification in natural scene images: a Review	3