International Journal of Multimedia Information Retrieval

Papers
(The median citation count of International Journal of Multimedia Information Retrieval is 3. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
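The selection rule described above (median threshold, date window, 250-paper cap, highest counts first) can be sketched as follows. This is a minimal, hypothetical illustration, not the procedure actually used to build this list. The `Paper` record, the `select_papers` function, the half-open date window, and the assumption that the median is taken over the papers in that same window are all assumptions. The toy papers are made-up placeholders, not entries from the table.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Paper:
    # Hypothetical record: title, publication date, CrossRef citation count.
    title: str
    published: date
    citations: int


def select_papers(papers, start, end, limit=250):
    """Keep papers published in [start, end) (assumed half-open window) whose
    citation count is at or above the median of that window, sorted by
    citations (highest first; ties keep input order) and capped at `limit`."""
    window = [p for p in papers if start <= p.published < end]
    if not window:
        return [], None
    threshold = median(p.citations for p in window)
    kept = [p for p in window if p.citations >= threshold]
    kept.sort(key=lambda p: p.citations, reverse=True)  # stable sort
    return kept[:limit], threshold


# Made-up placeholder data for illustration only.
papers = [
    Paper("Paper A", date(2023, 1, 10), 12),
    Paper("Paper B", date(2024, 6, 2), 3),
    Paper("Paper C", date(2022, 8, 15), 1),
    Paper("Paper D", date(2021, 3, 1), 40),  # outside the window
    Paper("Paper E", date(2025, 2, 20), 5),
    Paper("Paper F", date(2023, 11, 5), 3),
]

selected, threshold = select_papers(papers, date(2022, 5, 1), date(2026, 5, 1))
print(threshold, [p.title for p in selected])
```

Using "at or above" (`>=`) rather than strictly "above" keeps papers whose count equals the median, which is consistent with the table listing papers with exactly 3 citations.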
Article | Citations
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey | 93
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias | 74
Video anomaly detection with memory-guided multilevel embedding | 67
Recent trends in recommender systems: a survey | 59
Multiple object tracking under occlusions based on the stage-wise association strategy with weak cues | 58
Enhancing Facial Beauty Prediction via a Dual-Pathway Hybrid Architecture Integrating Vmamba and ViT | 36
Strengthening attention: knowledge distillation via cross-layer feature fusion for image classification | 32
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images | 31
Enhanced YOLOv10 for small object detection with context-aware and adaptive modules | 28
VPC-VoxelNet: multi-modal fusion 3D object detection networks based on virtual point clouds | 28
CSAM: Capsule spatial attention mask network for visual question answering | 27
Feature-NeuS: Neural Implicit Surface Reconstruction Using Feature Multi-View Consistency Constraint | 26
Prototype local–global alignment network for image–text retrieval | 24
Multi-objective reinforcement learning for recommender systems: a comprehensive survey of methods, challenges, and future directions | 20
Hierarchical multi-modal fusion with vision transformers for robust action recognition in infrared-visible videos | 19
MMDL: a multi-modal deep learning for video highlight detection in sports | 19
Similarity-based face image retrieval using sparsely embedded deep features and binary code learning | 18
Human behavior recognition based on DualBiNet model | 17
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | 16
Visual and semantic ensemble for scene text recognition with gated dual mutual attention | 16
How can users’ comments posted on social media videos be a source of effective tags? | 16
Multimodal music datasets? Challenges and future goals in music processing | 15
CAMIR: fine-tuning CLIP and multi-head cross-attention mechanism for multimodal image retrieval with sketch and text features | 14
Semantic-enhanced discriminative embedding learning for cross-modal retrieval | 14
A Comprehensive Review of Multimodal Visual Representation Learning: Tracing the Evolution from CNNs to Transformers and Beyond | 14
State of art and emerging trends on group recommender system: a comprehensive review | 14
DAF-Net: dense attention feature pyramid network for multiscale object detection | 13
Generative adversarial networks for 2D-based CNN pose-invariant face recognition | 13
Cross-domain image retrieval: methods and applications | 13
Multi-scale object detection with feature enhancement for traffic scenes | 12
MFAFD: a few-shot learning method for cascading models with parameter free attention and finite discrete space | 12
An emotion-driven, transformer-based network for multimodal fake news detection | 12
Human action recognition using an optical flow-gated recurrent neural network | 11
Ultra fast-inference depth completion with linear attention-based cascaded hourglass network | 11
Study of Alzheimer’s disease brain impairment and methods for its early diagnosis: a comprehensive survey | 10
InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection | 10
Multi-view learning for camouflaged object detection with PVTv2 | 10
Concept-based and embedding-based models in lifelog retrieval: an empirical comparison of performance | 10
Weighted semantic feature based self-supervised deep cross-modal hashing | 10
Image enhancement with bi-directional normalization and color attention-guided generative adversarial networks | 10
Organ segmentation from computed tomography images using the 3D convolutional neural network: a systematic review | 10
A voting-based novel spatio-temporal fusion framework for video saliency using transfer learning mechanism | 9
Optical music recognition for homophonic scores with neural networks and synthetic music generation | 9
FOF: a fine-grained object detection and feature extraction end-to-end network | 9
Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval | 9
Stratified Graph Indexing for efficient search in deep descriptor databases | 8
Style-aware adversarial pairwise ranking for image recommendation systems | 8
MCDINO: Self-supervised learning of masks based on combination of multi-path channel attention and local feature weighting | 8
RGBD deep multi-scale network for background subtraction | 8
A literature review and perspectives in deepfakes: generation, detection, and applications | 7
Improving skeleton-based action recognition with interactive object information | 7
ETG: the graph convolutional network was enhanced with an EA-transformer for aspect sentiment triplet extraction | 7
TCKGE: Transformers with contrastive learning for knowledge graph embedding | 7
An interactive attribute-preserving fashion recommendation with 3D image-based virtual try-on | 7
Deep multimodal learning for time series analysis in social computing: a survey | 6
FDAM: full-dimension attention module for deep convolutional neural networks | 6
Who is gambling? Finding cryptocurrency gamblers using multi-modal retrieval methods | 6
Few-shot and meta-learning methods for image understanding: a survey | 6
A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content | 6
Joint multi-scale information and long-range dependence for video captioning | 5
Dual-feature collaborative relation-attention networks for visual question answering | 5
HF²-Net: hybrid fine-tuning heterogeneous fusion network for visible-infrared person Re-identification | 5
CoCoOpter: Pre-train, prompt, and fine-tune the vision-language model for few-shot image classification | 5
DMFNet: geometric multi-scale pixel-level contrastive learning for video salient object detection | 5
Gender classification from face images using central difference convolutional networks | 4
Enhancing action recognition via dynamic cross-frame differential modeling | 4
Image forgery classification and localization through vision transformers | 4
ANROT-HELANet: adverserially and naturally robust attention-based aggregation network via the hellinger distance for few-shot classification | 4
Ornament image retrieval using few-shot learning | 4
Special Issue on Open-Domain Image Retrieval in the Wild | 4
Similar interior coordination image retrieval with multi-view features | 4
LG-MLFormer: local and global MLP for image captioning | 4
Partial multimodal hashing with multi-level semantics and adversarial learning | 4
Sentiment analysis using deep learning techniques: a comprehensive review | 4
Emotion-aware music tower blocks (EmoMTB): an intelligent audiovisual interface for music discovery and recommendation | 4
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking | 3
Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts | 3
Global and local label-constrained alignment for image-text matching | 3
Remote Sensing Image Change Captioning: A Comprehensive Review | 3
Multi-aware coreference relation network for visual dialog | 3
Enhancing deep learning image classification using data augmentation and genetic algorithm-based optimization | 3
Dual-matrix guided reconstruction hashing for unsupervised cross-modal retrieval | 3
Deep multiple aggregation networks for action recognition | 3
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes | 3
Cross-modal alignment with synthetic caption for text-based person search | 3
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network | 3
A novel method for video shot boundary detection using CNN-LSTM approach | 3
Special issue on cross-modal retrieval and analysis | 3
CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval | 3
Text detection, recognition, and script identification in natural scene images: a Review | 3