OOIR: Observatory of International Research

Papers

(The TQCC of Bioinformatics is 16. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)

Article	Citations
The 2025 ISCB Accomplishments by a Senior Scientist Award—Dr Amos Bairoch	1802
DivPro: diverse protein sequence design with direct structure recovery guidance	1404
RVINN: a flexible modeling for inferring dynamic transcriptional and post-transcriptional regulation using physics-informed neural networks	871
LPTD: a novel linear programming-based topology determination method for cryo-EM maps	796
Integrated Genome Browser App Store	768
CondiS web app: imputation of censored lifetimes for machine learning-based survival analysis	477
deTELpy: Python package for high-throughput detection of amino acid substitutions in mass spectrometry datasets	348
Memory-efficient, accelerated protein interaction inference with blocked, multi-GPU D-SCRIPT	281
Statistical framework to determine indel-length distribution	235
CompareM2 is a genomes-to-report pipeline for comparing microbial genomes	219
MRDagent: iterative and adaptive parameter optimization for stable ctDNA-based MRD detection in heterogeneous samples	166
getDNB: identifying dynamic network biomarkers of hepatocellular carcinoma from time-varying gene regulations utilizing graph embedding techniques for anomaly detection	143
Mocafe: a comprehensive Python library for simulating cancer development with Phase Field Models	140
Correction to: GTExVisualizer: a web platform for supporting ageing studies	138
IntegrAlign: a comprehensive tool for multi-immunofluorescence panel integration through image alignment	133
ProteinLIPs: a web server for identifying highly polar and poorly packed interfaces in proteins	133
NOODAI: a webserver for network-oriented multi-omics data analysis and integration pipeline	127
Completing gene trees without species trees in sub-quadratic time	118
MAFFIN: metabolomics sample normalization using maximal density fold change with high-quality metabolic features and corrected signal intensities	111
PANPROVA: pangenomic prokaryotic evolution of full assemblies	111
Viral Diseases Explorer: a webtool to identify viral disease information derived from multiple LLMs	107
ATLIGATOR: editing protein interactions with an atlas-based approach	103
EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow	96
Icolos: a workflow manager for structure-based post-processing of de novo generated small molecules	94
TRANSDIRE: data-driven direct reprogramming by a pioneer factor-guided trans-omics approach	92
HelixGAN a deep-learning methodology for conditional de novo design of α-helix structures	90
Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations	90
Increasing confidence in proteomic spectral deconvolution through mass defect	89
SimPlot++: a Python application for representing sequence similarity and detecting recombination	89
Accurate assembly of multiple RNA-seq samples with Aletsch	87
FastDup: a scalable duplicate marking tool using speculation-and-test mechanism	84
monaLisa: an R/Bioconductor package for identifying regulatory motifs	81
Cross-species prediction of essential genes in insects	78
Fragmentstein—facilitating data reuse for cell-free DNA fragment analysis	78
Response to the letter to the editor: On the feasibility of dynamical analysis of network models of biochemical regulation	78
insilicoSV: a flexible grammar-based framework for structural variant simulation and placement	77
Exploring automatic inconsistency detection for literature-based gene ontology annotation	76
Random field modeling of multi-trait multi-locus association for detecting methylation quantitative trait loci	74
MetBP: a software tool for detection of interaction between metal ion–RNA base pairs	73
skandiver: a divergence-based analysis tool for identifying intercellular mobile genetic elements	72
Harnessing deep learning for proteome-scale detection of amyloid signaling motifs	70
CANTATA—prediction of missing links in Boolean networks using genetic programming	69
ProSynAR: a reference aware read merger	69
The phers R package: using phenotype risk scores based on electronic health records to study Mendelian disease and rare genetic variants	68
The ENDS of assumptions: an online tool for the epistemic non-parametric drug–response scoring	68
Decomposing mosaic tandem repeats accurately from long reads	68
Deep Local Analysis deconstructs protein–protein interfaces and accurately estimates binding affinity changes upon mutation	68
MPBind: a multitask protein binding site predictor using protein language models and equivariant GNNs	67
FastSCODE: an accelerated SCODE algorithm for inferring gene regulatory networks on manycore processors	66
Aclust2.0: a revamped unsupervised R tool for Infinium methylation beadchips data analyses	66
WMDS.net: a network control framework for identifying key players in transcriptome programs	65
DRUMMER—rapid detection of RNA modifications through comparative nanopore sequencing	65
PyLiger: scalable single-cell multi-omic data integration in Python	65
Floria: fast and accurate strain haplotyping in metagenomes	63
GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks	62
Inference of 3D genome architecture by modeling overdispersion of Hi-C data	62
From high-throughput evaluation to wet-lab studies: advancing mutation effect prediction with a retrieval-enhanced model	62
Evidential meta-model for molecular property prediction	61
MICER: a pre-trained encoder–decoder architecture for molecular image captioning	60
The FASTQ+ format and PISA	60
AFragmenter: schema-free, tuneable protein domain segmentation for AlphaFold protein structures	59
TripLexicon: prediction and analysis of gene regulatory RNA–DNA interactions	59
Perceiver CPI: a nested cross-attention network for compound–protein interaction prediction	58
Estimation of cancer cell fractions and clone trees from multi-region sequencing of tumors	56
ADViSELipidomics: a workflow for analyzing lipidomics data	56
RNAsolo: a repository of cleaned PDB-derived RNA 3D structures	56
hapCon: estimating contamination of ancient genomes by copying from reference haplotypes	56
Group-walk: a rigorous approach to group-wise false discovery rate analysis by target-decoy competition	56
DeepPerVar: a multi-modal deep learning framework for functional interpretation of genetic variants in personal genome	55
Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification	55
Deciphering high-order structures in spatial transcriptomes with graph-guided Tucker decomposition	54
Single-cell RNA sequencing data analysis based on non-uniform ε-neighborhood network	54
Hierarchical reinforcement learning for automatic disease diagnosis	54
Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation	52
Delineating inter- and intra-antibody repertoire evolution with AntibodyForests	51
From viral evolution to spatial contagion: a biologically modulated Hawkes model	50
XSI—a genotype compression tool for compressive genomics in large biobanks	50
Prediction of gene co-expression from chromatin contacts with graph attention network	50
Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning	49
Columba: fast approximate pattern matching with optimized search schemes	49
EDTox: an R Shiny application to predict the endocrine disruption potential of compounds	49
COVID-19 Spread Mapper: a multi-resolution, unified framework and open-source tool	48
ViReMaShiny: an interactive application for analysis of viral recombination data	48
Adaptive digital tissue deconvolution	48
ShortCake: an integrated platform for efficient and reproducible single-cell analysis	47
Prediction and curation of missing biomedical identifier mappings with Biomappings	47
hipFG: high-throughput harmonization and integration pipeline for functional genomics data	47
LinkExplorer: predicting, explaining and exploring links in large biomedical knowledge graphs	47
Powerful molecule generation with simple ConvNet	47
vaRHC: an R package for semi-automation of variant classification in hereditary cancer genes according to ACMG/AMP and gene-specific ClinGen guidelines	46
CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction	46
Transfer learning for drug–target interaction prediction	46
High-sensitivity pattern discovery in large, paired multiomic datasets	46
BridgeDPI: a novel Graph Neural Network for predicting drug–protein interactions	46
ECCB2022: the 21st European Conference on Computational Biology	45
Correction of image distortion in large-field ssEM stitching by an unsupervised intermediate-space solving network	45
RawHummus: an R Shiny app for automated raw data quality control in metabolomics	45
Conformal inference for reliable single cell RNA-seq annotation	45
Omnibus and robust deconvolution scheme for bulk RNA sequencing data integrating multiple single-cell reference sets and prior biological knowledge	45
SL-Miner: a web server for mining evidence and prioritization of cancer-specific synthetic lethality	45
MS-Decipher: a user-friendly proteome database search software with an emphasis on deciphering the spectra of O-linked glycopeptides	44
BAV-LLPS: a database of bacterial, archaea, and virus liquid–liquid phase separation proteins	44
scGrapHiC: deep learning-based graph deconvolution for Hi-C using single cell gene expression	43
Single-cell mutation calling and phylogenetic tree reconstruction with loss and recurrence	41
Functional characterization of co-phosphorylation networks	41
Trustworthy causal biomarker discovery: a multiomics brain imaging genetics-based approach	41
Comprehensive comparison of two types of algorithm for circRNA detection from short-read RNA-Seq	39
Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE	38
LimROTS: a hybrid method integrating empirical Bayes and reproducibility-optimized statistics for robust differential expression analysis	38
The cell as a token: high-dimensional geometry in language models and cell embeddings	38
Microbench: automated metadata management for systems biology benchmarking and reproducibility in Python	37
The minimizer Jaccard estimator is biased and inconsistent	37
PST-PRNA: prediction of RNA-binding sites using protein surface topography and deep learning	37
Cell type matching across species using protein embeddings and transfer learning	37
OMEN: network-based driver gene identification using mutual exclusivity	37
mHapTk: a comprehensive toolkit for the analysis of DNA methylation haplotypes	36
CProMG: controllable protein-oriented molecule generation with desired binding affinity and drug-like properties	36
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures	35
PltDB: a blood platelets-based gene expression database for disease investigation	35
PractiCPP: a deep learning approach tailored for extremely imbalanced datasets in cell-penetrating peptide prediction	35
Graph-theoretical prediction of biological modules in quaternary structures of large protein complexes	35
Tightly integrated multiomics-based deep tensor survival model for time-to-event prediction	34
Multi-level attention graph neural network based on co-expression gene modules for disease diagnosis and prognosis	34
Modified RNAs and predictions with the ViennaRNA Package	34
SpecieScan: semi-automated taxonomic identification of bone collagen peptides from MALDI-ToF-MS	33
Generating synthetic genotypes using diffusion models	33
A physics-informed neural SDE network for learning cellular dynamics from time-series scRNA-seq data	33
Quantifying and correcting slide-to-slide variation in multiplexed immunofluorescence images	33
PiLSL: pairwise interaction learning-based graph neural network for synthetic lethality prediction in human cancers	33
LOCAN: a python library for analyzing single-molecule localization microscopy data	32
RNA threading with secondary structure and sequence profile	32
Mining literature and pathway data to explore the relations of ketamine with neurotransmitters and gut microbiota using a knowledge-graph	32
The 2024 ISCB Overton Prize Award—Dr Martin Steinegger	32
AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes	32
Spectral clustering of single-cell multi-omics data on multilayer graphs	32
Conumee 2.0: enhanced copy-number variation analysis from DNA methylation arrays for humans and mice	32
2023 ISCB Overton Prize: Jingyi Jessica Li	32
Using the UK Biobank as a global reference of worldwide populations: application to measuring ancestry diversity from GWAS summary statistics	32
Efficient gradient boosting for prognostic biomarker discovery	32
Geometry-complete perceptron networks for 3D molecular graphs	32
Galaxy Helm chart: a standardized method for deploying production Galaxy servers	32
HAMPLE: deciphering TF-DNA binding mechanism in different cellular environments by characterizing higher-order nucleotide dependency	31
iSFun: an R package for integrative dimension reduction analysis	31
MSNet-4mC: learning effective multi-scale representations for identifying DNA N4-methylcytosine sites	31
An automated multi-modal graph-based pipeline for mouse genetic discovery	30
scHiCPTR: unsupervised pseudotime inference through dual graph refinement for single-cell Hi-C data	30
Prediction of recovery from multiple organ dysfunction syndrome in pediatric sepsis patients	30
Globally Accessible Distributed Data Sharing (GADDS): a decentralized FAIR platform to facilitate data sharing in the life sciences	30
Semi-supervised data-integrated feature importance enhances performance and interpretability of biological classification tasks	30
Multistage attention-based extraction and fusion of protein sequence and structural features for protein function prediction	30
scSGL: kernelized signed graph learning for single-cell gene regulatory network inference	30
Scbean: a python library for single-cell multi-omics data analysis	29
AbDiver: a tool to explore the natural antibody landscape to aid therapeutic design	29
Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads	29
SCONCE: a method for profiling copy number alterations in cancer evolution using single-cell whole genome sequencing	29
statgenMPP: an R package implementing an IBD-based mixed model approach for QTL mapping in a wide range of multi-parent populations	29
MIAMI: mutual information-based analysis of multiplex imaging data	29
StructuralDPPIV: a novel deep learning model based on atom structure for predicting dipeptidyl peptidase-IV inhibitory peptides	29
Prediction of HIV sensitivity to monoclonal antibodies using aminoacid sequences and deep learning	29
Improving dictionary-based named entity recognition with deep learning	29
MIO: microRNA target analysis system for immuno-oncology	29
Phenotype prediction from single-cell RNA-seq data using attention-based neural networks	28
scanMiR: a biochemically based toolkit for versatile and efficient microRNA target prediction	28
ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning	28
PeakBot: machine-learning-based chromatographic peak picking	28
STAAR workflow: a cloud-based workflow for scalable and reproducible rare variant analysis	28
An approachable, flexible and practical machine learning workshop for biologists	28
MixingDTA: improved drug–target affinity prediction by extending mixup with guilt-by-association	27
HyperGraphs.jl: representing higher-order relationships in Julia	27
2022 ISCB Accomplishments by a Senior Scientist Award: Ron Shamir	27
A novel pipeline for computerized mouse spermatogenesis staging	27
PDMDA: predicting deep-level miRNA–disease associations with graph neural networks and sequence features	27
Nezzle: an interactive and programmable visualization of biological networks in Python	27
dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning	27
SPRISS: approximating frequent k-mers by sampling reads, and applications	27
The 2025 ISCB Overton Prize Award—Dr James Zou	27
A unified mediation analysis framework for integrative cancer proteogenomics with clinical outcomes	26
Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data	26
IMPACT: interpretable microbial phenotype analysis via microbial characteristic traits	26
AHoJ: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands	26
Balancing complexity and clarity—towards clinician-ready antibiotic resistance prediction models	26
DeepProtein: deep learning library and benchmark for protein sequence learning	25
ELIXIR biovalidator for semantic validation of life science metadata	25
Foreign RNA spike-ins enable accurate allele-specific expression analysis at scale	25
ARTEMIS integrates autoencoders and Schrödinger Bridges to predict continuous dynamics of gene expression, cell population, and perturbation from time-series single-cell data	25
Driver gene detection through Bayesian network integration of mutation and expression profiles	25
Powerful and interpretable control of false discoveries in two-group differential expression studies	25
Polyphest: fast polyploid phylogeny estimation	25
Optimal phylogenetic reconstruction of insertion and deletion events	25
Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data	25
CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server	24
Determining epitope specificity of T-cell receptors with transformers	24
ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads	24
LoRA-DR-suite: adapted embeddings predict intrinsic and soft disorder from protein sequences	24
InterpolatedXY: a two-step strategy to normalize DNA methylation microarray data avoiding sex bias	24
CATH-ddG: towards robust mutation effect prediction on protein–protein interactions out of CATH homologous superfamily	24
CIBRA identifies genomic alterations with a system-wide impact on tumor biology	23
Bayesian inference of fitness landscapes via tree-structured branching processes	23
SNIKT: sequence-independent adapter identification and removal in long-read shotgun sequencing data	23
Towards a reproducible interactome: semantic-based detection of redundancies to unify protein–protein interaction databases	23
REUNION: transcription factor binding prediction and regulatory association inference from single-cell multi-omics data	23
Overcoming biases in causal inference of molecular interactions	23
ATHENA: analysis of tumor heterogeneity from spatial omics measurements	23
ConceptDrift: leveraging spatial, temporal and semantic evolution of biomedical concepts for hypothesis generation	23
Somatic mutation effects diffused over microRNA dysregulation	23
Expanding the coverage of spatial proteomics: a machine learning approach	23
SPEAR: Systematic ProtEin AnnotatoR	23
SEPA: signaling entropy-based algorithm to evaluate personalized pathway activation for survival analysis on pan-cancer data	23
SimBu: bias-aware simulation of bulk RNA-seq data with variable cell-type composition	23
Thermometer: a webserver to predict protein thermal stability	22
Pycallingcards: an integrated environment for visualizing, analyzing, and interpreting Calling Cards data	22
NFTest: automated testing of Nextflow pipelines	22
TopHap: rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity	22
BSDE: barycenter single-cell differential expression for case–control studies	22
TEspeX: consensus-specific quantification of transposable element expression preventing biases from exonized fragments	22
CODEX: COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations	22
Phylogenetic diversity statistics for all clades in a phylogeny	21
Looking at the BiG picture: incorporating bipartite graphs in drug response prediction	21
AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects	21
3D Optical Coherence Tomography image processing in BISCAP: characterization of biofilm structure and properties	21
Biological Random Walks: multi-omics integration for disease gene prioritization	21
Struct-f4: a Rcpp package for ancestry profile and population structure inference from f4-statistics	21
ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment	21
DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure	21
3D GAN image synthesis and dataset quality assessment for bacterial biofilm	21
PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction	20
Optimal sequencing budget allocation for trajectory reconstruction of single cells	20
Hierarchical modelling of microbial communities	20
metaboprep: an R package for preanalysis data description and processing	20
Strategies for robust, accurate, and generalizable benchmarking of drug discovery platforms	20
RiboGraph: an interactive visualization system for ribosome profiling data at read length resolution	20
EDGE COVID-19: a web platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts	20
GASTON-Mix: a unified model of spatial gradients and domains using spatial mixture-of-experts	20
NMRpQuant: an automated software for large scale urinary total protein quantification by one-dimensional 1H NMR profiles	20
SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph	20
Closing the computational biology ‘knowledge gap’: Spanish Wikipedia as a case study	20
Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer	20
Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data	20
BACPI: a bi-directional attention neural network for compound–protein interaction and binding affinity prediction	20
A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction	20
Predicted structural proteome of Sphagnum divinum and proteome-scale annotation	20
Joint registration of multiple point clouds for fast particle fusion in localization microscopy	20
MolCL-SP: a multimodal contrastive learning framework with non-overlapping substructure perturbations for molecular property prediction	20
JBrowse Jupyter: a Python interface to JBrowse 2	20
Exploiting pretrained biochemical language models for targeted drug design	20
ODGI: understanding pangenome graphs	20
konnect2prot: a web application to explore the protein properties in a functional protein–protein interaction network	19
HiCARN: resolution enhancement of Hi-C data using cascading residual networks	19
MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning	19
GRUMB: a genome-resolved metagenomic framework for monitoring urban microbiomes and diagnosing pathogen risk	19