Bioinformatics

Papers
(The TQCC of Bioinformatics is 13. The table below lists the papers that exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
Article | Citations
DivPro: diverse protein sequence design with direct structure recovery guidance | 2180
RVINN: a flexible modeling for inferring dynamic transcriptional and post-transcriptional regulation using physics-informed neural networks | 1807
CondiS web app: imputation of censored lifetimes for machine learning-based survival analysis | 1029
Correction to: GTExVisualizer: a web platform for supporting ageing studies | 621
ProteinLIPs: a web server for identifying highly polar and poorly packed interfaces in proteins | 408
IntegrAlign: a comprehensive tool for multi-immunofluorescence panel integration through image alignment | 344
NOODAI: a webserver for network-oriented multi-omics data analysis and integration pipeline | 222
deTELpy: Python package for high-throughput detection of amino acid substitutions in mass spectrometry datasets | 176
Memory-efficient, accelerated protein interaction inference with blocked, multi-GPU D-SCRIPT | 142
CompareM2 is a genomes-to-report pipeline for comparing microbial genomes | 134
MRDagent: iterative and adaptive parameter optimization for stable ctDNA-based MRD detection in heterogeneous samples | 122
Viral Diseases Explorer: a webtool to identify viral disease information derived from multiple LLMs | 103
Mixtum: a graphical tool for two-way admixture analysis in population genetics based on f-statistics | 103
Accurate assembly of multiple RNA-seq samples with Aletsch | 103
FastDup: a scalable duplicate marking tool using speculation-and-test mechanism | 98
FracFixR: a compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data | 93
From genes to trajectories: mapping genetic influences on Huntington’s disease progression | 86
getDNB: identifying dynamic network biomarkers of hepatocellular carcinoma from time-varying gene regulations utilizing graph embedding techniques for anomaly detection | 85
MCOAN: multimodal contrastive representation learning for cross-omics adaptive disease regulatory network prediction | 84
Statistical framework to determine indel-length distribution | 79
ATLIGATOR: editing protein interactions with an atlas-based approach | 78
EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow | 78
Icolos: a workflow manager for structure-based post-processing of de novo generated small molecules | 77
The 2025 ISCB Accomplishments by a Senior Scientist Award—Dr Amos Bairoch | 76
HelixGAN a deep-learning methodology for conditional de novo design of α-helix structures | 76
Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations | 75
Mocafe: a comprehensive Python library for simulating cancer development with Phase Field Models | 74
Increasing confidence in proteomic spectral deconvolution through mass defect | 72
MDCompress: better, faster compression of molecular dynamics simulation trajectories | 69
Refining sequence-to-expression modelling with chromatin accessibility | 69
Diagnosing scientific replicability through probabilistic distinguishability | 68
NAViFluX: a visualization‑centric platform for interactive analysis, refinement and design of genome‑scale metabolic networks | 68
FUSE: data-driven functional segmentation of DNA methylation data | 68
CodonMoE: DNA language models for codon-dependent mRNA prediction | 64
HKD-CPI: high-order knowledge distillation enhanced inductive compound-protein interaction prediction | 62
Estimation of cancer cell fractions and clone trees from multi-region sequencing of tumors | 61
The phers R package: using phenotype risk scores based on electronic health records to study Mendelian disease and rare genetic variants | 59
Group-walk: a rigorous approach to group-wise false discovery rate analysis by target-decoy competition | 59
Inference of 3D genome architecture by modeling overdispersion of Hi-C data | 58
Decomposing mosaic tandem repeats accurately from long reads | 57
Deep Local Analysis deconstructs protein–protein interfaces and accurately estimates binding affinity changes upon mutation | 57
RNAsolo: a repository of cleaned PDB-derived RNA 3D structures | 57
Exploring automatic inconsistency detection for literature-based gene ontology annotation | 56
From high-throughput evaluation to wet-lab studies: advancing mutation effect prediction with a retrieval-enhanced model | 56
Random field modeling of multi-trait multi-locus association for detecting methylation quantitative trait loci | 56
MetBP: a software tool for detection of interaction between metal ion–RNA base pairs | 55
Harnessing deep learning for proteome-scale detection of amyloid signaling motifs | 54
CANTATA—prediction of missing links in Boolean networks using genetic programming | 54
FastSCODE: an accelerated SCODE algorithm for inferring gene regulatory networks on manycore processors | 52
Evidential meta-model for molecular property prediction | 51
skandiver: a divergence-based analysis tool for identifying intercellular mobile genetic elements | 49
MPBind: a multitask protein binding site predictor using protein language models and equivariant GNNs | 49
MICER: a pre-trained encoder–decoder architecture for molecular image captioning | 48
Fragmentstein—facilitating data reuse for cell-free DNA fragment analysis | 47
DeepPerVar: a multi-modal deep learning framework for functional interpretation of genetic variants in personal genome | 46
ADViSELipidomics: a workflow for analyzing lipidomics data | 45
phylobar: an R package for multiresolution compositional barplots in omics studies | 44
SA2E: spatial-aware auto-encoder for cell type deconvolution of spatial transcriptomics data | 44
TripLexicon: prediction and analysis of gene regulatory RNA–DNA interactions | 43
scSurv: a deep generative model for single-cell survival analysis | 43
Perceiver CPI: a nested cross-attention network for compound–protein interaction prediction | 42
hapCon: estimating contamination of ancient genomes by copying from reference haplotypes | 42
The FASTQ+ format and PISA | 42
AFragmenter: schema-free, tuneable protein domain segmentation for AlphaFold protein structures | 41
WMDS.net: a network control framework for identifying key players in transcriptome programs | 41
insilicoSV: a flexible grammar-based framework for structural variant simulation and placement | 41
Finding low-complexity DNA sequences with longdust | 41
Aclust2.0: a revamped unsupervised R tool for Infinium methylation beadchips data analyses | 41
StrucPTM: a database of structurally validated protein modifications and their conformational variation | 40
Floria: fast and accurate strain haplotyping in metagenomes | 39
A novel method for across-chromosome phasing without relative data | 39
STAR-GO: improving protein function prediction by learning to hierarchically integrate ontology-informed semantic embeddings | 37
Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification | 37
Deciphering high-order structures in spatial transcriptomes with graph-guided Tucker decomposition | 37
Delineating inter- and intra-antibody repertoire evolution with AntibodyForests | 36
vaRHC: an R package for semi-automation of variant classification in hereditary cancer genes according to ACMG/AMP and gene-specific ClinGen guidelines | 36
Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation | 36
BrainConnect: processing brain connectivity and spatial transcriptomics data for integrative analysis | 36
Transfer learning for drug–target interaction prediction | 36
Prediction of gene co-expression from chromatin contacts with graph attention network | 36
XSI—a genotype compression tool for compressive genomics in large biobanks | 36
EXPLANA: a user-friendly workflow for EXPLoratory ANAlysis and feature selection in cross-sectional and longitudinal microbiome studies | 36
hipFG: high-throughput harmonization and integration pipeline for functional genomics data | 35
Prediction and curation of missing biomedical identifier mappings with Biomappings | 35
ViReMaShiny: an interactive application for analysis of viral recombination data | 35
VDJ-Insights: simplifying the annotation of genomic immunoglobulin and T cell receptor regions | 34
Adaptive digital tissue deconvolution | 34
CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction | 33
Functional lipid analysis via index-based lipidomics profile: a new computational module in LipidOne | 33
RAREsim2: flexible simulation of rare variant genetic data using real haplotypes | 31
Singletrack: an algorithm for improving memory consumption and performance of gap-affine sequence alignment | 31
Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning | 30
High-sensitivity pattern discovery in large, paired multiomic datasets | 30
Columba: fast approximate pattern matching with optimized search schemes | 30
ShortCake: an integrated platform for efficient and reproducible single-cell analysis | 30
Hierarchical reinforcement learning for automatic disease diagnosis | 30
LimROTS: a hybrid method integrating empirical Bayes and reproducibility-optimized statistics for robust differential expression analysis | 29
ECCB2022: the 21st European Conference on Computational Biology | 29
Conformal inference for reliable single cell RNA-seq annotation | 29
AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes | 29
Omnibus and robust deconvolution scheme for bulk RNA sequencing data integrating multiple single-cell reference sets and prior biological knowledge | 29
Correction of image distortion in large-field ssEM stitching by an unsupervised intermediate-space solving network | 29
Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE | 29
SL-Miner: a web server for mining evidence and prioritization of cancer-specific synthetic lethality | 29
CProMG: controllable protein-oriented molecule generation with desired binding affinity and drug-like properties | 29
Trustworthy causal biomarker discovery: a multiomics brain imaging genetics-based approach | 28
Microbench: automated metadata management for systems biology benchmarking and reproducibility in Python | 28
Dogme: a nextflow pipeline for reprocessing nanopore RNA and DNA modifications | 28
The cell as a token: high-dimensional geometry in language models and cell embeddings | 28
Improving biomedical entity linking with generative relevance feedback | 27
Using semantic search to find publicly available gene-expression datasets | 27
mHapTk: a comprehensive toolkit for the analysis of DNA methylation haplotypes | 27
Generating synthetic genotypes using diffusion models | 26
Mining literature and pathway data to explore the relations of ketamine with neurotransmitters and gut microbiota using a knowledge-graph | 26
Prediction of bacterial protein–compound interactions with only positive samples | 26
The minimizer Jaccard estimator is biased and inconsistent | 26
Graph-theoretical prediction of biological modules in quaternary structures of large protein complexes | 26
SpecieScan: semi-automated taxonomic identification of bone collagen peptides from MALDI-ToF-MS | 26
RNA threading with secondary structure and sequence profile | 25
PractiCPP: a deep learning approach tailored for extremely imbalanced datasets in cell-penetrating peptide prediction | 25
Geometry-complete perceptron networks for 3D molecular graphs | 25
Cell type matching across species using protein embeddings and transfer learning | 25
PiLSL: pairwise interaction learning-based graph neural network for synthetic lethality prediction in human cancers | 25
Modified RNAs and predictions with the ViennaRNA Package | 24
Correction to: Enhancing interpretation of clinical disease-associated copy number variations from multiple sequencing strategies with CNVSeeker | 24
nf-core/viralmetagenome: A novel pipeline for untargeted viral genome reconstruction | 24
Spectral clustering of single-cell multi-omics data on multilayer graphs | 24
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures | 24
Conumee 2.0: enhanced copy-number variation analysis from DNA methylation arrays for humans and mice | 24
Duplex-Indel: a Snakemake pipeline for somatic Indel calling in Tn5 transposase-based duplex sequencing data | 24
BAV-LLPS: a database of bacterial, archaea, and virus liquid–liquid phase separation proteins | 23
The 2024 ISCB Overton Prize Award—Dr Martin Steinegger | 23
HAMPLE: deciphering TF-DNA binding mechanism in different cellular environments by characterizing higher-order nucleotide dependency | 23
2023 ISCB Overton Prize: Jingyi Jessica Li | 23
A physics-informed neural SDE network for learning cellular dynamics from time-series scRNA-seq data | 23
PULPO: pipeline of understanding large-scale patterns of oncogenomic signatures | 23
Single-cell mutation calling and phylogenetic tree reconstruction with loss and recurrence | 23
MIAMI: mutual information-based analysis of multiplex imaging data | 23
Functional characterization of co-phosphorylation networks | 23
scGrapHiC: deep learning-based graph deconvolution for Hi-C using single cell gene expression | 23
Galaxy Helm chart: a standardized method for deploying production Galaxy servers | 23
MSNet-4mC: learning effective multi-scale representations for identifying DNA N4-methylcytosine sites | 23
MIO: microRNA target analysis system for immuno-oncology | 22
Semi-supervised data-integrated feature importance enhances performance and interpretability of biological classification tasks | 22
Scbean: a python library for single-cell multi-omics data analysis | 22
Improving dictionary-based named entity recognition with deep learning | 22
An approachable, flexible and practical machine learning workshop for biologists | 22
Prediction of recovery from multiple organ dysfunction syndrome in pediatric sepsis patients | 22
Prediction of HIV sensitivity to monoclonal antibodies using aminoacid sequences and deep learning | 22
StructuralDPPIV: a novel deep learning model based on atom structure for predicting dipeptidyl peptidase-IV inhibitory peptides | 22
Phenotype prediction from single-cell RNA-seq data using attention-based neural networks | 22
Multistage attention-based extraction and fusion of protein sequence and structural features for protein function prediction | 22
SpatialRNA: a Python package for easy application of Graph Neural Network models on single-molecule spatial transcriptomics dataset | 22
CCC-GPU: a graphics processing unit (GPU)-accelerated nonlinear correlation coefficient for large-scale transcriptomic analyses | 22
Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads | 22
TaxTriage: an open-source metagenomic sequencing data analysis pipeline enabling putative pathogen detection | 22
DeepProtein: deep learning library and benchmark for protein sequence learning | 21
HyperGraphs.jl: representing higher-order relationships in Julia | 21
DeepLMI: deep feature mining with a globally enhanced graph convolutional network for robust lncRNA–miRNA interaction prediction | 21
statgenMPP: an R package implementing an IBD-based mixed model approach for QTL mapping in a wide range of multi-parent populations | 21
The 2025 ISCB Overton Prize Award—Dr James Zou | 21
Polyphest: fast polyploid phylogeny estimation | 21
LoRA-DR-suite: adapted embeddings predict intrinsic and soft disorder from protein sequences | 21
Foreign RNA spike-ins enable accurate allele-specific expression analysis at scale | 21
treestructure: an R package to detect population structure in phylogenetic trees | 21
2022 ISCB Accomplishments by a Senior Scientist Award: Ron Shamir | 21
Powerful and interpretable control of false discoveries in two-group differential expression studies | 21
FishFeats: streamlined quantification of multimodal labeling at the single-cell level in 3D tissues | 21
Integrating curation into scientific publishing to train AI models | 21
scHiCPTR: unsupervised pseudotime inference through dual graph refinement for single-cell Hi-C data | 21
Balancing complexity and clarity—towards clinician-ready antibiotic resistance prediction models | 21
A novel pipeline for computerized mouse spermatogenesis staging | 21
ARTEMIS integrates autoencoders and Schrödinger Bridges to predict continuous dynamics of gene expression, cell population, and perturbation from time-series single-cell data | 20
CATH-ddG: towards robust mutation effect prediction on protein–protein interactions out of CATH homologous superfamily | 20
A unified mediation analysis framework for integrative cancer proteogenomics with clinical outcomes | 20
Determining epitope specificity of T-cell receptors with transformers | 20
CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server | 20
Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data | 20
Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data | 20
dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning | 20
TSEDTA: a transformer-based neural network with SMILES transformer and ESM2 embeddings for drug-target binding affinity prediction | 20
AHoJ: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands | 20
Optimal phylogenetic reconstruction of insertion and deletion events | 20
Managing workflow executions with WESkit | 20
Learning drug synergy through environment-conditioned feature modulation | 20
MixingDTA: improved drug–target affinity prediction by extending mixup with guilt-by-association | 20
MegaPX: fast and space-efficient peptide assignment method using IBF-based multi-indexing | 20
IMPACT: interpretable microbial phenotype analysis via microbial characteristic traits | 20
InterpolatedXY: a two-step strategy to normalize DNA methylation microarray data avoiding sex bias | 20
SPEAR: Systematic ProtEin AnnotatoR | 19
Expanding the coverage of spatial proteomics: a machine learning approach | 19
SNIKT: sequence-independent adapter identification and removal in long-read shotgun sequencing data | 19
Biological Random Walks: multi-omics integration for disease gene prioritization | 19
Semantic-enhanced heterogeneous graph learning for identifying ncRNAs associated with drug resistance | 19
ConceptDrift: leveraging spatial, temporal and semantic evolution of biomedical concepts for hypothesis generation | 19
3D GAN image synthesis and dataset quality assessment for bacterial biofilm | 19
Somatic mutation effects diffused over microRNA dysregulation | 19
RISK: a next-generation tool for biological network annotation and visualization | 19
Integrating plant phenotypic and genotypic data in the AGENT project: a BrAPI service implementation | 19
SimBu: bias-aware simulation of bulk RNA-seq data with variable cell-type composition | 19
sedimix: a workflow for the analysis of hominin nuclear DNA sequences from sediments | 19
CIBRA identifies genomic alterations with a system-wide impact on tumor biology | 19
REUNION: transcription factor binding prediction and regulatory association inference from single-cell multi-omics data | 19
Phylogenetic diversity statistics for all clades in a phylogeny | 18
Pycallingcards: an integrated environment for visualizing, analyzing, and interpreting Calling Cards data | 18
DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure | 18
Accurate SPARQL generation via in-context learning and schema-based query construction | 18
Closing the computational biology ‘knowledge gap’: Spanish Wikipedia as a case study | 18
NFTest: automated testing of Nextflow pipelines | 18
ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment | 18
CMAtlas: a comprehensive DNA methylation atlas for exploring epigenetic alterations in 34 human cancer types | 18
3D Optical Coherence Tomography image processing in BISCAP: characterization of biofilm structure and properties | 18
AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects | 18
Bayesian inference of fitness landscapes via tree-structured branching processes | 18
Looking at the BiG picture: incorporating bipartite graphs in drug response prediction | 18
RiboGraph: an interactive visualization system for ribosome profiling data at read length resolution | 18
TEspeX: consensus-specific quantification of transposable element expression preventing biases from exonized fragments | 18
CODEX: COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations | 18
Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data | 18
Cleanifier: contamination removal from microbial sequences using spaced seeds of a human pangenome index | 17
Predicted structural proteome of Sphagnum divinum and proteome-scale annotation | 17
Efficient algorithms for simulating sequences along a phylogenetic tree | 17
SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph | 17
Manifold classification of neuron types from microscopic images | 17
MolCL-SP: a multimodal contrastive learning framework with non-overlapping substructure perturbations for molecular property prediction | 17
A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction | 17
Strategies for robust, accurate, and generalizable benchmarking of drug discovery platforms | 17
TARO: tree-aggregated factor regression for microbiome data integration | 17
JBrowse Jupyter: a Python interface to JBrowse 2 | 17
Optimal sequencing budget allocation for trajectory reconstruction of single cells | 17
PERSEUS: an interactive and intuitive web-based tool for pedigree visualization | 17
konnect2prot: a web application to explore the protein properties in a functional protein–protein interaction network | 17
CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS | 17
GRUMB: a genome-resolved metagenomic framework for monitoring urban microbiomes and diagnosing pathogen risk | 17
PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction | 16
M-Ionic: prediction of metal-ion-binding sites from sequence using residue embeddings | 16
MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning | 16
GASTON-Mix: a unified model of spatial gradients and domains using spatial mixture-of-experts | 16
NMRpQuant: an automated software for large scale urinary total protein quantification by one-dimensional 1H NMR profiles | 16
GAN-based data augmentation for transcriptomics: survey and comparative assessment | 16
Hierarchical modelling of microbial communities | 16
FlowDock: Geometric flow matching for generative protein–ligand docking and affinity prediction | 16
A deep learning framework for comprehensive prediction of human RNA G-quadruplex-binding proteins | 16
G4STAB: a multi-input deep learning model to predict G-quadruplex thermodynamic stability based on sequence and salt concentration | 16
Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer | 16
dAMN: a genome-scale neural-mechanistic hybrid model to predict bacterial growth dynamics | 16
Attentive Variational Information Bottleneck for TCR–peptide interaction prediction | 15
Batch alignment via retention orders for preprocessing large-scale multi-batch LC-MS experiments | 15
SERAPHIM 2.0: an extended toolbox for studying phylogenetically informed movements | 15
A Boolean algebra for genetic variants | 15