Scientific Data

Papers
(The H4-Index of Scientific Data is 68. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 955
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 487
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 430
Shotgun metagenomes from productive lakes in an urban region of Sweden | 349
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 310
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 286
Slovak database of speech affected by neurodegenerative diseases | 248
Author Correction: Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 245
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 240
Monitoring non-pharmaceutical public health interventions during the COVID-19 pandemic | 230
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 205
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 199
Directional wave buoy data measured near Campbell Island, New Zealand | 196
RNA-seq of peripheral blood mononuclear cells of congenital generalized lipodystrophy type 2 patients | 193
Author Correction: Mobility networks in Greater Mexico City | 188
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 187
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 180
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 161
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 160
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 140
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 136
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 124
Bioclimatic atlas of the terrestrial Arctic | 124
Canopy height model and NAIP imagery pairs across CONUS | 114
Linking Research Data with Physically Preserved Research Materials in Chemistry | 112
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 112
Empowering open data sharing for social good: a privacy-aware approach | 110
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 103
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 100
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 99
Statistical performance indicators and index—a new tool to measure country statistical capacity | 98
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 94
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 93
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 91
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 91
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 88
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 88
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 87
The R package for DICOM to brain imaging data structure conversion | 87
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 86
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 86
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 86
A global dataset of fossil fungi records from the Cenozoic | 84
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 82
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 82
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 81
An agenda for addressing bias in conflict data | 80
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 80
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 78
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 78
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 78
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 76
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 75
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 73
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 73
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 73
A dataset of the daily edge of each polynya in the Antarctic | 73
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 72
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 71
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 71
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 71
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 70
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 70
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 69
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 69
Analysis of AlphaMissense data in different protein groups and structural context | 68
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 68
Generating FAIR research data in experimental tribology | 68
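
The threshold used for this table can be reproduced from the citation counts alone. The sketch below assumes the H4-Index is the standard h-index restricted to papers from the four-year window: the largest h such that at least h papers have h or more citations each. The function name `h_index` and the sample counts are hypothetical and chosen for illustration; they are not taken from the site that produced this listing.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations.

    Assumption: this is the plain h-index definition; an H4-Index would be
    obtained by passing only the counts of papers from the four-year window.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts, for illustration only.
sample = [10, 8, 5, 4, 3]
print(h_index(sample))
```

Under this definition, a list of 68 papers whose lowest count is 68, as in the table above, yields an index of 68.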