Scientific Data

Papers
(The median citation count of Scientific Data is 3. The table below lists the papers whose Crossref citation counts exceed that threshold, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2021-11-01 to 2025-11-01.)
Article | Citations
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 1567
Author Correction: Mobility networks in Greater Mexico City | 650
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 634
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 553
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 399
Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 382
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 373
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 373
An agenda for addressing bias in conflict data | 346
Linking Research Data with Physically Preserved Research Materials in Chemistry | 345
Chromosome-level genome assembly of the Rhizoctonia solani | 281
A database of steric and electronic properties of heteroaryl substituents | 279
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 240
Full Field Digital Mammography Dataset from a Population Screening Program | 222
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 202
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 187
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 178
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 175
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 170
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 169
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 154
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 149
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 136
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 135
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 131
A dataset of the daily edge of each polynya in the Antarctic | 131
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 130
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 119
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 116
Generating FAIR research data in experimental tribology | 115
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 113
An open dataset for oracle bone character recognition and decipherment | 113
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 113
China’s provincial process CO2 emissions from cement production during 1993–2019 | 113
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 111
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 111
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 110
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 108
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 107
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 106
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 105
Near-complete reference genome assembly of Hoya carnosa | 104
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 104
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment | 101
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 101
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 100
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 98
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 95
Empowering open data sharing for social good: a privacy-aware approach | 95
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 94
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 94
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 93
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 93
A focus groups study on data sharing and research data management | 92
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 92
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 90
A chromosome-scale assembly of Ormosia boluoensis (Fabaceae) | 89
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 85
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 85
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 84
Students’ performance dataset for using machine learning technique in physics education research | 81
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 81
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 81
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 79
Slovak database of speech affected by neurodegenerative diseases | 78
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 77
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 77
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 77
Canopy height model and NAIP imagery pairs across CONUS | 75
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 74
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 74
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 74
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 74
An open-access database of nature-based carbon offset project boundaries | 73
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 73
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 72
Statistical performance indicators and index—a new tool to measure country statistical capacity | 72
A global dataset of fossil fungi records from the Cenozoic | 71
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 71
Scaling up SoccerNet with multi-view spatial localization and re-identification | 71
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 71
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 71
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 71
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 70
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 70
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 70
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 68
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 67
Shotgun metagenomes from productive lakes in an urban region of Sweden | 67
An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019 | 67
QMugs, quantum mechanical properties of drug-like molecules | 66
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 66
Machine learning-ready remote sensing data for Maya archaeology | 65
Author Correction: GERDA: The German Election Database | 65
The R package for DICOM to brain imaging data structure conversion | 65
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 65
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 65
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 64
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 63
Dataset on heavy metal pollution assessment in freshwater ecosystems | 63
A dataset for deep learning based detection of printed circuit board surface defect | 61
Analysis of AlphaMissense data in different protein groups and structural context | 61
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 61
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 60
A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland | 60
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 59
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 59
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 59
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 58
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 58
Bioclimatic atlas of the terrestrial Arctic | 57
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 56
A Global Database of Soil Plant Available Phosphorus | 56
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 56
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 55
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 54
Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 53
Constructing a global human epidemic database using open-source digital biosurveillance | 53
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 53
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 53
An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition | 52
PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 52
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 52
Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions | 52
Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation | 52
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 52
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 52
RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal | 51
Comprehensive curation and validation of genomic datasets for chestnut | 51
24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023 | 51
Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 51
A corpus and a modular infrastructure for the empirical study of (an)notated music | 51
The genome assembly and annotation of the cricket Gryllus longicercus | 51
A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset | 51
Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems | 51
STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field | 51
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 50
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae) | 50
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 50
MiTra: A Drone-Based Trajectory Data for an All-Traffic-State Inclusive Freeway with Ramps | 50
The draft genome sequences of the cosmopolitan centric diatom, the genus Skeletonema | 50
Variation of winter wheat phenology dataset in Huang Huai Hai Plain of China from 1981 to 2021 | 50
Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels | 49
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 49
A database of in situ water temperatures for large inland lakes across the coterminous United States | 49
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 49
Coral community data Heron Island Great Barrier Reef 1962–2016 | 49
The W2024 database of the water isotopologue $\mathrm{H}_2^{\,16}\mathrm{O}$ | 48
A global dataset on mungbean for managing seed yield and quality | 48
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 48
Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition | 48
A curated dataset on the distribution of West Palaearctic freshwater bivalves | 48
Improved high quality sand fly assemblies enabled by ultra low input long read sequencing | 48
SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems | 47
Hong Kong Corpus of Chinese Sentence and Passage Reading | 47
Sentinel-3 Altimetry Thematic Products for Hydrology, Sea Ice and Land Ice | 47
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 47
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 47
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 47
A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 47
Chromosome-scale genome assembly and annotation of Xenocypris argentea | 47
Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida | 47
Ontology for the Avida digital evolution platform | 46
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 46
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 45
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 45
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 45
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 45
OSMlanduse a dataset of European Union land use at 10 m resolution derived from OpenStreetMap and Sentinel-2 | 45
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 45
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 45
A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 45
A speech corpus of Quechua Collao for automatic dimensional emotion recognition | 45
An RNA-seq time series of the medaka pituitary gland during sexual maturation | 45
Data scheme and data format for transferable force fields for molecular simulation | 45
Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 44
Haplotype-resolved T2T genome assembly of the pear cultivar ‘Danxiahong’ | 44
A biologging database of juvenile white sharks from the northeast Pacific | 44
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 44
Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States | 44
Renji endoscopic submucosal dissection video data set for colorectal neoplastic lesions | 44
Whole genome sequencing and structural variations provide insights into the body size traits of Hu sheep | 44
Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems | 44
A benchmark database of ten years of prospective next-day earthquake forecasts in California from the Collaboratory for the Study of Earthquake Predictability | 43
Transcriptome profiling of mRNA and lncRNA involved in wax biosynthesis in cauliflower | 43
A multimodal dataset for coronary microvascular disease biomarker discovery | 43
Curated CYP450 Interaction Dataset: Covering the Majority of Phase I Drug Metabolism | 43
Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor | 42
An Observation-Based Dataset of Global Sub-Daily Precipitation Indices (GSDR-I) | 42
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology | 42
NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images | 42
A benchmark GaoFen-7 dataset for building extraction from satellite images | 42
Soil carbon stock densities in mangrove and forested wetland ecosystems of Panama | 42
A dataset of eye gaze images for calibration-free eye tracking augmented reality headset | 42
Big data collection in pharmaceutical manufacturing and its use for product quality predictions | 42
Daily station-level records of air temperature, snow depth, and ground temperature in the Northern Hemisphere | 42
A benchmark for domain adaptation and generalization in smartphone-based human activity recognition | 41
The RESILIENT Dataset: Multimodal Monitoring of Ageing-Related Comorbidities and Cognitive Decline | 41
High resolution climate change observations and projections for the evaluation of heat-related extremes | 41
1.5 million materials narratives generated by chatbots | 41
Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae) | 41
A global dataset for steel aluminum and cement in-use stocks at 500 m gridded level 2000-2019 | 40
Distribution of soil macrofauna across different habitats in the Eastern European Alps | 40
A new high-resolution global topographic factor dataset calculated based on SRTM | 40
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 40
Global hydro-environmental lake characteristics at high spatial resolution | 40
Accumulation-depuration data collection in support of toxicokinetic modelling | 40
Molecular structural dataset of lignin macromolecule elucidating experimental structural compositions | 40
High-Resolution Ultrasound Data for AI-Based Segmentation in Mouse Brain Tumor | 39
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 39
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 39
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | 39
Manually annotated and curated Dataset of diverse Weed Species in Maize and Sorghum for Computer Vision | 39
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 39
Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery | 39
Chromosome-level genome assembly of the Tyrrhenian tree frog (Hyla sarda) | 39
Non-coding RNA profiling in BRAFV600E-mutant cutaneous melanoma before and after Spry1 depletion | 39
Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities | 38
Dataset on child vaccination in Brazil from 1996 to 2021 | 38
AVDOS-VR: Affective Video Database with Physiological Signals and Continuous Ratings Collected Remotely in VR | 38
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments | 37
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 37
Dataset of the suitability of major food crops in Africa under climate change | 37
An ageing study of twenty 18650 lithium-ion Graphite/LFP cells in first and second life use | 37
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 37
Genome Skimming Reveals Genetic Diversity in 220 Papaver Individuals from China | 37
In vivo submillimeter diffusion MRI dataset of 9 macaque brains curated for tractography | 37
Three-dimensional chromatin architecture datasets for aging and Alzheimer’s disease | 37
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 37
Combining citizen science data and literature to build a traits dataset of Taiwan’s birds | 37
A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 | 37
AHAD: African major crops harvested area dataset for the years of 2000, 2010, and 2020 | 37
GriddingMachine, a database and software for Earth system modeling at global and regional scales | 36
An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 36
Dataset of soil hydraulic parameters in the Yellow River Basin based on in situ deep sampling | 36
Surrounding road density of child care centers in Australia | 36
Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 36
Psilocybin’s acute and persistent brain effects: a precision imaging drug trial | 36
Global inventory of species categorized by known underwater sonifery | 36
Inventory of shallow landslides triggered by extreme precipitation in July 2023 in Beijing, China | 35
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 35
Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autononous Driving | 35
Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios | 35
An EEG Dataset of Neural Signatures in a Competitive Two-Player Game Encouraging Deceptive Behavior | 35
Impact factors for quantifying country-level terrestrial biodiversity intactness footprints (IBIF) | 35
A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 35
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition | 35
MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees | 35
Acting Emotions: a comprehensive dataset of elicited emotions | 35
LungHist700: A dataset of histological images for deep learning in pulmonary pathology | 35