Scientific Data

Papers
(The median citation count of Scientific Data is 2. The table below lists the papers whose CrossRef citation counts exceed this threshold, up to a maximum of 250 papers. It covers publications from the past four years, i.e., from 2021-08-01 to 2025-08-01.)
| Article | Citations |
| --- | ---: |
| Shotgun metagenomes from productive lakes in an urban region of Sweden | 1256 |
| A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 563 |
| Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 528 |
| Slovak database of speech affected by neurodegenerative diseases | 419 |
| CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 354 |
| Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 344 |
| Directional wave buoy data measured near Campbell Island, New Zealand | 306 |
| RNA-seq of peripheral blood mononuclear cells of congenital generalized lipodystrophy type 2 patients | 303 |
| Author Correction: Mobility networks in Greater Mexico City | 287 |
| Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 286 |
| Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 279 |
| Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 279 |
| A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 242 |
| Linking Research Data with Physically Preserved Research Materials in Chemistry | 234 |
| Empowering open data sharing for social good: a privacy-aware approach | 205 |
| A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 189 |
| The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 154 |
| Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 144 |
| Chromosome-level genome assembly of the Rhizoctonia solani | 144 |
| The R package for DICOM to brain imaging data structure conversion | 141 |
| Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 137 |
| Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 137 |
| EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 129 |
| A global dataset of fossil fungi records from the Cenozoic | 127 |
| The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 124 |
| Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 124 |
| A dataset of the daily edge of each polynya in the Antarctic | 113 |
| OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 109 |
| A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 104 |
| Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 104 |
| The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 103 |
| Statistical performance indicators and index—a new tool to measure country statistical capacity | 103 |
| Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 102 |
| An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019 | 101 |
| Students’ performance dataset for using machine learning technique in physics education research | 99 |
| A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland | 98 |
| Near-complete reference genome assembly of Hoya carnosa | 98 |
| A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 97 |
| Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 97 |
| PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 94 |
| Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 94 |
| A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 92 |
| Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 89 |
| A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 89 |
| China’s provincial process CO2 emissions from cement production during 1993–2019 | 88 |
| A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 88 |
| A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 88 |
| NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 87 |
| Generating FAIR research data in experimental tribology | 87 |
| A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 85 |
| FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 85 |
| Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 84 |
| An open-access database of nature-based carbon offset project boundaries | 83 |
| Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 82 |
| A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 82 |
| Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 81 |
| Machine learning-ready remote sensing data for Maya archaeology | 81 |
| Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 79 |
| The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 79 |
| Author Correction: Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 78 |
| A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 77 |
| PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 76 |
| Canopy height model and NAIP imagery pairs across CONUS | 76 |
| ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 76 |
| Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 75 |
| SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 75 |
| Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 74 |
| Scaling up SoccerNet with multi-view spatial localization and re-identification | 74 |
| Monitoring non-pharmaceutical public health interventions during the COVID-19 pandemic | 73 |
| MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 73 |
| Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 72 |
| A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 72 |
| District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 71 |
| Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 71 |
| Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 71 |
| Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 70 |
| BUS-UCLM: Breast ultrasound lesion segmentation dataset | 69 |
| Dataset on heavy metal pollution assessment in freshwater ecosystems | 68 |
| An open dataset for oracle bone character recognition and decipherment | 67 |
| Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 64 |
| GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 64 |
| Bioclimatic atlas of the terrestrial Arctic | 64 |
| The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 63 |
| A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 63 |
| A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 63 |
| What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 62 |
| Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 62 |
| Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 61 |
| RailFOD23: A dataset for foreign object detection on railroad transmission lines | 61 |
| An agenda for addressing bias in conflict data | 61 |
| In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 61 |
| Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 59 |
| A focus groups study on data sharing and research data management | 59 |
| A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 59 |
| A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 58 |
| T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 58 |
| Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 58 |
| Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 58 |
| Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 58 |
| Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 57 |
| An 8-model ensemble of CMIP6-derived ocean surface wave climate | 57 |
| Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 57 |
| Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 57 |
| A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 57 |
| Chinese environmentally extended input-output database for 2017 and 2018 | 56 |
| A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 56 |
| Enrichment of lung cancer computed tomography collections with AI-derived annotations | 56 |
| F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 56 |
| Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 55 |
| Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 54 |
| A database of steric and electronic properties of heteroaryl substituents | 54 |
| A dataset for deep learning based detection of printed circuit board surface defect | 54 |
| Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 53 |
| A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 53 |
| QMugs, quantum mechanical properties of drug-like molecules | 53 |
| Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 52 |
| Analysis of AlphaMissense data in different protein groups and structural context | 52 |
| A Global Database of Soil Plant Available Phosphorus | 52 |
| Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 51 |
| Constructing a global human epidemic database using open-source digital biosurveillance | 51 |
| Global photovoltaic solar panel dataset from 2019 to 2022 | 50 |
| A database of in situ water temperatures for large inland lakes across the coterminous United States | 50 |
| Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 50 |
| Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 50 |
| Chromosome-scale genome assembly and annotation of Xenocypris argentea | 50 |
| Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities | 50 |
| Accumulation-depuration data collection in support of toxicokinetic modelling | 49 |
| A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 49 |
| A global dataset on mungbean for managing seed yield and quality | 49 |
| A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation | 49 |
| Ontology for the Avida digital evolution platform | 49 |
| Dataset on child vaccination in Brazil from 1996 to 2021 | 48 |
| OSMlanduse a dataset of European Union land use at 10 m resolution derived from OpenStreetMap and Sentinel-2 | 48 |
| A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 48 |
| Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios | 48 |
| Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 48 |
| Sentinel-3 Altimetry Thematic Products for Hydrology, Sea Ice and Land Ice | 48 |
| A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 47 |
| SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems | 47 |
| Combining citizen science data and literature to build a traits dataset of Taiwan’s birds | 47 |
| Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 47 |
| A database of mapped global fishing activity 1950–2017 | 47 |
| A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 46 |
| Hong Kong Corpus of Chinese Sentence and Passage Reading | 46 |
| Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida | 46 |
| High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 46 |
| Spatial transcriptome profiling of normal human liver | 46 |
| Surrounding road density of child care centers in Australia | 46 |
| Daily precipitation dataset at 0.1° for the Yarlung Zangbo River basin from 2001 to 2015 | 45 |
| Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition | 45 |
| Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 44 |
| The W2024 database of the water isotopologue H₂¹⁶O | 44 |
| Daily station-level records of air temperature, snow depth, and ground temperature in the Northern Hemisphere | 44 |
| A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 44 |
| MiTra: A Drone-Based Trajectory Data for an All-Traffic-State Inclusive Freeway with Ramps | 44 |
| A curated dataset on the distribution of West Palaearctic freshwater bivalves | 44 |
| In vivo submillimeter diffusion MRI dataset of 9 macaque brains curated for tractography | 44 |
| A biologging database of juvenile white sharks from the northeast Pacific | 43 |
| 1.5 million materials narratives generated by chatbots | 43 |
| Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions | 43 |
| Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 43 |
| Global nature run data with realistic high-resolution carbon weather for the year of the Paris Agreement | 43 |
| Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 42 |
| The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 42 |
| Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 42 |
| Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 42 |
| A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 42 |
| Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 41 |
| A benchmark for domain adaptation and generalization in smartphone-based human activity recognition | 41 |
| Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 41 |
| A combined microbial and biogeochemical dataset from high-latitude ecosystems with respect to methane cycle | 41 |
| Coral community data Heron Island Great Barrier Reef 1962–2016 | 41 |
| A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 41 |
| An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition | 41 |
| A speech corpus of Quechua Collao for automatic dimensional emotion recognition | 40 |
| Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 40 |
| Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 40 |
| ValLAI_Crop, a validation dataset for coarse-resolution satellite LAI products over Chinese cropland | 40 |
| A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 39 |
| A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 39 |
| Sharkipedia: a curated open access database of shark and ray life history traits and abundance time-series | 39 |
| Haplotype-resolved T2T genome assembly of the pear cultivar ‘Danxiahong’ | 39 |
| A multimodal dataset for coronary microvascular disease biomarker discovery | 39 |
| Data scheme and data format for transferable force fields for molecular simulation | 39 |
| COFACTOR Drammen dataset - 4 years of hourly energy use data from 45 public buildings in Drammen, Norway | 39 |
| 3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 39 |
| Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 39 |
| Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae) | 39 |
| A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 39 |
| An RNA-seq time series of the medaka pituitary gland during sexual maturation | 39 |
| A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 39 |
| Improved high quality sand fly assemblies enabled by ultra low input long read sequencing | 39 |
| An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 39 |
| A multi-year campus-level smart meter database | 39 |
| Distribution of soil macrofauna across different habitats in the Eastern European Alps | 38 |
| Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 38 |
| A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset | 38 |
| Comprehensive curation and validation of genomic datasets for chestnut | 38 |
| A corpus and a modular infrastructure for the empirical study of (an)notated music | 38 |
| RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal | 38 |
| The genome assembly and annotation of the cricket Gryllus longicercus | 37 |
| Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems | 37 |
| Inventory of shallow landslides triggered by extreme precipitation in July 2023 in Beijing, China | 37 |
| 24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023 | 37 |
| ROBIN: Reference observatory of basins for international hydrological climate change detection | 37 |
| Acting Emotions: a comprehensive dataset of elicited emotions | 37 |
| An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 37 |
| STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field | 37 |
| AVDOS-VR: Affective Video Database with Physiological Signals and Continuous Ratings Collected Remotely in VR | 37 |
| MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees | 36 |
| Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States | 36 |
| Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas | 36 |
| Psilocybin’s acute and persistent brain effects: a precision imaging drug trial | 36 |
| Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 36 |
| GriddingMachine, a database and software for Earth system modeling at global and regional scales | 36 |
| Characterization of hormone-producing cell types in the teleost pituitary gland using single-cell RNA-seq | 35 |
| ReaLSAT, a global dataset of reservoir and lake surface area variations | 35 |
| An ageing study of twenty 18650 lithium-ion Graphite/LFP cells in first and second life use | 35 |
| PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 35 |
| 3DSC - a dataset of superconductors including crystal structures | 35 |
| A hierarchical inventory of the world’s mountains for global comparative mountain science | 35 |
| Manually annotated and curated Dataset of diverse Weed Species in Maize and Sorghum for Computer Vision | 35 |
| Three-dimensional chromatin architecture datasets for aging and Alzheimer’s disease | 35 |
| A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae) | 34 |
| Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery | 34 |
| High resolution climate change observations and projections for the evaluation of heat-related extremes | 34 |
| Underground well water level observation grid dataset from 2005 to 2022 | 34 |
| NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images | 34 |
| Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems | 34 |
| A benchmark GaoFen-7 dataset for building extraction from satellite images | 34 |
| Variation of winter wheat phenology dataset in Huang Huai Hai Plain of China from 1981 to 2021 | 34 |
| Molecular structural dataset of lignin macromolecule elucidating experimental structural compositions | 33 |
| REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | 33 |
| A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | 33 |
| An Observation-Based Dataset of Global Sub-Daily Precipitation Indices (GSDR-I) | 33 |
| A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 33 |
| Dataset of the suitability of major food crops in Africa under climate change | 33 |
| Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States | 33 |
| Global inventory of species categorized by known underwater sonifery | 33 |
| Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation | 33 |
| A new high-resolution global topographic factor dataset calculated based on SRTM | 32 |
| A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 32 |
| A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 | 32 |
| Global hydro-environmental lake characteristics at high spatial resolution | 32 |
| LungHist700: A dataset of histological images for deep learning in pulmonary pathology | 32 |
| Analysis of metabolic dynamics during drought stress in Arabidopsis plants | 32 |
| A global dataset on species occurrences and functional traits of Schizothoracinae fish | 32 |
| VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | 32 |
| A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 32 |
| Dataset of soil hydraulic parameters in the Yellow River Basin based on in situ deep sampling | 32 |
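The selection rule in the note above the table (papers published from 2021-08-01 to 2025-08-01, cited more often than the journal median, at most 250 entries, most-cited first) can be expressed as a short filter. The Python below is a minimal sketch, not the page's actual pipeline. The `Paper` record and `select_papers` function are hypothetical, and the sketch assumes inclusive date bounds, a median taken over the same four-year window, and citation counts already retrieved from CrossRef.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Paper:
    # Hypothetical record; the page does not describe its data model.
    title: str
    published: date
    citations: int  # CrossRef citation count, assumed already fetched


def select_papers(papers, start, end, cap=250):
    """Keep papers published in [start, end] whose citation count is strictly
    above the median of that window, most-cited first, truncated to `cap`."""
    # Assumption: both window bounds are inclusive.
    window = [p for p in papers if start <= p.published <= end]
    if not window:
        return []
    # Assumption: the median is computed over the same window, not all years.
    threshold = median(p.citations for p in window)
    above = [p for p in window if p.citations > threshold]
    # Highest counts first; Python's sort stays stable with reverse=True,
    # so papers with equal counts keep their input order.
    above.sort(key=lambda p: p.citations, reverse=True)
    return above[:cap]
```

Using a strict `>` means a paper cited exactly as often as the median is left out, which matches the note's wording "above that threshold".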