Scientific Data

Papers
(The TQCC of Scientific Data is 9. The table below lists the papers whose CrossRef citation counts exceed that threshold, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2021-12-01 to 2025-12-01.)
Article | Citations
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 1688
Author Correction: Mobility networks in Greater Mexico City | 681
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 675
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 595
Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 420
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 413
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 406
Linking Research Data with Physically Preserved Research Materials in Chemistry | 401
Chromosome-level genome assembly of the Rhizoctonia solani | 362
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 362
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 294
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 293
An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019 | 244
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 232
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 227
A dataset of the daily edge of each polynya in the Antarctic | 223
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 189
A focus groups study on data sharing and research data management | 180
The Latin American Legislators Dataset | 175
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 168
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 157
A global dataset of fossil fungi records from the Cenozoic | 146
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 142
A database of steric and electronic properties of heteroaryl substituents | 142
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 141
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 139
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 126
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 122
Near-complete reference genome assembly of Hoya carnosa | 121
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment | 118
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 118
Empowering open data sharing for social good: a privacy-aware approach | 118
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 117
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 117
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 116
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 116
A chromosome-scale assembly of Ormosia boluoensis (Fabaceae) | 112
Author Correction: Database covering the prayer movements which were not available previously | 111
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 110
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 109
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 109
Students’ performance dataset for using machine learning technique in physics education research | 107
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 106
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 105
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 102
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 100
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 99
An open-access database of nature-based carbon offset project boundaries | 99
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 98
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 98
Statistical performance indicators and index—a new tool to measure country statistical capacity | 98
A longitudinal cross-country dataset on agricultural productivity and welfare in Sub-Saharan Africa | 97
Machine learning-ready remote sensing data for Maya archaeology | 95
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 95
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 95
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 91
Slovak database of speech affected by neurodegenerative diseases | 90
Canopy height model and NAIP imagery pairs across CONUS | 87
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 85
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 85
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 84
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 83
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 83
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 81
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 80
Shotgun metagenomes from productive lakes in an urban region of Sweden | 80
The R package for DICOM to brain imaging data structure conversion | 79
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 79
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 79
Author Correction: GERDA: The German Election Database | 79
Bioclimatic atlas of the terrestrial Arctic | 79
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 78
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 78
Generating FAIR research data in experimental tribology | 77
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 76
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 75
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 74
A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland | 74
Spatio-temporal dataset (2009–2012) of Culicoides spp., vectors of livestock viruses, in France | 74
Dataset on heavy metal pollution assessment in freshwater ecosystems | 73
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 73
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 73
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 73
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 72
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 72
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 71
One-year high-frequency environmental and behavioral data from ALAN experience in a French coastal area | 70
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 70
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 68
QMugs, quantum mechanical properties of drug-like molecules | 68
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 68
A Global Database of Soil Plant Available Phosphorus | 68
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 68
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 68
A curated dataset of great ape genome diversity | 67
Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond | 67
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 67
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 66
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 66
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 66
An agenda for addressing bias in conflict data | 65
China’s provincial process CO2 emissions from cement production during 1993–2019 | 65
Full Field Digital Mammography Dataset from a Population Screening Program | 64
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 63
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 63
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 62
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 62
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 61
An open dataset for oracle bone character recognition and decipherment | 59
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 59
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 59
Scaling up SoccerNet with multi-view spatial localization and re-identification | 59
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 58
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 58
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 57
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 56
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 56
A dataset for deep learning based detection of printed circuit board surface defect | 55
Analysis of AlphaMissense data in different protein groups and structural context | 55
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 55
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 55
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 55
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 54
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 54
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 54
Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions | 54
Constructing a global human epidemic database using open-source digital biosurveillance | 54
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 54
24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023 | 54
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 54
Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 54
STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field | 54
A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset | 53
Comprehensive curation and validation of genomic datasets for chestnut | 53
Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 53
RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal | 53
The draft genome sequences of the cosmopolitan centric diatom, the genus Skeletonema | 52
A corpus and a modular infrastructure for the empirical study of (an)notated music | 52
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae) | 52
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 52
Variation of winter wheat phenology dataset in Huang Huai Hai Plain of China from 1981 to 2021 | 52
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 51
A database of in situ water temperatures for large inland lakes across the coterminous United States | 51
Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels | 51
A curated dataset on the distribution of West Palaearctic freshwater bivalves | 51
Coral community data Heron Island Great Barrier Reef 1962–2016 | 51
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 50
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 50
A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 50
Chromosome-scale genome assembly and annotation of Xenocypris argentea | 50
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 50
ROBIN: Reference observatory of basins for international hydrological climate change detection | 49
Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems | 49
Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States | 49
Hong Kong Corpus of Chinese Sentence and Passage Reading | 49
Three-dimensional chromatin architecture datasets for aging and Alzheimer’s disease | 49
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 49
A dataset of eye gaze images for calibration-free eye tracking augmented reality headset | 49
A global dataset on mungbean for managing seed yield and quality | 48
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 48
Improved high quality sand fly assemblies enabled by ultra low input long read sequencing | 48
PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 48
COFACTOR Drammen dataset - 4 years of hourly energy use data from 45 public buildings in Drammen, Norway | 48
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 48
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 47
A database of mapped global fishing activity 1950–2017 | 47
DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks | 47
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 47
Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 47
Global nature run data with realistic high-resolution carbon weather for the year of the Paris Agreement | 47
LungHist700: A dataset of histological images for deep learning in pulmonary pathology | 47
The genome assembly and annotation of the cricket Gryllus longicercus | 47
Surrounding road density of child care centers in Australia | 47
Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida | 47
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 46
Chromosome-level genome assembly of the Tyrrhenian tree frog (Hyla sarda) | 46
A global dataset for steel aluminum and cement in-use stocks at 500 m gridded level 2000-2019 | 46
Non-coding RNA profiling in BRAFV600E-mutant cutaneous melanoma before and after Spry1 depletion | 46
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | 46
A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 46
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 46
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 45
OSMlanduse a dataset of European Union land use at 10 m resolution derived from OpenStreetMap and Sentinel-2 | 45
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 45
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 45
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 45
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 45
Renji endoscopic submucosal dissection video data set for colorectal neoplastic lesions | 44
Haplotype-resolved T2T genome assembly of the pear cultivar ‘Danxiahong’ | 44
A biologging database of juvenile white sharks from the northeast Pacific | 44
Distribution of soil macrofauna across different habitats in the Eastern European Alps | 44
Data scheme and data format for transferable force fields for molecular simulation | 44
Whole genome sequencing and structural variations provide insights into the body size traits of Hu sheep | 44
Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems | 44
Soil carbon stock densities in mangrove and forested wetland ecosystems of Panama | 43
Curated CYP450 Interaction Dataset: Covering the Majority of Phase I Drug Metabolism | 43
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology | 43
A benchmark database of ten years of prospective next-day earthquake forecasts in California from the Collaboratory for the Study of Earthquake Predictability | 43
Transcriptome profiling of mRNA and lncRNA involved in wax biosynthesis in cauliflower | 43
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 42
AHAD: African major crops harvested area dataset for the years of 2000, 2010, and 2020 | 42
Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities | 42
In vivo submillimeter diffusion MRI dataset of 9 macaque brains curated for tractography | 42
Genome Skimming Reveals Genetic Diversity in 220 Papaver Individuals from China | 42
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 42
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition | 41
Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 41
De novo transcriptome analysis of Perna perna L. (Bivalve) with functional and metabolic pathway analysis | 41
Psilocybin’s acute and persistent brain effects: a precision imaging drug trial | 41
A new high-resolution global topographic factor dataset calculated based on SRTM | 41
A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 40
The RESILIENT Dataset: Multimodal Monitoring of Ageing-Related Comorbidities and Cognitive Decline | 40
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 40
GriddingMachine, a database and software for Earth system modeling at global and regional scales | 40
A global dataset on species occurrences and functional traits of Schizothoracinae fish | 39
An EEG Dataset of Neural Signatures in a Competitive Two-Player Game Encouraging Deceptive Behavior | 39
MCV-Intention: A Multimodalities and Cross-View Dataset for Human Assembly Intention Recognition | 39
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 39
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 39
3DSC - a dataset of superconductors including crystal structures | 39
Acting Emotions: a comprehensive dataset of elicited emotions | 39
Global hydro-environmental lake characteristics at high spatial resolution | 39
1.5 million materials narratives generated by chatbots | 39
Impact factors for quantifying country-level terrestrial biodiversity intactness footprints (IBIF) | 39
An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition | 39
A multimodal dataset for coronary microvascular disease biomarker discovery | 39
High resolution climate change observations and projections for the evaluation of heat-related extremes | 38
Inventory of shallow landslides triggered by extreme precipitation in July 2023 in Beijing, China | 38
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments | 38
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | 38
Dataset on child vaccination in Brazil from 1996 to 2021 | 38
Ontology for the Avida digital evolution platform | 38
High-Resolution Ultrasound Data for AI-Based Segmentation in Mouse Brain Tumor | 38
Global Bias-Corrected CORDEX Datasets at Half Degree Resolution | 38
SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems | 38
Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor | 38
A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation | 37
MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees | 37
An ageing study of twenty 18650 lithium-ion Graphite/LFP cells in first and second life use | 37
Underground well water level observation grid dataset from 2005 to 2022 | 37
Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation | 36
An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 36
Sentinel-3 Altimetry Thematic Products for Hydrology, Sea Ice and Land Ice | 36
Accumulation-depuration data collection in support of toxicokinetic modelling | 36
A multi-model based dataset of global atmospheric moisture source-sink relationships and atmospheric basins | 36
Manually annotated and curated Dataset of diverse Weed Species in Maize and Sorghum for Computer Vision | 36
A Multi-Omics Dataset of Prostate Cancer Response to Oncolytic Virus OH2 Treatment | 36
Combining citizen science data and literature to build a traits dataset of Taiwan’s birds | 36
Sharkipedia: a curated open access database of shark and ray life history traits and abundance time-series | 36
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 36