Scientific Data

Papers
(The TQCC of Scientific Data is 8. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 1351
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 600
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 569
Directional wave buoy data measured near Campbell Island, New Zealand | 449
RNA-seq of peripheral blood mononuclear cells of congenital generalized lipodystrophy type 2 patients | 360
Author Correction: Mobility networks in Greater Mexico City | 328
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 320
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 317
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 308
Linking Research Data with Physically Preserved Research Materials in Chemistry | 296
Empowering open data sharing for social good: a privacy-aware approach | 295
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 257
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 248
Chromosome-level genome assembly of the Rhizoctonia solani | 217
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 200
The R package for DICOM to brain imaging data structure conversion | 167
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 163
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 152
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 150
A global dataset of fossil fungi records from the Cenozoic | 144
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 142
A dataset of the daily edge of each polynya in the Antarctic | 132
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 128
Students’ performance dataset for using machine learning technique in physics education research | 125
Near-complete reference genome assembly of Hoya carnosa | 123
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 115
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 113
Generating FAIR research data in experimental tribology | 110
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 109
An open-access database of nature-based carbon offset project boundaries | 107
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 107
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 104
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 104
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 101
Machine learning-ready remote sensing data for Maya archaeology | 100
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 99
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 98
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 97
Author Correction: Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 97
Canopy height model and NAIP imagery pairs across CONUS | 97
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 96
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 95
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 93
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 91
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 91
A focus groups study on data sharing and research data management | 90
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 90
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 90
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 89
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 88
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 88
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 88
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 88
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 86
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 85
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 84
Chinese environmentally extended input-output database for 2017 and 2018 | 84
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 83
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 82
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 82
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 78
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 77
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 77
Bioclimatic atlas of the terrestrial Arctic | 76
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 76
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 76
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 75
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 73
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 72
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 72
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 72
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 72
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 70
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 69
Slovak database of speech affected by neurodegenerative diseases | 69
Statistical performance indicators and index—a new tool to measure country statistical capacity | 68
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 68
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 66
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 66
A database of steric and electronic properties of heteroaryl substituents | 65
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 65
Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 65
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 65
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 64
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 64
An agenda for addressing bias in conflict data | 64
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 63
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 63
Full Field Digital Mammography Dataset from a Population Screening Program | 62
An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019 | 62
Analysis of AlphaMissense data in different protein groups and structural context | 61
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 61
Dataset on heavy metal pollution assessment in freshwater ecosystems | 61
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 60
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment | 60
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 60
A Global Database of Soil Plant Available Phosphorus | 60
Scaling up SoccerNet with multi-view spatial localization and re-identification | 59
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 59
China’s provincial process CO2 emissions from cement production during 1993–2019 | 59
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 59
An open dataset for oracle bone character recognition and decipherment | 59
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 58
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 58
Shotgun metagenomes from productive lakes in an urban region of Sweden | 58
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 57
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 57
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 56
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 55
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 55
A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland | 55
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 55
A dataset for deep learning based detection of printed circuit board surface defect | 54
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 53
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 53
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 53
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 53
Global photovoltaic solar panel dataset from 2019 to 2022 | 51
Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 51
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 51
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 51
Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities | 51
QMugs, quantum mechanical properties of drug-like molecules | 51
Constructing a global human epidemic database using open-source digital biosurveillance | 51
Chromosome-scale genome assembly and annotation of Xenocypris argentea | 51
Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 51
A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation | 50
Sentinel-3 Altimetry Thematic Products for Hydrology, Sea Ice and Land Ice | 50
Ontology for the Avida digital evolution platform | 50
OSMlanduse a dataset of European Union land use at 10 m resolution derived from OpenStreetMap and Sentinel-2 | 50
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 50
Accumulation-depuration data collection in support of toxicokinetic modelling | 50
SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems | 49
Dataset on child vaccination in Brazil from 1996 to 2021 | 49
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 49
Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios | 49
A database of mapped global fishing activity 1950–2017 | 49
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 49
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 48
Combining citizen science data and literature to build a traits dataset of Taiwan’s birds | 48
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 48
Spatial transcriptome profiling of normal human liver | 48
Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida | 48
A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 48
Hong Kong Corpus of Chinese Sentence and Passage Reading | 47
MiTra: A Drone-Based Trajectory Data for an All-Traffic-State Inclusive Freeway with Ramps | 47
Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition | 47
Daily precipitation dataset at 0.1° for the Yarlung Zangbo River basin from 2001 to 2015 | 47
The W2024 database of the water isotopologue $\mathrm{H}_{2}^{16}\mathrm{O}$ | 46
A database of in situ water temperatures for large inland lakes across the coterminous United States | 45
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 45
1.5 million materials narratives generated by chatbots | 45
In vivo submillimeter diffusion MRI dataset of 9 macaque brains curated for tractography | 45
A curated dataset on the distribution of West Palaearctic freshwater bivalves | 45
Characterization of hormone-producing cell types in the teleost pituitary gland using single-cell RNA-seq | 44
Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States | 44
GriddingMachine, a database and software for Earth system modeling at global and regional scales | 44
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 43
An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 43
A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 | 43
Analysis-ready optical underwater images of Manganese-nodule covered seafloor of the Clarion-Clipperton Zone | 43
Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels | 43
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 43
Coral community data Heron Island Great Barrier Reef 1962–2016 | 43
Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas | 43
Global inventory of species categorized by known underwater sonifery | 43
Curated CYP450 Interaction Dataset: Covering the Majority of Phase I Drug Metabolism | 42
A benchmark for domain adaptation and generalization in smartphone-based human activity recognition | 42
Dataset of soil hydraulic parameters in the Yellow River Basin based on in situ deep sampling | 42
Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor | 42
A new high-resolution global topographic factor dataset calculated based on SRTM | 42
Analysis of metabolic dynamics during drought stress in Arabidopsis plants | 41
Inventory of shallow landslides triggered by extreme precipitation in July 2023 in Beijing, China | 41
Global hydro-environmental lake characteristics at high spatial resolution | 41
Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery | 41
Manually annotated and curated Dataset of diverse Weed Species in Maize and Sorghum for Computer Vision | 41
Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems | 41
ISARIC-COVID-19 dataset: A Prospective, Standardized, Global Dataset of Patients Hospitalized with COVID-19 | 41
Surrounding road density of child care centers in Australia | 41
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | 41
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 40
Global nature run data with realistic high-resolution carbon weather for the year of the Paris Agreement | 40
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 40
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 40
Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 40
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 40
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 40
An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition | 40
Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions | 40
PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 40
MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees | 40
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 40
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 40
Data scheme and data format for transferable force fields for molecular simulation | 39
ValLAI_Crop, a validation dataset for coarse-resolution satellite LAI products over Chinese cropland | 39
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 39
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 39
A speech corpus of Quechua Collao for automatic dimensional emotion recognition | 39
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 39
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 39
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 39
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 39
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 38
A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 38
An RNA-seq time series of the medaka pituitary gland during sexual maturation | 38
A multi-year campus-level smart meter database | 38
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 37
Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae) | 37
Sharkipedia: a curated open access database of shark and ray life history traits and abundance time-series | 37
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 37
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 37
A multimodal dataset for coronary microvascular disease biomarker discovery | 37
LungHist700: A dataset of histological images for deep learning in pulmonary pathology | 36
Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation | 36
A global dataset on species occurrences and functional traits of Schizothoracinae fish | 36
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments | 36
An EEG Dataset of Neural Signatures in a Competitive Two-Player Game Encouraging Deceptive Behavior | 36
Haplotype-resolved T2T genome assembly of the pear cultivar ‘Danxiahong’ | 36
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition | 36
A dataset of eye gaze images for calibration-free eye tracking augmented reality headset | 36
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | 36
Dataset of the suitability of major food crops in Africa under climate change | 36
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 36
A two-year dataset of energy, environment, and system operations for an ultra-low energy office building | 36
A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 36
A biologging database of juvenile white sharks from the northeast Pacific | 35
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology | 35
High-Resolution Ultrasound Data for AI-Based Segmentation in Mouse Brain Tumor | 35
Big data collection in pharmaceutical manufacturing and its use for product quality predictions | 35
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae) | 35
ROBIN: Reference observatory of basins for international hydrological climate change detection | 34
Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 34
A corpus and a modular infrastructure for the empirical study of (an)notated music | 34
The draft genome sequences of the cosmopolitan centric diatom, the genus Skeletonema | 34
The genome assembly and annotation of the cricket Gryllus longicercus | 34
AVDOS-VR: Affective Video Database with Physiological Signals and Continuous Ratings Collected Remotely in VR | 34
Whole genome sequencing and structural variations provide insights into the body size traits of Hu sheep | 34
A point-of-use drinking water quality dataset from fieldwork in Detroit, Michigan | 34
A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset | 34
24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023 | 34
RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal | 34
Renji endoscopic submucosal dissection video data set for colorectal neoplastic lesions | 34
A global dataset on mungbean for managing seed yield and quality | 34
Comprehensive curation and validation of genomic datasets for chestnut | 34
Acting Emotions: a comprehensive dataset of elicited emotions | 33
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 33
Variation of winter wheat phenology dataset in Huang Huai Hai Plain of China from 1981 to 2021 | 33
Psilocybin’s acute and persistent brain effects: a precision imaging drug trial | 33
An ageing study of twenty 18650 lithium-ion Graphite/LFP cells in first and second life use | 33
STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field | 33