Scientific Data

Papers
(The TQCC of Scientific Data is 10. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article | Citations
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 2310
Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 878
Linking Research Data with Physically Preserved Research Materials in Chemistry | 841
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 773
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 608
Development of Gridded Root-Zone Soil Moisture Product for India, 1981–2024 | 556
A long-term ecosystem monitoring dataset from the ICP Integrated Monitoring network: biogeochemical data from 1977–2020 across 14 European countries | 535
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 437
Dataset on heavy metal pollution assessment in freshwater ecosystems | 425
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 384
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 380
Empowering open data sharing for social good: a privacy-aware approach | 348
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 301
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 301
A validated Mandarin Chinese Auditory Emotion Database of Subject-Personal-Pronoun Sentences (MCAE-SPPS) | 299
Chromosome-level genome assembly of the alpine extremophyte Tibetan snow lotus, Saussurea hypsipeta Diels | 278
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 209
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 209
Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond | 198
Multiclass Dataset for Intelligent Detection of Wind Turbine Blade Defects Using Drone Imagery | 187
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 176
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 174
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 173
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 159
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 158
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 155
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 154
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 151
Near-complete reference genome assembly of Hoya carnosa | 151
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 150
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 145
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 144
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 143
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 142
Author Correction: Mobility networks in Greater Mexico City | 133
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 132
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 130
Author Correction: Database covering the prayer movements which were not available previously | 130
Author Correction: GERDA: The German Election Database | 130
A chromosome-scale assembly of Ormosia boluoensis (Fabaceae) | 128
Chromosome-level genome assembly of the Rhizoctonia solani | 128
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 127
A curated dataset of great ape genome diversity | 127
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 125
Slovak database of speech affected by neurodegenerative diseases | 120
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 119
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 116
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 113
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 112
An open dataset for oracle bone character recognition and decipherment | 111
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 111
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 110
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 109
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 107
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 103
Bounding the costs of electric vehicle managed charging—supply curves for scenarios from 2025 to 2050 | 102
A bimodal dataset for diabetes research | 102
Full Field Digital Mammography Dataset from a Population Screening Program | 101
Chromosomal-level genome assembly of Ichthyurus bourgeoisi Gestro using PacBio HiFi and Hi-C sequencing | 101
Machine learning-ready remote sensing data for Maya archaeology | 98
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 97
A dataset for deep learning based detection of printed circuit board surface defect | 96
A Global Database of Soil Plant Available Phosphorus | 96
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 94
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 93
QMugs, quantum mechanical properties of drug-like molecules | 92
A focus groups study on data sharing and research data management | 90
Dataset for studying deformation in 3D patient-specific pulmonary artery anatomies | 90
Scaling up SoccerNet with multi-view spatial localization and re-identification | 90
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 89
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 88
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 88
Analysis of AlphaMissense data in different protein groups and structural context | 87
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 86
A longitudinal cross-country dataset on agricultural productivity and welfare in Sub-Saharan Africa | 86
Correction: Sea lice infestation dataset for wild and farmed salmon populations on the Pacific coast of Canada (2001–2023) | 85
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 85
A comprehensive dataset of riverine levee overtopping events for advancing risk assessment | 85
Shotgun metagenomes from productive lakes in an urban region of Sweden | 84
A near-global dataset of dissolved organic carbon concentrations and yields in forested headwater streams | 84
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 83
A dataset of the daily edge of each polynya in the Antarctic | 83
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 82
A near-telomere-to-telomere genome assembly of the Chinese soft-shelled turtle (Pelodiscus sinensis) | 82
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 82
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 81
A dataset of real-world oscillograms from electrical power grids | 80
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 80
A multimodal dataset of causal mechanisms in materials science literature | 79
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 78
A large-scale dataset of pre- and postsurgical MRI data from patients with chronic trigeminal neuralgia | 78
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 77
Paired magnetic susceptibility and geochemistry of young volcanism in Iceland and Tengchong, China | 77
An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019 | 77
The Latin American Legislators Dataset | 74
A global dataset of fossil fungi records from the Cenozoic | 74
One-year high-frequency environmental and behavioral data from ALAN experience in a French coastal area | 73
M3OT: A Multi-Drone Multi-Modality dataset for Multi-Object Tracking | 72
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 72
Spatio-temporal dataset (2009–2012) of Culicoides spp., vectors of livestock viruses, in France | 72
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 72
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 71
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 71
Atmospheric and oceanic data from a triangle-shaped moored array in the northern South China Sea during 2016 | 69
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment | 69
An agenda for addressing bias in conflict data | 68
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 68
An open-access database of nature-based carbon offset project boundaries | 68
A Fine-Grained Lightweight Urban Signalized-Intersection Dataset of Dense Conflict Trajectories | 67
Bioclimatic atlas of the terrestrial Arctic | 66
Statistical performance indicators and index—a new tool to measure country statistical capacity | 66
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 66
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 66
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 66
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 66
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 65
Reconstructing high-quality ground-level ozone records from 1980 to 2012 in central and eastern China | 64
A VibV Dataset Integrating Vibration and Vision for Enhanced Safety in Self-Driving Tasks | 64
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 64
Near complete T2T genome assembly of the banded goonch (Bagarius rutilus) | 64
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 62
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 62
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 62
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 62
Generating FAIR research data in experimental tribology | 62
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 61
The R package for DICOM to brain imaging data structure conversion | 61
Canopy height model and NAIP imagery pairs across CONUS | 61
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 61
A database of steric and electronic properties of heteroaryl substituents | 61
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 61
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 60
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping | 60
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 60
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 60
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 60
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 60
Students’ performance dataset for using machine learning technique in physics education research | 60
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 59
VISDB 2.0: A manually curated resource of viral integration sites and their regulatory maps in human diseases | 59
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 59
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 59
Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 58
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 58
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 58
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 58
A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset | 57
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 57
Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels | 57
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae) | 57
A corpus and a modular infrastructure for the empirical study of (an)notated music | 57
Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery | 57
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 57
The draft genome sequences of the cosmopolitan centric diatom, the genus Skeletonema | 57
A speech corpus of Quechua Collao for automatic dimensional emotion recognition | 57
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 57
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 56
Improved high quality sand fly assemblies enabled by ultra low input long read sequencing | 56
Coral community data Heron Island Great Barrier Reef 1962–2016 | 56
Curated CYP450 Interaction Dataset: Covering the Majority of Phase I Drug Metabolism | 56
Comprehensive curation and validation of genomic datasets for chestnut | 56
Proteomic Dataset of Sparganum proliferum and Spirometra mansoni to Understand Asexual Proliferation in Hosts | 56
Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems | 56
8,266 SARS-CoV-2 Genomic Assemblies from Asymptomatic Carriers in Japan | 55
Chromosome-scale genome assembly and annotation of Xenocypris argentea | 55
Comprehensive UAV and ground data for typical semiarid sites in the midstream of the Heihe River Basin | 55
A database of in situ water temperatures for large inland lakes across the coterminous United States | 55
A global dataset for steel aluminum and cement in-use stocks at 500 m gridded level 2000-2019 | 55
A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 | 54
A Labrador PeptideAtlas and DIA spectral assay library - resources for proteomics research in dogs | 54
Global photovoltaic solar panel dataset from 2019 to 2022 | 53
RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal | 53
A Multi-Omics Dataset of Prostate Cancer Response to Oncolytic Virus OH2 Treatment | 53
MACH: A Multi-Attribute Catchment Hydrometeorological dataset | 53
A whole rock geochemical dataset for magmatic rocks drilled on the mid-Norwegian margin | 53
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 53
PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 52
ReaLSAT, a global dataset of reservoir and lake surface area variations | 52
A large-scale, multitask, multisensory dataset for climate-aware crop monitoring in the US from 2018–2022 | 52
A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 52
Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 52
A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 51
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 51
De novo transcriptome analysis of Perna perna L. (Bivalve) with functional and metabolic pathway analysis | 51
Global hydro-environmental lake characteristics at high spatial resolution | 51
Single-nucleus RNA sequencing dataset of diverse tissues from wild-type monkey and Tau-P301L transgenic monkey | 51
Muscle Proteomic Dataset of A Threatened Indian walking catfish, Clarias magur (Hamilton 1822) Exposed to Thermal Stress | 51
The dataset for extending EMNIST evaluation | 51
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 50
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 50
A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 50
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 50
Data scheme and data format for transferable force fields for molecular simulation | 50
Whole genome sequencing and structural variations provide insights into the body size traits of Hu sheep | 49
Renji endoscopic submucosal dissection video data set for colorectal neoplastic lesions | 49
A compendium of temperature and salinity profiles and discrete nutrients from selected NOAA programs in Alaska | 49
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 48
GriddingMachine, a database and software for Earth system modeling at global and regional scales | 48
A point-of-use drinking water quality dataset from fieldwork in Detroit, Michigan | 48
Daily station-level records of air temperature, snow depth, and ground temperature in the Northern Hemisphere | 48
Impact factors for quantifying country-level terrestrial biodiversity intactness footprints (IBIF) | 47
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 47
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 47
Chromosome-level genome assembly of the Tyrrhenian tree frog (Hyla sarda) | 47
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 46
Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios | 46
Dataset on child vaccination in Brazil from 1996 to 2021 | 46
A haplotype-resolved genome assembly of Anoectochilus roxburghii | 46
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 46
Genome Skimming Reveals Genetic Diversity in 220 Papaver Individuals from China | 46
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | 46
A multi-model based dataset of global atmospheric moisture source-sink relationships and atmospheric basins | 46
AHAD: African major crops harvested area dataset for the years of 2000, 2010, and 2020 | 46
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology | 45
Daily precipitation dataset at 0.1° for the Yarlung Zangbo River basin from 2001 to 2015 | 45
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 45
Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 45
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 45
Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 45
In vivo submillimeter diffusion MRI dataset of 9 macaque brains curated for tractography | 45
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 45
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 44
1.5 million materials narratives generated by chatbots | 44
Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States | 44
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 44
Hong Kong Corpus of Chinese Sentence and Passage Reading | 44
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 44
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 44
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 44
An Observation-Based Dataset of Global Sub-Daily Precipitation Indices (GSDR-I) | 44
A multimodal dataset for coronary microvascular disease biomarker discovery | 44
Kymata Soto Language Dataset: an electro-magnetoencephalographic dataset for natural speech processing | 44
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments | 43
An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 43
Underground well water level observation grid dataset from 2005 to 2022 | 43
MCV-Intention: A Multimodalities and Cross-View Dataset for Human Assembly Intention Recognition | 43
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition | 43
Analysis-ready optical underwater images of Manganese-nodule covered seafloor of the Clarion-Clipperton Zone | 43
A benchmark database of ten years of prospective next-day earthquake forecasts in California from the Collaboratory for the Study of Earthquake Predictability | 43
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 43
A global dataset on species occurrences and functional traits of Schizothoracinae fish | 43
High temporal and spatial resolution projected electricity carbon emission factors of China from 2025–2060 | 42
Ontology for the Avida digital evolution platform | 42
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 42
PTB-XL+, a comprehensive electrocardiographic feature dataset | 42
High resolution climate change observations and projections for the evaluation of heat-related extremes | 42
Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas | 42
A new high-resolution global topographic factor dataset calculated based on SRTM | 42
STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field | 42
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 42