Scientific Data

Papers
(The TQCC of Scientific Data is 9. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data | 625
County-level CO2 emissions and sequestration in China during 1997–2017 | 445
PTB-XL, a large publicly available electrocardiography dataset | 359
A cross-country database of COVID-19 testing | 336
A harmonized global nighttime light dataset 1992–2018 | 242
MIMIC-IV, a freely accessible electronic health record dataset | 240
Dynamic World, Near real-time global 10 m land use land cover mapping | 238
HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy | 211
Materials Cloud, a platform for open computational science | 191
COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown | 190
The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity | 189
Holocene global mean surface temperature, a multi-method reconstruction approach | 188
Bias-corrected climate projections for South Asia from Coupled Model Intercomparison Project-6 | 178
A patient-centric dataset of images and metadata for identifying melanomas using clinical context | 172
InvaCost, a public database of the economic costs of biological invasions worldwide | 166
Version 3 of the Global Aridity Index and Potential Evapotranspiration Database | 161
The TRUST Principles for digital repositories | 160
Highly accurate long-read HiFi sequencing data for five complex genomes | 155
The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms | 150
Outlining where humans live, the World Settlement Footprint 2015 | 145
Systematic phenotyping and characterization of the 5xFAD mouse model of Alzheimer’s disease | 144
The 10-m crop type maps in Northeast China during 2017–2019 | 144
AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance | 142
The human O-GlcNAcome database and meta-analysis | 140
The International Bathymetric Chart of the Arctic Ocean Version 4.0 | 139
Multiscale dynamic human mobility flow dataset in the U.S. during the COVID-19 epidemic | 137
A structured open dataset of government interventions in response to COVID-19 | 124
A global-scale data set of mining areas | 119
NASA Global Daily Downscaled Projections, CMIP6 | 118
Carbon Monitor, a near-real-time daily dataset of global CO2 emission from fossil fuel and cement production | 114
Data sharing practices and data availability upon request differ across scientific disciplines | 109
Operationalizing the CARE and FAIR Principles for Indigenous data futures | 108
COVID-CT-MD, COVID-19 computed tomography scan dataset applicable in machine learning and deep learning | 107
The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules | 105
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification | 101
Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data | 99
A global record of annual terrestrial Human Footprint dataset from 2000 to 2018 | 98
Global land use for 2015–2100 at 0.05° resolution under diverse socioeconomic and climate scenarios | 95
Building a PubMed knowledge graph | 94
Kvasir-Capsule, a video capsule endoscopy dataset | 90
High-throughput screening platform for solid electrolytes combining hierarchical ion-transport prediction algorithms | 90
The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition | 89
Gridded daily weather data for North America with comprehensive uncertainty quantification | 88
VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations | 84
A global map of terrestrial habitat types | 83
Bias-corrected CMIP6 global dataset for dynamical downscaling of the historical and future climate (1979–2100) | 83
A high-resolution in vivo magnetic resonance imaging atlas of the human hypothalamic region | 82
A database of battery materials auto-generated using ChemDataExtractor | 82
GlobalFungi, a global database of fungal occurrences from high-throughput-sequencing metabarcoding studies | 81
Introducing the FAIR Principles for research software | 77
ERA5-based global meteorological wildfire danger maps | 75
K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations | 74
COVID-19 Disease Map, building a computational repository of SARS-CoV-2 virus-host interaction mechanisms | 74
High-resolution monthly precipitation and temperature time series from 2006 to 2100 | 74
Geomorpho90m, empirical evaluation and accuracy assessment of global high-resolution geomorphometric layers | 73
Global quantitative analysis of the human brain proteome and phosphoproteome in Alzheimer’s disease | 73
Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry | 73
AusTraits, a curated plant trait database for the Australian flora | 72
PERSIANN-CCS-CDR, a 3-hourly 0.04° global precipitation climate data record for heavy precipitation studies | 72
COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak | 71
Global terrestrial carbon fluxes of 1999–2019 estimated by upscaling eddy covariance data with a random forest | 71
Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions | 69
CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis | 69
Chinese provincial multi-regional input-output database for 2012, 2015, and 2017 | 68
The Amsterdam Open MRI Collection, a set of multimodal MRI datasets for individual difference analyses | 68
Thermodynamic and transport properties of hydrogen containing streams | 68
DISPERSE, a trait database to assess the dispersal potential of European aquatic macroinvertebrates | 67
HIT-COVID, a global database tracking public health interventions to COVID-19 | 66
Building a knowledge graph to enable precision medicine | 65
Combining expert and crowd-sourced training data to map urban form and functions for the continental US | 64
Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework | 64
A SARS-CoV-2 cytopathicity dataset generated by high-content screening of a large drug repurposing collection | 63
A database of human gait performance on irregular and uneven surfaces collected by wearable sensors | 63
An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset | 63
Hourly potential evapotranspiration at 0.1° resolution for the global land surface from 1981-present | 62
National contributions to climate change due to historical emissions of carbon dioxide, methane, and nitrous oxide since 1850 | 61
Systematic analysis of infectious disease outcomes by age shows lowest severity in school-age children | 60
A synthesis of bacterial and archaeal phenotypic trait data | 60
Coastal sea level anomalies and associated trends from Jason satellite altimetry over 2002–2018 | 58
Global high-resolution emissions of soil NOx, sea salt aerosols, and biogenic volatile organic compounds | 58
GlobSnow v3.0 Northern Hemisphere snow water equivalent dataset | 57
Expanded dataset of mechanical properties and observed phases of multi-principal element alloys | 57
Mapping twenty years of corn and soybean across the US Midwest using the Landsat archive | 57
Gridded fossil CO2 emissions and related O2 combustion consistent with national inventories 1959–2018 | 56
A naturalistic neuroimaging database for understanding the brain using ecological stimuli | 56
ClimateEU, scale-free climate normals, historical time series, and future projections for Europe | 56
Downscaling GRACE total water storage change using partial least squares regression | 55
CT-ORG, a new dataset for multiple organ segmentation in computed tomography | 55
GEOM, energy-annotated molecular conformations for property prediction and molecular generation | 55
Publisher Correction: Present and future Köppen-Geiger climate classification maps at 1-km resolution | 54
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 54
Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules | 53
Multivariate time series dataset for space weather data analytics | 53
COVID-19 pandemic reveals the peril of ignoring metadata standards | 53
Global soil moisture data derived through machine learning trained with in-situ measurements | 52
Author Correction: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data | 52
Global daily 1 km land surface precipitation based on cloud cover-informed downscaling | 51
High-resolution terrestrial climate, bioclimate and vegetation for the last 120,000 years | 51
Local sea level trends, accelerations and uncertainties over 1993–2019 | 50
Air pollution emissions from Chinese power plants based on the continuous emission monitoring systems network | 50
Comprehensive dataset of shotgun metagenomes from oxygen stratified freshwater lakes and ponds | 50
FIVES: A Fundus Image Dataset for Artificial Intelligence based Vessel Segmentation | 50
Harmonised LUCAS in-situ land cover and use database for field surveys from 2006 to 2018 in the European Union | 50
OPTIMADE, an API for exchanging materials data | 50
A global dataset of surface water and groundwater salinity measurements from 1980–2019 | 49
LCVP, The Leipzig catalogue of vascular plants, a new taxonomic reference list for all known vascular plants | 49
Mapping of 30-meter resolution tile-drained croplands using a geospatial modeling approach | 49
A multi-site, multi-disorder resting-state magnetic resonance image database | 49
An open tool for creating battery-electric vehicle time series from empirical data, emobpy | 48
Electrochemical metrics for corrosion resistant alloys | 47
QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules | 47
A dataset of clinically recorded radar vital signs with synchronised reference sensor signals | 46
Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development | 46
A new comprehensive trait database of European and Maghreb butterflies, Papilionoidea | 45
A long term global daily soil moisture dataset derived from AMSR-E and AMSR2 (2002–2019) | 45
BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation | 45
Lower-limb kinematics and kinetics during continuously varying human locomotion | 45
The green and blue crop water requirement WATNEEDS model and its global gridded outputs | 44
Fetal electrocardiograms, direct and abdominal with reference heartbeat annotations | 44
An annotated fluorescence image dataset for training nuclear segmentation methods | 44
CAVD, towards better characterization of void space for ionic transport analysis | 44
Response2covid19, a dataset of governments’ responses to COVID-19 all around the world | 43
China’s greenhouse gas emissions for cropping systems from 1978–2016 | 43
CU-BEMS, smart building electricity consumption and indoor environmental sensor datasets | 42
A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors | 42
LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction | 42
CerebrA, registration and manual label correction of Mindboggle-101 atlas for MNI-ICBM152 template | 42
The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension | 42
Development and validation of the CHIRTS-daily quasi-global high-resolution daily temperature data set | 42
Benchmark maps of 33 years of secondary forest age for Brazil | 42
A new global dataset of bioclimatic indicators | 42
Experimental database of optical properties of organic compounds | 41
Global spatiotemporally continuous MODIS land surface temperature dataset | 41
Electronic healthcare records and external outcome data for hospitalized patients with heart failure | 41
Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing | 41
A multi-modal open dataset for mental-disorder analysis | 40
A new vector-based global river network dataset accounting for variable drainage density | 40
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 40
A band-gap database for semiconducting inorganic materials calculated with hybrid functional | 40
lncRNAKB, a knowledgebase of tissue-specific functional annotation and trait association of long noncoding RNA | 40
Vectorized rooftop area data for 90 cities in China | 39
Worldwide continuous gap-filled MODIS land surface temperature dataset | 39
An integrated landscape of protein expression in human cancer | 39
QMugs, quantum mechanical properties of drug-like molecules | 39
Heidelberg colorectal data set for surgical data science in the sensor operating room | 39
Global land projection based on plant functional types with a 1-km resolution under socio-climatic scenarios | 38
Probabilistic atlas for the language network based on precision fMRI data from >800 individuals | 38
A global occurrence database of the Atlantic blue crab Callinectes sapidus | 38
Developing reliable hourly electricity demand data through screening and imputation | 38
16 years of topographic surveys of rip-channelled high-energy meso-macrotidal sandy beach | 38
A global dataset for the projected impacts of climate change on four major crops | 38
TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers | 38
A database of chlorophyll and water chemistry in freshwater lakes | 38
An update on global mining land use | 38
Discharge profile of a zinc-air flow battery at various electrolyte flow rates and discharge currents | 37
The United States COVID-19 Forecast Hub dataset | 37
Greenhouse gas emissions from municipal wastewater treatment facilities in China from 2006 to 2019 | 37
A Global Building Occupant Behavior Database | 37
High-resolution Digital Surface Model of the 2021 eruption deposit of Cumbre Vieja volcano, La Palma, Spain | 36
Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19 | 36
Long-term and large-scale multispecies dataset tracking population changes of common European breeding birds | 36
MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks | 36
A rasterized building footprint dataset for the United States | 36
The normalised Sentinel-1 Global Backscatter Model, mapping Earth’s land surface with C-band microwaves | 35
In vivo human whole-brain Connectom diffusion MRI dataset at 760 µm isotropic resolution | 35
Global gridded GDP data set consistent with the shared socioeconomic pathways | 35
Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types | 35
A multilevel carbon and water footprint dataset of food commodities | 35
A high-resolution climate simulation dataset for the past 540 million years | 34
Global forest management data for 2015 at a 100 m resolution | 34
An Indo-Pacific coral spawning database | 34
A comprehensive, multisource database for hydrometeorological modeling of 14,425 North American watersheds | 34
Quantum chemical benchmark databases of gold-standard dimer interaction energies | 34
A gene expression atlas for different kinds of stress in the mouse brain | 34
Benchmarking second and third-generation sequencing platforms for microbial metagenomics | 34
Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm | 34
A high-spatial-resolution dataset of human thermal stress indices over South and East Asia | 34
Global offshore wind turbine dataset | 33
Lower limb kinematic, kinetic, and EMG data from young healthy humans during walking at controlled speeds | 33
p3k14c, a synthetic global database of archaeological radiocarbon dates | 33
The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments | 33
VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients | 33
A map of the extent and year of detection of oil palm plantations in Indonesia, Malaysia and Thailand | 33
A relational database to identify differentially expressed genes in the endometrium and endometriosis lesions | 33
Unravelling the diversity of magnetotactic bacteria through analysis of open genomic databases | 33
The global lake area, climate, and population dataset | 33
A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms | 32
SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules | 32
A whole-body FDG-PET/CT Dataset with manually annotated Tumor Lesions | 32
Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm | 32
A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018 | 32
Genome assembly and annotation of Meloidogyne enterolobii, an emerging parthenogenetic root-knot nematode | 32
The Swiss data cube, analysis ready data archive using earth observations of Switzerland | 32
Global trends and forecasts of breast cancer incidence and deaths | 31
Reef Cover, a coral reef classification for global habitat mapping from remote sensing | 31
Publisher Correction: A global database of Holocene paleotemperature records | 31
Crop production and nitrogen use in European cropland and grassland 1961–2019 | 31
Monthly direct and indirect greenhouse gases emissions from household consumption in the major Japanese cities | 31
A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research | 31
Caravan - A global community dataset for large-sample hydrology | 30
Chest imaging representing a COVID-19 positive rural U.S. population | 30
A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK | 30
Human and economic impacts of natural disasters: can we trust the global data? | 30
Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images | 30
Global data on fertilizer use by crop and by country | 30
A kinematic and kinetic dataset of 18 above-knee amputees walking at various speeds | 30
A global 0.05° dataset for gross primary production of sunlit and shaded vegetation canopies from 1992 to 2020 | 29
A comprehensive database of active and potentially-active continental faults in Chile at 1:25,000 scale | 29
Projecting 1 km-grid population distributions from 2020 to 2100 globally under shared socioeconomic pathways | 29
A multimodal sensor dataset for continuous stress detection of nurses in a hospital | 29
Question-driven summarization of answers to consumer health questions | 29
An improved daily standardized precipitation index dataset for mainland China from 1961 to 2018 | 29
A curated dataset for data-driven turbulence modelling | 29
Maps of cropping patterns in China during 2015–2021 | 29
Global gridded crop harvested area, production, yield, and monthly physical area data circa 2015 | 29
Materials informatics platform with three dimensional structures, workflow and thermoelectric applications | 29
A real-time survey on the psychological impact of mild lockdown for COVID-19 in the Japanese population | 29
SMAP-HydroBlocks, a 30-m satellite-based soil moisture dataset for the conterminous US | 28
Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model | 28
A building height dataset across China in 2017 estimated by the spatially-informed approach | 28
Accelerometer data collected with a minimum set of wearable sensors from subjects with Parkinson’s disease | 28
The short-term mortality fluctuation data series, monitoring mortality shocks across time and space | 28
The International Bathymetric Chart of the Southern Ocean Version 2 | 28
A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS | 28
A compilation of experimental data on the mechanical properties and microstructural features of Ti-alloys | 27
The Cuban Human Brain Mapping Project, a young and middle age population-based EEG, MRI, and cognition dataset | 27
A global dataset for crop production under conventional tillage and no tillage systems | 27
An Open MRI Dataset For Multiscale Neuroscience | 27
The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes | 27
UWB-gestures, a public dataset of dynamic hand gestures acquired using impulse radar sensors | 27
GLOBathy, the global lakes bathymetry dataset | 27
Global data on earthworm abundance, biomass, diversity and corresponding environmental properties | 27
Global 1-km present and future hourly anthropogenic heat flux | 27
A database framework for rapid screening of structure-function relationships in PFAS chemistry | 27
Multicenter dataset of multi-shell diffusion MRI in healthy traveling adults with identical settings | 27
GDIS, a global dataset of geocoded disaster locations | 27
Dataset on SARS-CoV-2 non-pharmaceutical interventions in Brazilian municipalities | 27
Consensus transcriptional regulatory networks of coronavirus-infected human cells | 26
Emognition dataset: emotion recognition with self-reports, facial expressions, and physiology using wearables | 26
European primary forest database v2.0 | 26
Fault2SHA Central Apennines database and structuring active fault data for seismic hazard assessment | 26
Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities | 26
Annual dynamic dataset of global cropping intensity from 2001 to 2019 | 26
Database of Italian present-day stress indicators, IPSI 1.4 | 26
Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository | 26
A global map of planting years of plantations | 26
An improved high-quality genome assembly and annotation of Tibetan hulless barley | 26
Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants | 26
GloSEM: High-resolution global estimates of present and future soil displacement in croplands by water erosion | 25
An fMRI dataset in response to “The Grand Budapest Hotel”, a socially-rich, naturalistic movie | 25