Proceedings of the VLDB Endowment

Papers
(The median citation count of Proceedings of the VLDB Endowment is 4. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
Approximating probabilistic group steiner trees in graphs | 519
IsoBugView | 328
Cardinality Estimation for Having-Clauses | 196
Differentially Private Stream Processing at Scale | 91
QPJVis Demo: Quality-Boost Progressive Join Query Processing System | 82
Timestamp as a Service, Not an Oracle | 82
Spectrum: Speedy and Strictly-Deterministic Smart Contract Transactions for Blockchain Ledgers | 70
Fries | 69
Breathing New Life into an Old Tree: Resolving Logging Dilemma of B+-tree on Modern Computational Storage Drives | 65
SpaceSaving± | 65
A Reproducible Tutorial on Reproducibility in Database Systems Research | 65
PerfGuard | 64
VeriBench: Analyzing the Performance of Database Systems with Verifiability | 61
PARQO: Penalty-Aware Robust Plan Selection in Query Optimization | 60
DuckDB-wasm | 58
DyHealth | 57
Galvatron | 54
Solver-In-The-Loop Cluster Resource Management for Database-as-a-Service | 52
GaussDB: A Cloud-Native Multi-Primary Database with Compute-Memory-Storage Disaggregation | 52
Towards Designing and Learning Piecewise Space-Filling Curves | 51
Algorithm and system co-design for efficient subgraph-based graph representation learning | 50
OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates | 48
G-tran | 48
Privacy for Free: Leveraging Local Differential Privacy Perturbed Data from Multiple Services | 47
Neighborhood-Based Hypergraph Core Decomposition | 47
Efficient Distributed Transaction Processing in Heterogeneous Networks | 47
Reliable community search in dynamic networks | 47
Accelerating recommendation system training by leveraging popular choices | 47
Influential Community Search over Large Heterogeneous Information Networks | 46
Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules | 45
Motiflets | 44
Efficient Discovery of Relaxed Functional Dependencies | 44
SkyStore: Cost-Optimized Object Storage Across Regions and Clouds | 43
LION: Fast and High-Resolution Network Kernel Density Visualization | 43
DoppelGanger++ in Action: A Database Replay System with Fast Dependency Graph Generation | 43
Demonstrating Waffle: A Self-Driving Grid Index | 42
PSFQ: A Blockchain-Based Privacy-Preserving and Verifiable Student Feedback Questionnaire Platform | 42
Exploiting the Power of Equality-Generating Dependencies in Ontological Reasoning | 41
POEM | 41
Incremental partitioning for efficient spatial data analytics | 41
VeLP: Vehicle Loading Plan Learning from Human Behavior in Nationwide Logistics System | 40
HAIChart: Human and AI Paired Visualization System | 40
LIDER | 40
DPXPlain | 40
DARKER: Efficient Transformer with Data-Driven Attention Mechanism for Time Series | 39
Efficient Non-Learning Similar Subtrajectory Search | 39
SingleStore-V: An Integrated Vector Database System in SingleStore | 39
Seiden: Revisiting Query Processing in Video Database Systems | 39
Approximate Queries over Concurrent Updates | 39
SQL Engines Excel at the Execution of Imperative Programs | 38
Pre-training summarization models of structured datasets for cardinality estimation | 38
IsoVista: Black-Box Checking Database Isolation Guarantees | 38
HyperBlocker: Accelerating Rule-Based Blocking in Entity Resolution Using GPUs | 36
Hardware-Efficient Data Imputation through DBMS Extensibility | 36
Making CRDTs Not So Eventual | 36
Improving matrix-vector multiplication via lossless grammar-compressed matrices | 35
Trie memtables in cassandra | 35
Plush | 35
TsQuality: Measuring Time Series Data Quality in Apache IoTDB | 34
TSB-UAD | 33
LITS: An Optimized Learned Index for Strings | 33
SUFF: Accelerating Subgraph Matching with Historical Data | 33
Databases Unbound: Querying All of the World's Bytes with AI | 32
A demonstration of multi-region CockroachDB | 32
Succinct graph representations as distance oracles | 32
MLP-Mixer based Masked Autoencoders are Effective, Explainable and Robust for Time Series Anomaly Detection | 31
Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting | 31
Design trade-offs for a robust dynamic hybrid hash join | 30
Saving Money for Analytical Workloads in the Cloud | 30
SparkCAD | 30
FastMosaic in Action: A New Mosaic Operator for Array DBMSs | 29
Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 29
KGNav: A Knowledge Graph Navigational Visual Query System | 29
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying | 29
Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement | 29
Sphinteract: Resolving Ambiguities in NL2SQL through User Interaction | 27
DINOMO | 27
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data | 27
CMixing: An Efficient Coin Mixing Platform to Enhance Anonymity in Cryptocurrency Transactions | 27
Enriching Relations with Additional Attributes for ER | 27
Petabyte-Scale Row-Level Operations in Data Lakehouses | 26
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Query Event Logs | 26
Dalton | 26
Expanding Reverse Nearest Neighbors | 26
ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems | 25
Enhancing Accuracy for Super Spreader Identification in High-Speed Data Streams | 25
Optimizing machine learning inference queries with correlative proxy models | 25
CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution | 25
ETC: Efficient Training of Temporal Graph Neural Networks over Large-Scale Dynamic Graphs | 24
Subgraph matching over graph federation | 24
Hercules against data series similarity search | 24
XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes | 24
PerMA-bench | 24
Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines | 24
Serving deep learning models with deduplication from relational databases | 24
Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data | 24
Federated matrix factorization with privacy guarantee | 24
Ember | 24
Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets | 24
Toward Quantity-of-Interest Preserving Lossy Compression for Scientific Data | 24
A Practical Theory of Generalization in Selectivity Learning | 23
HADES: Range-Filtered Private Aggregation on Public Data | 23
LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud Services | 23
Less is More: Efficient Time Series Dataset Condensation via Two-Fold Modal Matching | 23
Discovering Leitmotifs in Multidimensional Time Series | 23
Kora: A Cloud-Native Event Streaming Platform for Kafka | 23
ACTA: Autonomy and Coordination Task Assignment in Spatial Crowdsourcing Platforms | 23
Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 23
Enabling SQL-based training data debugging for federated learning | 23
FS-Real: A Real-World Cross-Device Federated Learning Platform | 23
FARGO: Fast Maximum Inner Product Search via Global Multi-Probing | 22
Anomaly detection in time series | 22
Optimal Sharding for Scalable Blockchains with Deconstructed SMR | 22
ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | 22
CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space | 22
Vive la Différence: Practical Diff Testing of Stateful Applications | 22
Cloud data systems | 22
Oasis: An Optimal Disjoint Segmented Learned Range Filter | 22
Unleash the Power of Ellipsis: Accuracy-Enhanced Sparse Vector Technique with Exponential Noise | 21
Window Function Expression: Let the Self-Join Enter | 21
Demo of QueryBooster: Supporting Middleware-Based SQL Query Rewriting as a Service | 21
Minimum Strongly Connected Subgraph Collection in Dynamic Graphs | 21
Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification | 21
QuoteInspector: Gaining Insight about Social Media Discussions | 21
TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis | 20
Pyneapple-G: Scalable Spatial Grouping Queries | 20
L2chain | 20
CEDA: Learned Cardinality Estimation with Domain Adaptation | 20
Resource Management in Aurora Serverless | 20
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | 20
Datamap-Driven Tabular Coreset Selection for Classifier Training | 20
GENTI: GPU-Powered Walk-Based Subgraph Extraction for Scalable Representation Learning on Dynamic Graphs | 20
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads | 20
Uldp-FL: Federated Learning with Across-Silo User-Level Differential Privacy | 20
A Case for Graphics-Driven Query Processing | 20
Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 20
Accelerating Maximal Clique Enumeration via Graph Reduction | 20
ResLake: Towards Minimum Job Latency and Balanced Resource Utilization in Geo-Distributed Job Scheduling | 19
Fast neural ranking on bipartite graph indices | 19
AeonG: An Efficient Built-in Temporal Support in Graph Databases | 19
GQL and SQL/PGQ: Theoretical Models and Expressive Power | 19
Selective data acquisition in the wild for model charging | 19
Computing Rule-Based Explanations by Leveraging Counterfactuals | 19
On More Efficiently and Versatilely Querying Historical k-Cores | 19
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release | 19
Hyper-tune | 19
Starry | 19
Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput | 19
SCompression: Enhancing Database Knob Tuning Efficiency Through Slice-Based OLTP Workload Compression | 18
Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding | 18
Polyglot data management | 18
Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing | 18
Nuhuo: An Effective Estimation Model for Traffic Speed Histogram Imputation on A Road Network | 18
TranAD | 18
Scalable and Robust Snapshot Isolation for High-Performance Storage Engines | 18
PGE | 18
DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data | 18
To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks | 18
Composable Data Management: An Execution Overview | 18
A Hierarchical Grouping Algorithm for the Multi-Vehicle Dial-a-Ride Problem | 18
OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster | 17
Skellam mixture mechanism | 17
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection | 17
Distributed learning of fully connected neural networks using independent subnet training | 17
Themis: A GPU-Accelerated Relational Query Execution Engine | 17
Dynamic Graph Databases with Out-of-Order Updates | 17
ELEET: Efficient Learned Query Execution over Text and Tables | 17
xFraud | 17
YeSQL | 17
MiCS | 17
Simpler is More: Efficient Top-K Nearest Neighbors Search on Large Road Networks | 17
TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning | 17
Efficient kNN Search in Public Transportation Networks | 17
Differentially Private Data Generation with Missing Data | 17
Lingua Manga: A Generic Large Language Model Centric System for Data Curation | 17
Towards distributed bitruss decomposition on bipartite graphs | 17
LOGER: A Learned Optimizer Towards Generating Efficient and Robust Query Execution Plans | 17
Tigger: A Database Proxy That Bounces with User-Bypass | 17
Efficient Discovery of Significant Patterns with Few-Shot Resampling | 17
Efficient Algorithms for Pseudoarboricity Computation in Large Static and Dynamic Graphs | 17
Task: An Efficient Framework for Instant Error-Tolerant Spatial Keyword Queries on Road Networks | 17
PRICE: A Pretrained Model for Cross-Database Cardinality Estimation | 16
Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility | 16
DBMS annihilator | 16
DataRinse: Semantic Transforms for Data Preparation Based on Code Mining | 16
TGL | 16
Fast approximate denial constraint discovery | 16
ParChain | 16
The case for distributed shared-memory databases with RDMA-enabled memory disaggregation | 16
PIM-Tree | 16
QTCS: Efficient Query-Centered Temporal Community Search | 16
Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics | 16
Sancus | 15
Detecting layout templates in complex multiregion files | 15
ELPIS: Graph-Based Similarity Search for Scalable Data Science | 15
Points-of-interest relationship inference with spatial-enriched graph neural networks | 15
Explaining Differentially Private Query Results with DPXPlain | 15
Efficient Execution of User-Defined Functions in SQL Queries | 15
POEM: Pattern-Oriented Explanations of Convolutional Neural Networks | 15
Efficient Triangle-Connected Truss Community Search in Dynamic Graphs | 15
AMRAS | 15
SecretFlow-SCQL: A Secure Collaborative Query Platform | 15
Angel-PTM: A Scalable and Economical Large-Scale Pre-Training System in Tencent | 15
ChainDash: An Ad-Hoc Blockchain Data Analytics System | 15
Blink-hash: An Adaptive Hybrid Index for In-Memory Time-Series Databases | 15
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 15
ArcheType: A Novel Framework for Open-Source Column Type Annotation Using Large Language Models | 15
Designing production-friendly machine learning | 15
Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models | 15
Machine Learning for Subgraph Extraction: Methods, Applications and Challenges | 15
NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams | 15
Reimagining Deep Learning Systems through the Lens of Data Systems | 15
Tiresias | 14
MT-teql | 14
Decentralized crowdsourcing for human intelligence tasks with efficient on-chain cost | 14
Incremental Detection of Denial Constraint Violations | 14
ABC | 14
Breaking It Down: An In-Depth Study of Index Advisors | 14
FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework | 14
OFL-W3: A One-Shot Federated Learning System on Web 3.0 | 14
OpenFGL: A Comprehensive Benchmark for Federated Graph Learning | 14
Efficient Black-Box Checking of Snapshot Isolation in Databases | 14
Troubles with nulls, views from the users | 14
An Experimental Evaluation of Anomaly Detection in Time Series | 13
FB+-Tree: A Memory-Optimized B+-Tree with Latch-Free Update | 13
Hu-fu | 13
POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance | 13
The power of summarization in graph mining and learning | 13
Towards General and Efficient Online Tuning for Spark | 13
Mixed Covers of Keys and Functional Dependencies for Maintaining the Integrity of Data under Updates | 13
LANNS | 13
Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances | 13
Maximum k-Plex Search: An Alternated Reduction-and-Bound Method | 13
No Repetition | 13
Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 13
WebMILE | 13
SPECIAL: SynoPsis AssistEd Secure Collaborative AnaLytics | 13
Weakly Guided Adaptation for Robust Time Series Forecasting | 13
Data and AI Model Markets: Opportunities for Data and Model Sharing, Discovery, and Integration | 13
Demonstration of accelerating machine learning inference queries with correlative proxy models | 13
IncrCP: Decomposing and Orchestrating Incremental Checkpoints for Effective Recommendation Model Training | 13
Optimal Matrix Sketching over Sliding Windows | 13
Streaming Time Series Subsequence Anomaly Detection: A Glance and Focus Approach | 13
Exploiting Cloud Object Storage for High-Performance Analytics | 13
Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction | 13
DILI: A Distribution-Driven Learned Index | 13
SmartLite: A DBMS-Based Serving System for DNN Inference in Resource-Constrained Environments | 13
Hu-Fu | 13
FedTSC | 13
AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 13