Proceedings of the VLDB Endowment

Papers
(The median citation count of Proceedings of the VLDB Endowment is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
IsoBugView | 647
Timestamp as a Service, Not an Oracle | 414
Cardinality Estimation for Having-Clauses | 256
Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules | 121
A Reproducible Tutorial on Reproducibility in Database Systems Research | 120
Privacy for Free: Leveraging Local Differential Privacy Perturbed Data from Multiple Services | 104
QPJVis Demo: Quality-Boost Progressive Join Query Processing System | 96
Solver-In-The-Loop Cluster Resource Management for Database-as-a-Service | 90
How to Optimize SQL Queries? A Comparison Between Split, Holistic, and Hybrid Approaches | 77
Unraveling the Impact of Window Semantics: Optimizing Join Order for Efficient Stream Processing | 75
Efficient Graph Data Access for Out-of-Memory GPU Streaming Graph Processing | 74
Shifting Transaction Isolation on Graphs: From Systems to Data | 74
Relational Data Models for Genetic VCF data | 69
Accelerating Subgraph Matching through Fine-Grained and Powerful Equivalences | 68
Cloudy with a Chance of JSON | 64
Unify: A System For Unstructured Data Analytics | 63
TSB-AutoAD: Towards Automated Solutions for Time-Series Anomaly Detection | 61
SkyStore: Cost-Optimized Object Storage Across Regions and Clouds | 58
Efficient Discovery of Relaxed Functional Dependencies | 58
Fries | 56
SpaceSaving± | 56
GaussDB: A Cloud-Native Multi-Primary Database with Compute-Memory-Storage Disaggregation | 56
DyHealth | 55
Spectrum: Speedy and Strictly-Deterministic Smart Contract Transactions for Blockchain Ledgers | 53
Efficient Distributed Transaction Processing in Heterogeneous Networks | 52
VeriBench: Analyzing the Performance of Database Systems with Verifiability | 51
OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates | 51
Reliable community search in dynamic networks | 50
Influential Community Search over Large Heterogeneous Information Networks | 50
Towards Designing and Learning Piecewise Space-Filling Curves | 50
Galvatron | 49
PARQO: Penalty-Aware Robust Plan Selection in Query Optimization | 49
Approximating probabilistic group steiner trees in graphs | 49
Differentially Private Stream Processing at Scale | 48
Algorithm and system co-design for efficient subgraph-based graph representation learning | 48
G-tran | 48
Breathing New Life into an Old Tree: Resolving Logging Dilemma of B+-tree on Modern Computational Storage Drives | 48
Motiflets | 45
Neighborhood-Based Hypergraph Core Decomposition | 44
DuckDB-wasm | 44
POEM | 43
LION: Fast and High-Resolution Network Kernel Density Visualization | 43
DoppelGanger++ in Action: A Database Replay System with Fast Dependency Graph Generation | 43
Hardware-Efficient Data Imputation through DBMS Extensibility | 42
HyperBlocker: Accelerating Rule-Based Blocking in Entity Resolution Using GPUs | 41
Making CRDTs Not So Eventual | 41
SAIL: A Voyage to Symbolic Approximation Solutions for Time-Series Analysis | 40
A Comprehensive Survey and Experimental Study of Learning-Based Community Search | 40
Demonstrating Waffle: A Self-Driving Grid Index | 40
Exploiting the Power of Equality-Generating Dependencies in Ontological Reasoning | 40
SingleStore-V: An Integrated Vector Database System in SingleStore | 39
Efficient Non-Learning Similar Subtrajectory Search | 38
Improving matrix-vector multiplication via lossless grammar-compressed matrices | 38
LIDER | 37
DPXPlain | 36
IsoVista: Black-Box Checking Database Isolation Guarantees | 35
Trie memtables in cassandra | 34
TsQuality: Measuring Time Series Data Quality in Apache IoTDB | 34
Approximate Queries over Concurrent Updates | 33
Bonspiel: Low Tail Latency Transactions in Geo-Distributed Databases | 33
Hermes: Off-the-Shelf Real-Time Transactional Analytics | 33
SQL Engines Excel at the Execution of Imperative Programs | 33
Federated Data Distribution Shift Estimation | 33
LogLite: Lightweight Plug-and-Play Streaming Log Compression | 32
SUFF: Accelerating Subgraph Matching with Historical Data | 31
Seiden: Revisiting Query Processing in Video Database Systems | 31
VeLP: Vehicle Loading Plan Learning from Human Behavior in Nationwide Logistics System | 31
Databases Unbound: Querying All of the World's Bytes with AI | 31
Cuckoo Heavy Keeper and the Balancing Act of Maintaining Heavy Hitters in Stream Processing | 31
Plush | 31
DARKER: Efficient Transformer with Data-Driven Attention Mechanism for Time Series | 30
PSFQ: A Blockchain-Based Privacy-Preserving and Verifiable Student Feedback Questionnaire Platform | 30
TSB-UAD | 30
LITS: An Optimized Learned Index for Strings | 30
Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting | 30
HAIChart: Human and AI Paired Visualization System | 30
A demonstration of multi-region CockroachDB | 30
GalaxyWeaver: Autonomous Table-to-Graph Conversion and Schema Optimization with Large Language Models | 29
Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 29
Simulating a Transactional Server for Multi-Model Systems | 29
Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores | 29
Saving Money for Analytical Workloads in the Cloud | 28
A Practical Theory of Generalization in Selectivity Learning | 28
SparkCAD | 28
HADES: Range-Filtered Private Aggregation on Public Data | 28
Oasis: An Optimal Disjoint Segmented Learned Range Filter | 28
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Query Event Logs | 27
Decentralized Actor Scheduling and Reference-Based Storage in Xorbits: A Native Scalable Data Science Engine | 27
XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes | 27
Vive la Différence: Practical Diff Testing of Stateful Applications | 27
FastMosaic in Action: A New Mosaic Operator for Array DBMSs | 27
FSMDTW: A Fast Index-Free Subsequence Matching Algorithm for Dynamic Time Warping | 27
Discovering Leitmotifs in Multidimensional Time Series | 26
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data | 26
KGNav: A Knowledge Graph Navigational Visual Query System | 26
BURST: Rendering Clustering Techniques Suitable for Evolving Streams | 26
FS-Real: A Real-World Cross-Device Federated Learning Platform | 26
RICH: Real-Time Identification of Negative Cycles for High-Efficiency Arbitrage | 26
Enriching Relations with Additional Attributes for ER | 26
DINOMO | 25
Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines | 25
CMixing: An Efficient Coin Mixing Platform to Enhance Anonymity in Cryptocurrency Transactions | 25
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying | 25
Sphinteract: Resolving Ambiguities in NL2SQL through User Interaction | 25
Petabyte-Scale Row-Level Operations in Data Lakehouses | 25
Expanding Reverse Nearest Neighbors | 25
Hercules against data series similarity search | 25
Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets | 24
CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution | 24
Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data | 24
Optimizing machine learning inference queries with correlative proxy models | 24
Dalton | 23
Enhancing Accuracy for Super Spreader Identification in High-Speed Data Streams | 23
Less is More: Efficient Time Series Dataset Condensation via Two-Fold Modal Matching | 23
Succinct graph representations as distance oracles | 23
MLP-Mixer based Masked Autoencoders are Effective, Explainable and Robust for Time Series Anomaly Detection | 23
FARGO: Fast Maximum Inner Product Search via Global Multi-Probing | 23
ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems | 23
VIDEX: A Disaggregated and Extensible Virtual Index for the Cloud and AI Era | 23
Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 23
Serving deep learning models with deduplication from relational databases | 23
Design trade-offs for a robust dynamic hybrid hash join | 23
ACTA: Autonomy and Coordination Task Assignment in Spatial Crowdsourcing Platforms | 22
Optimal Sharding for Scalable Blockchains with Deconstructed SMR | 22
ETC: Efficient Training of Temporal Graph Neural Networks over Large-Scale Dynamic Graphs | 22
ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | 22
LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud Services | 22
PerMA-bench | 22
Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement | 22
Toward Quantity-of-Interest Preserving Lossy Compression for Scientific Data | 22
Kora: A Cloud-Native Event Streaming Platform for Kafka | 22
CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space | 21
QuoteInspector: Gaining Insight about Social Media Discussions | 21
Demo of QueryBooster: Supporting Middleware-Based SQL Query Rewriting as a Service | 21
Unleash the Power of Ellipsis: Accuracy-Enhanced Sparse Vector Technique with Exponential Noise | 21
Cloud data systems | 21
Datamap-Driven Tabular Coreset Selection for Classifier Training | 21
A Case for Graphics-Driven Query Processing | 21
Anomaly detection in time series | 21
Window Function Expression: Let the Self-Join Enter | 21
Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification | 21
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | 21
From Scale-Up to Scale-Out: PolarDB's Journey to Achieving 2 Billion tpmC | 20
Polyglot data management | 20
AeonG: An Efficient Built-in Temporal Support in Graph Databases | 20
SCompression: Enhancing Database Knob Tuning Efficiency Through Slice-Based OLTP Workload Compression | 20
Starry | 20
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads | 20
Fused Gromov-Wasserstein Alignment for Graph Edit Distance Computation and Beyond | 20
Accelerating Maximal Clique Enumeration via Graph Reduction | 20
TuskFlow: An Efficient Graph Database for Long-Running Transactions | 20
Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput | 20
On More Efficiently and Versatilely Querying Historical k-Cores | 20
TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis | 20
Authenticated Aggregate Queries with Boolean Range Predicates on Blockchains | 20
GENTI: GPU-Powered Walk-Based Subgraph Extraction for Scalable Representation Learning on Dynamic Graphs | 20
ResLake: Towards Minimum Job Latency and Balanced Resource Utilization in Geo-Distributed Job Scheduling | 20
DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data | 20
Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding | 20
Resource Management in Aurora Serverless | 19
Selective data acquisition in the wild for model charging | 19
GQL and SQL/PGQ: Theoretical Models and Expressive Power | 19
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release | 19
Pyneapple-G: Scalable Spatial Grouping Queries | 19
CEDA: Learned Cardinality Estimation with Domain Adaptation | 19
Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 19
Hyper-tune | 19
Uldp-FL: Federated Learning with Across-Silo User-Level Differential Privacy | 19
Nuhuo: An Effective Estimation Model for Traffic Speed Histogram Imputation on A Road Network | 19
Minimum Strongly Connected Subgraph Collection in Dynamic Graphs | 19
Simpler is More: Efficient Top-K Nearest Neighbors Search on Large Road Networks | 18
L2chain | 18
Efficient Discovery of Significant Patterns with Few-Shot Resampling | 18
Lingua Manga: A Generic Large Language Model Centric System for Data Curation | 18
Efficient kNN Search in Public Transportation Networks | 18
TranAD | 18
YeSQL | 18
PGE | 18
Composable Data Management: An Execution Overview | 18
A Hierarchical Grouping Algorithm for the Multi-Vehicle Dial-a-Ride Problem | 18
Efficient Algorithms for Pseudoarboricity Computation in Large Static and Dynamic Graphs | 18
Computing Rule-Based Explanations by Leveraging Counterfactuals | 18
The case for distributed shared-memory databases with RDMA-enabled memory disaggregation | 17
Scalable and Robust Snapshot Isolation for High-Performance Storage Engines | 17
Tigger: A Database Proxy That Bounces with User-Bypass | 17
Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 17
Differentially Private Data Generation with Missing Data | 17
QTCS: Efficient Query-Centered Temporal Community Search | 17
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 17
Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility | 17
TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning | 17
Towards distributed bitruss decomposition on bipartite graphs | 17
DataRinse: Semantic Transforms for Data Preparation Based on Code Mining | 17
DBMS annihilator | 17
Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing | 17
ELEET: Efficient Learned Query Execution over Text and Tables | 17
KEIGO: Co-Designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-Aware Storage Hierarchy | 17
Themis: A GPU-Accelerated Relational Query Execution Engine | 16
Sancus | 16
Efficient Triangle-Connected Truss Community Search in Dynamic Graphs | 16
ArcheType: A Novel Framework for Open-Source Column Type Annotation Using Large Language Models | 16
GRewriter: Practical Query Rewriting with Automatic Rule Set Expansion in GaussDB | 16
TGL | 16
OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster | 16
Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics | 16
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection | 16
Dynamic Graph Databases with Out-of-Order Updates | 16
Skellam mixture mechanism | 16
Task: An Efficient Framework for Instant Error-Tolerant Spatial Keyword Queries on Road Networks | 16
LOGER: A Learned Optimizer Towards Generating Efficient and Robust Query Execution Plans | 16
To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks | 16
PIM-Tree | 16
ELPIS: Graph-Based Similarity Search for Scalable Data Science | 16
Fast approximate denial constraint discovery | 16
Machine Learning for Graph Data Management and Query Processing | 16
Bridging Disciplines in Data Management Research to Solve Complex Data Problems | 16
PRICE: A Pretrained Model for Cross-Database Cardinality Estimation | 16
Streaming Time Series Subsequence Anomaly Detection: A Glance and Focus Approach | 15
MiCS | 15
Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries | 15
Efficient Black-Box Checking of Snapshot Isolation in Databases | 15
Beyond Shortest Paths: Node Fairness in Route Recommendation | 15
Design and Modular Verification of Distributed Transactions in MongoDB | 15
Fair Transaction Processing for Multi-Tenant Databases | 15
Tiresias | 15
FB+-Tree: A Memory-Optimized B+-Tree with Latch-Free Update | 15
OFL-W3: A One-Shot Federated Learning System on Web 3.0 | 15
NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams | 15
Access Control for Information-Theoretically Secure Data | 15
Towards Principled, Practical Document Database Design | 15
SecretFlow-SCQL: A Secure Collaborative Query Platform | 15
Bringing the Operational and Analytical Worlds Together with Lakebase | 15
ABC | 15
Reimagining Deep Learning Systems through the Lens of Data Systems | 15
Blink-hash: An Adaptive Hybrid Index for In-Memory Time-Series Databases | 15
Decentralized crowdsourcing for human intelligence tasks with efficient on-chain cost | 15
Heta: Distributed Training of Heterogeneous Graph Neural Networks | 15
MD-MVCC: Multi-Version Concurrency Control for Schema Changes in Azure SQL Database | 15
Exploiting Cloud Object Storage for High-Performance Analytics | 15
Distributed learning of fully connected neural networks using independent subnet training | 15
Efficient Execution of User-Defined Functions in SQL Queries | 15
SmartLite: A DBMS-Based Serving System for DNN Inference in Resource-Constrained Environments | 14
Maximum k-Plex Search: An Alternated Reduction-and-Bound Method | 14
No Repetition | 14
Demonstration of accelerating machine learning inference queries with correlative proxy models | 14
WebMILE | 14
FedTSC | 14
Agile-Ant: Self-Managing Distributed Cache Management for Cost Optimization of Big Data Applications | 14
Efficient Cost Modeling of Space-Filling Curves | 14
Optimal Matrix Sketching over Sliding Windows | 14