VLDB Journal

Papers
(The median citation count of VLDB Journal is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Optimizing navigational graph queries | 214
HeteroStamp: leveraging heterogeneous social interactions for mobility prediction-enhanced cost-aware spatiotemporal crowdsensing | 76
Efficient and scalable huge embedding model training via distributed cache management | 63
Special issue on responsible data management and data science | 55
Data dependencies for query optimization: a survey | 52
Generating highly customizable python code for data processing with large language models | 51
RDFFrames: knowledge graph access for machine learning tools | 48
Span-reachability querying in large temporal graphs | 47
Tabular data synthesis with generative adversarial networks: design space and optimizations | 43
A design space for RDF data representations | 35
Lero: applying learning-to-rank in query optimizer | 32
Maximum and top-k diversified biclique search at scale | 25
Anchored coreness: efficient reinforcement of social networks | 24
Towards flexibility and robustness of LSM trees | 22
Efficient kNN query for moving objects on time-dependent road networks | 22
Effective entity matching with transformers | 21
Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification | 21
A graph pattern mining framework for large graphs on GPU | 19
Efficient and effective algorithms for densest subgraph discovery and maintenance | 18
F-IVM: analytics over relational databases under updates | 18
Reconciling tuple and attribute timestamping for temporal data warehouses | 17
Anytime bottom-up rule learning for large-scale knowledge graph completion | 16
Local dampening: differential privacy for non-numeric queries via local sensitivity | 16
Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version) | 15
Picket: guarding against corrupted data in tabular data during learning and inference | 15
Accelerating multi-way joins on the GPU | 14
Correction to: Data dependencies for query optimization: a survey | 14
Privacy and efficiency guaranteed social subgraph matching | 14
ProS: data series progressive k-NN similarity search and classification with probabilistic quality guarantees | 13
Algorithms for the discovery of embedded functional dependencies | 13
To share or not to share vector registers? | 13
Answering reachability and K-reach queries on large graphs with label constraints | 13
VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition | 13
PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search | 13
Incremental discovery of denial constraints | 12
Optimizing RPQs over a compact graph representation | 12
Data distribution tailoring revisited: cost-efficient integration of representative data | 12
HERMES: data placement and schema optimization for enterprise knowledge bases | 11
When hierarchy meets 2-hop-labeling: efficient shortest distance and path queries on road networks | 10
DumpyOS: A data-adaptive multi-ary index for scalable data series similarity search | 10
Efficient detection of multivariate correlations with different correlation measures | 10
Discovering approximate implicit domain orders through order dependencies | 10
An in-depth analysis of pre-trained embeddings for entity resolution | 10
Application-driven graph partitioning | 9
A benchmark and comprehensive survey on knowledge graph entity alignment via representation learning | 9
Data distribution debugging in machine learning pipelines | 9
Correction to: Survey of window types for aggregation in stream processing systems | 9
Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytic | 9
A survey of multimodal event detection based on data fusion | 9
A model and query language for temporal graph databases | 9
Correction to: TurboLift: fast accuracy lifting for historical data recovery | 9
Survey of vector database management systems | 8
Information Resilience: the nexus of responsible and agile approaches to information use | 8
A quantitative evaluation of persistent memory hash indexes | 7
eRiskCom: an e-commerce risky community detection platform | 7
A survey on outlier explanations | 7
Pivot selection algorithms in metric spaces: a survey and experimental study | 7
PrefixFPM: a parallel framework for general-purpose mining of frequent and closed patterns | 7
Continuous monitoring of moving skyline and top-k queries | 7
WavingSketch: an unbiased and generic sketch for finding top-k items in data streams | 6
(p,q)-biclique counting and enumeration for large sparse bipartite graphs | 6
General graph generators: experiments, analyses, and improvements | 6
AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting | 6
Eris: efficiently measuring discord in multidimensional sources | 6
Accelerating directed densest subgraph queries with software and hardware approaches | 6
An analysis of one-to-one matching algorithms for entity resolution | 6
Survey of window types for aggregation in stream processing systems | 6
Minimum motif-cut: a workload-aware RDF graph partitioning strategy | 5
A survey on semantic schema discovery | 5
Interactively discovering and ranking desired tuples by data exploration | 5
MDDE: multitasking distributed differential evolution for privacy-preserving database fragmentation | 5
Zen+: a robust NUMA-aware OLTP engine optimized for non-volatile main memory | 5
Efficient and robust active learning methods for interactive database exploration | 5
Assisted design of data science pipelines | 5
Reverse spatial top-k keyword queries | 5
PARROT: pattern-based correlation exploitation in big partitioned data series | 5
The full story of 1000 cores | 5
Efficient distributed discovery of bidirectional order dependencies | 5
A fractional memory-efficient approach for online continuous-time influence maximization | 4
AutoML in heavily constrained applications | 4
Optimizing LSM-based indexes for disaggregated memory | 4
CDBTune+: An efficient deep reinforcement learning-based automatic cloud database tuning system | 4
Formal semantics and high performance in declarative machine learning using Datalog | 4
Accelerated butterfly counting with vertex priority on bipartite graphs | 4
Cardinality estimation using normalizing flow | 4
Distance labeling: on parallelism, compression, and ordering | 4
Third and Boyce–Codd normal form for property graphs | 4
A new distributional treatment for time series anomaly detection | 4
A meta-level analysis of online anomaly detectors | 4
Special issue: modern hardware | 4
Hyper-distance oracles in hypergraphs | 4
A survey on the evolution of stream processing systems | 4
Practical planning and execution of groupjoin and nested aggregates | 4
Hu-Fu: efficient and secure spatial queries over data federation | 4
A survey on deep learning approaches for text-to-SQL | 4
ICS-GNN+: lightweight interactive community search via graph neural network | 4
Toward maintenance of hypercores in large-scale dynamic hypergraphs | 4
Deep entity matching with adversarial active learning | 3
Cross-chain deals and adversarial commerce | 3
Performant almost-latch-free data structures using epoch protection in more depth | 3
Scalable decoupling graph neural network with feature-oriented optimization | 3
A multi-facet analysis of BERT-based entity matching models | 3
MinJoin++: a fast algorithm for string similarity joins under edit distance | 3
A learning-based framework for spatial join processing: estimation, optimization and tuning | 3
Morphtree: a polymorphic main-memory learned index for dynamic workloads | 3
MM-DIRECT | 3
Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra | 3
Hypergraph motifs and their extensions beyond binary | 3
Efficient cryptanalysis of an encrypted database supporting data interoperability | 3
Leveraging user itinerary to improve personalized deep matching at Fliggy | 3
A new window Clause for SQL++ | 3
In-database query optimization on SQL with ML predicates | 3
Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems | 2
Correction to: Internal and external memory set containment join | 2
xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph | 2
Correction to: Unsupervised and scalable subsequence anomaly detection in large data series | 2
Estimating simplet counts via sampling | 2
Fast subgraph query processing and subgraph matching via static and dynamic equivalences | 2
Open benchmark for filtering techniques in entity resolution | 2
Have query optimizers hit the wall? | 2
G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing | 2
ABC of order dependencies | 2
Querying historical K-cores in large temporal graphs | 2
Sliding window-based approximate triangle counting with bounded memory usage | 2
Model averaging in distributed machine learning: a case study with Apache Spark | 2
Correction to: “Refiner: a reliable and efficient incentive-driven federated learning system powered by blockchain” | 2
Alfa: active learning for graph neural network-based semantic schema alignment | 2
Enabling space-time efficient range queries with REncoder | 2
HINT: a hierarchical interval index for Allen relationships | 2
Unified route representation learning for multi-modal transportation recommendation with spatiotemporal pre-training | 2