IEEE Transactions on Parallel and Distributed Systems

Papers
(The TQCC of IEEE Transactions on Parallel and Distributed Systems is 13. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Tsinghua University | 303
Jdebug: A Fast, Non-Intrusive and Scalable Fault Locating Tool for Ten-Million-Scale Parallel Applications | 233
Design and Implementation of 2D Convolution on x86/x64 Processors | 227
Replicated Versioned Data Structures for Wide-Area Distributed Systems | 176
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information | 170
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations | 154
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach | 152
H5Intent: Autotuning HDF5 With User Intent | 109
An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays | 109
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds | 95
IRHunter: Universal Detection of Instruction Reordering Vulnerabilities for Enhanced Concurrency in Distributed and Parallel Systems | 94
AWB+-Tree: A Novel Width-Based Index Structure Supporting Hybrid Matching for Large-Scale Content-Based Pub/Sub Systems | 90
Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization | 88
QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing | 88
Building High-throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine | 86
A Memory-Constraint-Aware List Scheduling Algorithm for Memory-Constraint Heterogeneous Muti-Processor System | 85
Enabling Large Scale Simulations for Particle Accelerators | 83
On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement | 82
Online Container Caching for IoT Data Processing in Serverless Edge Computing | 82
EdgeTB: A Hybrid Testbed for Distributed Machine Learning at the Edge With High Fidelity | 81
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training | 78
Mapping Large-Scale Spiking Neural Network on Arbitrary Meshed Neuromorphic Hardware | 75
Coflow Scheduling in Data Centers: Routing and Bandwidth Allocation | 74
Federated Learning With Nesterov Accelerated Gradient | 74
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From University of Washington | 73
Graph-Centric Performance Analysis for Large-Scale Parallel Applications | 70
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization | 68
GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads | 68
A Pessimistic Fault Diagnosability of Large-Scale Connected Networks via Extra Connectivity | 68
Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network | 65
Building Accurate and Interpretable Online Classifiers on Edge Devices | 64
Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment | 63
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning | 63
Asynchronous Algorithms for Decentralized Resource Allocation Over Directed Networks | 61
Multi-Swarm Co-Evolution Based Hybrid Intelligent Optimization for Bi-Objective Multi-Workflow Scheduling in the Cloud | 61
LB-Chain: Load-Balanced and Low-Latency Blockchain Sharding via Account Migration | 60
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV | 60
Tag-Sharer-Fusion Directory: A Scalable Coherence Directory With Flexible Entry Formats | 57
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms | 57
Improved MPC Algorithms for Edit Distance and Ulam Distance | 57
BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription | 56
CiMBA: Accelerating Genome Sequencing Through On-Device Basecalling via Compute-in-Memory | 56
Efficient and Automated Deployment Architecture for OpenStack in TianHe SuperComputing Environment | 56
Improving the Scalability of GPU Synchronization Primitives | 55
Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving | 53
AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems | 52
RHINO: An Efficient Serverless Container System for Small-Scale HPC Applications | 52
A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration | 51
Coordinating Fast Concurrency Adapting With Autoscaling for SLO-Oriented Web Applications | 50
Securing Fine-Grained Data Sharing and Erasure in Outsourced Storage Systems | 49
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning | 49
Libfork: Portable Continuation-Stealing With Stackless Coroutines | 48
A Lightweight and Fine-Grained Ciphertext Search Scheme for Big Data Assisted by Proxy Servers | 48
Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation | 46
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets | 45
AIDTN: Towards a Real-Time AI Optimized DTN System With NVMeoF | 45
Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels | 45
SSRAID: A Stripe-Queued and Stripe-Threaded Merging I/O Strategy to Improve Write Performance of Serial Interface SSD RAID | 44
iBalancer: Load-Aware in-Server Flow Scheduling for Sub-Millisecond Tail Latency | 42
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From the University of Texas at Austin | 42
2024 Reviewers List* | 41
Decentralised Data Quality Control in Ground Truth Production for Autonomic Decisions | 41
Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning | 40
HashCache: Accelerating Serverless Computing by Skipping Duplicated Function Execution | 40
Scalable, Confidential and Survivable Software Updates | 39
Fine-Grained Performance and Cost Modeling and Optimization for FaaS Applications | 39
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra | 39
Congestion Control for Datacenter Networks: A Control-Theoretic Approach | 38
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training | 38
Efficient Distributed Approaches to Core Maintenance on Large Dynamic Graphs | 38
Hierarchical Federated Learning With Momentum Acceleration in Multi-Tier Networks | 38
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication | 37
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation | 37
Trusted Model Aggregation With Zero-Knowledge Proofs in Federated Learning | 37
FedVeca: Federated Vectorized Averaging on Non-IID Data With Adaptive Bi-Directional Global Objective | 36
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems | 36
SEMSO: A Secure and Efficient Multi-Data Source Blockchain Oracle | 36
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From Tsinghua University | 35
Optimal Convex Hull Formation on a Grid by Asynchronous Robots With Lights | 35
DePo: Dynamically Offload Expensive Event Processing to the Edge of Cyber-Physical Systems | 35
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor | 35
Static Algorithm Allocation with Duplication in Robotic Network Cloud Systems | 34
From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization | 34
GML: Efficiently Auto-Tuning Flink's Configurations Via Guided Machine Learning | 34
HSA-Net: Hidden-State-Aware Networks for High-Precision QoS Prediction | 33
Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation | 33
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging | 33
An Efficient Algorithm for Hamiltonian Path Embedding of $k$-Ary $n$-Cubes under the Partitioned Edge Fault Model | 33
A Survey of Storage Systems in the RDMA Era | 32
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap | 32
EESaver: Saving Energy Dynamically for Green Multi-Access Edge Computing | 32
HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge | 32
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems | 31
Deep Reinforcement Learning for Load-Balancing Aware Network Control in IoT Edge Systems | 31
A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms | 31
Joint Coverage-Reliability for Budgeted Edge Application Deployment in Mobile Edge Computing Environment | 31
Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters | 31
VCSR: An Efficient GPU Memory-Aware Sparse Format | 31
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism | 30
Cost-Efficient Server Configuration and Placement for Mobile Edge Computing | 30
Bandwidth-Aware Scheduling Repair Techniques in Erasure-Coded Clusters: Design and Analysis | 30
Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs | 30
CIA: A Collaborative Integrity Auditing Scheme for Cloud Data With Multi-Replica on Multi-Cloud Storage Providers | 29
Understanding the Impact of Data Staging for Coupled Scientific Workflows | 28
Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation | 28
Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions | 28
Accelerated Information Dissemination for Replica Selection in Distributed Key-Value Store Systems | 28
Microservice Deployment in Edge Computing Based on Deep Q Learning | 28
Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems | 28
Timed Loops for Distributed Storage in Wireless Networks | 27
Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs | 27
Predicting Throughput of Distributed Stochastic Gradient Descent | 27
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Clemson University | 27
Graphite: Hardware-Aware GNN Reshaping for Acceleration With GPU Tensor Cores | 26
Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints | 26
Cost-Efficient Workflow Scheduling Algorithm for Applications With Deadline Constraint on Heterogeneous Clouds | 26
NetSHa: In-Network Acceleration of LSH-Based Distributed Search | 26
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU | 26
Necessary Feasibility Analysis for Mixed-Criticality Real-Time Embedded Systems | 26
Toward Load-Balanced Redundancy Transitioning for Erasure-Coded Storage | 26
A Practical Framework for Secure Document Retrieval in Encrypted Cloud File Systems | 26
Improving Fairness for SSD Devices through DRAM Over-Provisioning Cache Management | 26
Distributed Evolution Strategies With Multi-Level Learning for Large-Scale Black-Box Optimization | 26
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs | 25
Dynamic GPU Energy Optimization for Machine Learning Training Workloads | 25
Revisiting PM-Based B-Tree With Persistent CPU Cache | 25
Parallel and Distributed Bayesian Network Structure Learning | 25
Deadline and Reliability Aware Multiserver Configuration Optimization for Maximizing Profit | 25
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization | 25
Doing More with Less: Balancing Probing Costs and Task Offloading Efficiency at the Network Edge | 24
LOCUS: User-Perceived Delay-Aware Service Placement and User Allocation in MEC Environment | 24
Efficient Function Queryable and Privacy Preserving Data Aggregation Scheme in Smart Grid | 24
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training | 24
FedICT: Federated Multi-Task Distillation for Multi-Access Edge Computing | 24
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection | 24
COFFEE: Cross-Layer Optimization for Fast and Efficient Executions of Sinkhorn-Knopp Algorithm on HPC Systems | 23
UFC2: User-Friendly Collaborative Cloud | 23
gIM: GPU Accelerated RIS-Based Influence Maximization Algorithm | 23
Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane | 23
CERT-DF: A Computing-Efficient and Robust Distributed Deep Forest Framework With Low Communication Overhead | 23
Content Collaborative Caching Strategy in the Edge Maintenance of Communication Network: A Joint Download Delay and Energy Consumption Method | 23
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis | 22
Distributed Approaches to Butterfly Analysis on Large Dynamic Bipartite Graphs | 22
Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations | 22
On the Analysis of Cache Invalidation With LRU Replacement | 22
Learning to Schedule Multi-Server Jobs With Fluctuated Processing Speeds | 22
Propagation Pattern for Moment Representation of the Lattice Boltzmann Method | 22
Dap-FL: Federated Learning Flourishes by Adaptive Tuning and Secure Aggregation | 22
Safe Multi-Agent Deep Reinforcement Learning for the Management of Autonomous Connected Vehicles at Future Intersections | 22
ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms | 22
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting | 22
Ocelot: An Interactive, Efficient Distributed Compression-As-a-Service Platform With Optimized Data Compression Techniques | 22
Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion | 22
Scaling Poisson Solvers on Many Cores via MMEwald | 21
The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability | 21
MRCN: Throughput-Oriented Multicast Routing for Customized Network-on-Chips | 21
Increasing the Efficiency of Massively Parallel Sparse Matrix-Matrix Multiplication in First-Principles Calculation on the New-Generation Sunway Supercomputer | 21
VQL: Efficient and Verifiable Cloud Query Services for Blockchain Systems | 21
Online Pricing and Trading of Private Data in Correlated Queries | 21
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Peking University | 21
Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks | 21
Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme | 21
On Mixing Eventual and Strong Consistency: Acute Cloud Types | 21
LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge | 21
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From ShanghaiTech University | 21
RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems | 21
On Model Transmission Strategies in Federated Learning With Lossy Communications | 21
YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve | 21
Gamora: Learning-Based Buffer-Aware Preloading for Adaptive Short Video Streaming | 20
Improved Methods of Task Assignment and Resource Allocation With Preemption in Edge Computing Systems | 20
Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution | 20
Floating Point Calculation of the Cube Function on FPGAs | 20
FedTune-SGM: A Stackelberg-Driven Personalized Federated Learning Strategy for Edge Networks | 20
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments | 20
DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore | 20
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs | 20
NDP: Network Division Positioning for Irregular Multi-Hop Networks | 20
Accurate Differentially Private Deep Learning on the Edge | 20
IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing | 20
Guest Editorial: Special Section on SC22 Student Cluster Competition | 20
AdaptChain: Adaptive Scaling Blockchain With Transaction Deduplication | 19
Efficient Virtual Network Embedding of Cloud-Based Data Center Networks into Optical Networks | 19
An Unequal Caching Strategy for Shared-Memory Graph Analytics | 19
APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators | 19
Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading | 19
Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump | 19
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation | 19
Auto-GNAS: A Parallel Graph Neural Architecture Search Framework | 19
GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes | 19
Redundancy-Free and Load-Balanced TGNN Training With Hierarchical Pipeline Parallelism | 18
Adaptive Vertical Federated Learning on Unbalanced Features | 18
SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems | 18
Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters | 18
VeriML: Enabling Integrity Assurances and Fair Payments for Machine Learning as a Service | 18
Reliability-Aware Multi-Objective Memetic Algorithm for Workflow Scheduling Problem in Multi-Cloud System | 18
CNNPC: End-Edge-Cloud Collaborative CNN Inference With Joint Model Partition and Compression | 18
High Performance OpenCL-Based GEMM Kernel Auto-Tuned by Bayesian Optimization | 18
RLPTO: A Reinforcement Learning-Based Performance-Time Optimized Task and Resource Scheduling Mechanism for Distributed Machine Learning | 18
Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility | 17
Frequency-Domain Inference Acceleration for Convolutional Neural Networks Using ReRAMs | 17
Online Elastic Resource Provisioning With QoS Guarantee in Container-Based Cloud Computing | 17
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From ETH Zürich | 17
Synergistically Rebalancing the EDP of Container-Based Parallel Applications | 17
Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining | 17
FLUPS - A Flexible and Performant Massively Parallel Fourier Transform Library | 17
PhaST: Hierarchical Concurrent Log-Free Skip List for Persistent Memory | 17
Multi-Tier GPU Virtualization for Deep Learning in Cloud-Edge Systems | 16
Near-Lossless MPI Tracing and Proxy Application Autogeneration | 16
TODG: Distributed Task Offloading With Delay Guarantees for Edge Computing | 16
Harnessing the Potential of Function-Reuse in Multimedia Cloud Systems | 16
FedMDS: An Efficient Model Discrepancy-Aware Semi-Asynchronous Clustered Federated Learning Framework | 16
Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value | 16
Privacy Preserving Task Push in Spatial Crowdsourcing With Unknown Popularity | 16
Beyond Belady to Attain a Seemingly Unattainable Byte Miss Ratio for Content Delivery Networks | 16
Mobility-Aware Offloading and Resource Allocation for Distributed Services Collaboration | 16
CPLNS: Cooperative Parallel Large Neighborhood Search for Large-Scale Multi-Agent Path Finding | 16
Shuffle Differential Private Data Aggregation for Random Population | 16
SLO-Aware Function Placement for Serverless Workflows With Layer-Wise Memory Sharing | 15
A Native Tensor–Vector Multiplication Algorithm for High Performance Computing | 15
The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge | 15
MemTunnel: a CXL-based Rack-Scale Host Memory Pooling Architecture for Cloud Service | 15
Evaluating Data Redistribution in PaRSEC | 15
Faster-BNI: Fast Parallel Exact Inference on Bayesian Networks | 15
A Distributed Network-Based Runtime Verification of Full Regular Temporal Properties | 15
Guest Editorial | 15
Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers | 15
Deep Neural Network Training With Distributed K-FAC | 14
Accelerating Communication-Efficient Federated Multi-Task Learning With Personalization and Fairness | 14
FEUAGame: Fairness-Aware Edge User Allocation for App Vendors | 14
Node Essentiality Assessment and Distributed Collaborative Virtual Network Embedding in Datacenters | 14
Loci: Federated Continual Learning of Heterogeneous Tasks at Edge | 14
Toward Materials Genome Big-Data: A Blockchain-Based Secure Storage and Efficient Retrieval Method | 14
Energy Efficient and Multi-Resource Optimization for Virtual Machine Placement by Improving MOEA/D | 14
Hypergraph-Based Numerical Neural-Like P Systems for Medical Image Segmentation | 14
A Non-Intrusive Multi-objective Task Scheduling Method for JointCloud Environment | 14
ABSE: Adaptive Baseline Score-Based Election for Leader-Based BFT Systems | 14
Towards Revenue-Driven Multi-User Online Task Offloading in Edge Computing | 14
An Efficient Speculative Federated Tree Learning System With a Lightweight NN-Based Predictor | 14
Scalable Deep Reinforcement Learning-Based Online Routing for Multi-Type Service Requirements | 14
A Resource-Efficient Predictive Resource Provisioning System in Cloud Systems | 14
Rethinking Virtual Machines Live Migration for Memory Disaggregation | 14
OneOS: Distributed Operating System for the Edge-to-Cloud Continuum | 13
Monodirectional Evolutional Symport Tissue P Systems With Promoters and Cell Division | 13
Editor's Note | 13
FedLoRE: Communication-Efficient and Personalized Edge Intelligence Framework via Federated Low-Rank Estimation | 13
Landlord: Coordinating Dynamic Software Environments to Reduce Container Sprawl | 13
PeakFS: An Ultra-High Performance Parallel File System via Computing-Network-Storage Co-Optimization for HPC Applications | 13
Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic Approach | 13
Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint | 13
Near-Zero Downtime Recovery From Transient-Error-Induced Crashes | 13