IEEE Transactions on Parallel and Distributed Systems

Papers
(The TQCC of IEEE Transactions on Parallel and Distributed Systems is 14. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Tsinghua University | 379
Jdebug: A Fast, Non-Intrusive and Scalable Fault Locating Tool for Ten-Million-Scale Parallel Applications | 261
H5Intent: Autotuning HDF5 With User Intent | 252
Replicated Versioned Data Structures for Wide-Area Distributed Systems | 201
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information | 190
Online Container Caching for IoT Data Processing in Serverless Edge Computing | 174
Optimizing Data Locality by Integrating Intermediate Data Partitioning and Reduce Task Scheduling in Spark Framework | 174
Enabling Large Scale Simulations for Particle Accelerators | 124
A Memory-Constraint-Aware List Scheduling Algorithm for Memory-Constraint Heterogeneous Muti-Processor System | 120
An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays | 120
Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization | 113
Design and Implementation of 2D Convolution on x86/x64 Processors | 112
AWB+-Tree: A Novel Width-Based Index Structure Supporting Hybrid Matching for Large-Scale Content-Based Pub/Sub Systems | 103
IRHunter: Universal Detection of Instruction Reordering Vulnerabilities for Enhanced Concurrency in Distributed and Parallel Systems | 97
Building High-throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine | 96
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training | 96
On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement | 94
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach | 94
EdgeTB: A Hybrid Testbed for Distributed Machine Learning at the Edge With High Fidelity | 92
Federated Learning With Nesterov Accelerated Gradient | 88
Mapping Large-Scale Spiking Neural Network on Arbitrary Meshed Neuromorphic Hardware | 86
Large-Scale Neural Network Quantum States Calculation for Quantum Chemistry on a New Sunway Supercomputer | 86
Fully Decentralized Data Distribution for Large-Scale HPC Systems | 82
QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing | 82
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations | 79
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds | 78
RHINO: An Efficient Serverless Container System for Small-Scale HPC Applications | 76
Graph-Centric Performance Analysis for Large-Scale Parallel Applications | 75
BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription | 75
Tag-Sharer-Fusion Directory: A Scalable Coherence Directory With Flexible Entry Formats | 74
Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving | 73
PreTrans: Enabling Efficient CGRA Multi-Task Context Switch Through Config Pre-Mapping and Data Transceiving | 70
CiMBA: Accelerating Genome Sequencing Through On-Device Basecalling via Compute-in-Memory | 69
Efficient and Automated Deployment Architecture for OpenStack in TianHe SuperComputing Environment | 67
AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems | 66
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms | 64
A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration | 64
Scalable Hybrid Learning Techniques for Scientific Data Compression | 61
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV | 61
Coordinating Fast Concurrency Adapting With Autoscaling for SLO-Oriented Web Applications | 60
GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads | 59
PHIDE: A Parallel Hybrid Direct–Iterative Eigensolver for Hermitian Eigenvalue Problems | 59
A Pessimistic Fault Diagnosability of Large-Scale Connected Networks via Extra Connectivity | 59
Building Accurate and Interpretable Online Classifiers on Edge Devices | 59
Multi-Swarm Co-Evolution Based Hybrid Intelligent Optimization for Bi-Objective Multi-Workflow Scheduling in the Cloud | 58
Improving the Scalability of GPU Synchronization Primitives | 58
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning | 57
Securing Fine-Grained Data Sharing and Erasure in Outsourced Storage Systems | 56
Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network | 55
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization | 55
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning | 52
Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment | 52
LB-Chain: Load-Balanced and Low-Latency Blockchain Sharding via Account Migration | 52
Asynchronous Algorithms for Decentralized Resource Allocation Over Directed Networks | 51
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation | 50
A Lightweight and Fine-Grained Ciphertext Search Scheme for Big Data Assisted by Proxy Servers | 50
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets | 49
AIDTN: Towards a Real-Time AI Optimized DTN System With NVMeoF | 49
HashCache: Accelerating Serverless Computing by Skipping Duplicated Function Execution | 48
SSRAID: A Stripe-Queued and Stripe-Threaded Merging I/O Strategy to Improve Write Performance of Serial Interface SSD RAID | 48
DynPipe: Toward Dynamic End-to-End Pipeline Parallelism for Interference-Aware DNN Training | 47
iBalancer: Load-Aware in-Server Flow Scheduling for Sub-Millisecond Tail Latency | 47
Fine-Grained Performance and Cost Modeling and Optimization for FaaS Applications | 47
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From the University of Texas at Austin | 46
2024 Reviewers List* | 45
Decentralised Data Quality Control in Ground Truth Production for Autonomic Decisions | 45
Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation | 44
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication | 43
SEMSO: A Secure and Efficient Multi-Data Source Blockchain Oracle | 43
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra | 42
Efficient Distributed Approaches to Core Maintenance on Large Dynamic Graphs | 42
Distributed Discrete Morse Sandwich: Efficient Computation of Persistence Diagrams for Massive Scalar Data | 41
Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning | 41
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training | 41
Scalable, Confidential and Survivable Software Updates | 41
Congestion Control for Datacenter Networks: A Control-Theoretic Approach | 41
Trusted Model Aggregation With Zero-Knowledge Proofs in Federated Learning | 40
Libfork: Portable Continuation-Stealing With Stackless Coroutines | 40
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems | 39
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From Tsinghua University | 39
FedVeca: Federated Vectorized Averaging on Non-IID Data With Adaptive Bi-Directional Global Objective | 39
Hierarchical Federated Learning With Momentum Acceleration in Multi-Tier Networks | 39
From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization | 38
Static Algorithm Allocation with Duplication in Robotic Network Cloud Systems | 38
Optimal Convex Hull Formation on a Grid by Asynchronous Robots With Lights | 38
A Survey of Storage Systems in the RDMA Era | 37
HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge | 36
Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters | 36
A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms | 36
VCSR: An Efficient GPU Memory-Aware Sparse Format | 36
Bandwidth-Aware Scheduling Repair Techniques in Erasure-Coded Clusters: Design and Analysis | 36
Chameleon: An Efficient FHE Scheme Switching Acceleration on GPUs | 35
An Efficient Algorithm for Hamiltonian Path Embedding of $k$-Ary $n$-Cubes under the Partitioned Edge Fault Model | 35
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap | 34
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor | 34
EESaver: Saving Energy Dynamically for Green Multi-Access Edge Computing | 34
DePo: Dynamically Offload Expensive Event Processing to the Edge of Cyber-Physical Systems | 34
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems | 33
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging | 33
HSA-Net: Hidden-State-Aware Networks for High-Precision QoS Prediction | 33
Joint Coverage-Reliability for Budgeted Edge Application Deployment in Mobile Edge Computing Environment | 33
CIA: A Collaborative Integrity Auditing Scheme for Cloud Data With Multi-Replica on Multi-Cloud Storage Providers | 32
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism | 32
Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation | 32
Cost-Efficient Server Configuration and Placement for Mobile Edge Computing | 32
Optimizing Management of Persistent Data Structures in High-Performance Analytics | 31
Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems | 31
Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs | 31
Deep Reinforcement Learning for Load-Balancing Aware Network Control in IoT Edge Systems | 31
Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation | 31
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Clemson University | 30
Timed Loops for Distributed Storage in Wireless Networks | 30
Distributed Evolution Strategies With Multi-Level Learning for Large-Scale Black-Box Optimization | 30
Doing More With Less: Balancing Probing Costs and Task Offloading Efficiency At the Network Edge | 30
Necessary Feasibility Analysis for Mixed-Criticality Real-Time Embedded Systems | 29
Predicting Throughput of Distributed Stochastic Gradient Descent | 29
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection | 29
Parallel and Distributed Bayesian Network Structure Learning | 29
NetSHa: In-Network Acceleration of LSH-Based Distributed Search | 29
Accelerated Information Dissemination for Replica Selection in Distributed Key-Value Store Systems | 29
Revisiting PM-Based B-Tree With Persistent CPU Cache | 29
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU | 29
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs | 29
A Practical Framework for Secure Document Retrieval in Encrypted Cloud File Systems | 28
Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints | 28
Toward Load-Balanced Redundancy Transitioning for Erasure-Coded Storage | 28
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization | 28
Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions | 28
Understanding the Impact of Data Staging for Coupled Scientific Workflows | 27
Graphite: Hardware-Aware GNN Reshaping for Acceleration With GPU Tensor Cores | 27
Improving Fairness for SSD Devices through DRAM Over-Provisioning Cache Management | 27
Deadline and Reliability Aware Multiserver Configuration Optimization for Maximizing Profit | 27
FEditor: Consecutive Task Placement With Adjustable Shapes Using FPGA State Frames | 27
Cost-Efficient Workflow Scheduling Algorithm for Applications With Deadline Constraint on Heterogeneous Clouds | 27
Microservice Deployment in Edge Computing Based on Deep Q Learning | 26
LOCUS: User-Perceived Delay-Aware Service Placement and User Allocation in MEC Environment | 26
Efficient Function Queryable and Privacy Preserving Data Aggregation Scheme in Smart Grid | 26
Dynamic GPU Energy Optimization for Machine Learning Training Workloads | 26
FedICT: Federated Multi-Task Distillation for Multi-Access Edge Computing | 26
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training | 26
Propagation Pattern for Moment Representation of the Lattice Boltzmann Method | 25
Learning to Schedule Multi-Server Jobs With Fluctuated Processing Speeds | 25
EDTC: Exact Triangle Counting for Dynamic Graphs on GPU | 25
RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems | 25
Ocelot: An Interactive, Efficient Distributed Compression-As-a-Service Platform With Optimized Data Compression Techniques | 25
Safe Multi-Agent Deep Reinforcement Learning for the Management of Autonomous Connected Vehicles at Future Intersections | 25
CERT-DF: A Computing-Efficient and Robust Distributed Deep Forest Framework With Low Communication Overhead | 25
UFC2: User-Friendly Collaborative Cloud | 25
Increasing the Efficiency of Massively Parallel Sparse Matrix-Matrix Multiplication in First-Principles Calculation on the New-Generation Sunway Supercomputer | 25
Critique of “Data Flow Lifecycles for Optimizing Workflow Coordination” by SCC Team From National Tsing Hua University | 25
Distributed Approaches to Butterfly Analysis on Large Dynamic Bipartite Graphs | 25
Multi-Agent Collaboration for Workflow Task Offloading in End-Edge-Cloud Environments Using Deep Reinforcement Learning | 25
The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability | 24
Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations | 24
COFFEE: Cross-Layer Optimization for Fast and Efficient Executions of Sinkhorn-Knopp Algorithm on HPC Systems | 24
On Model Transmission Strategies in Federated Learning With Lossy Communications | 24
Content Collaborative Caching Strategy in the Edge Maintenance of Communication Network: A Joint Download Delay and Energy Consumption Method | 24
On the Analysis of Cache Invalidation With LRU Replacement | 24
Dap-FL: Federated Learning Flourishes by Adaptive Tuning and Secure Aggregation | 24
Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane | 24
VQL: Efficient and Verifiable Cloud Query Services for Blockchain Systems | 24
Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion | 23
Online Pricing and Trading of Private Data in Correlated Queries | 23
MRCN: Throughput-Oriented Multicast Routing for Customized Network-on-Chips | 23
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis | 23
Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks | 23
Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme | 23
On Mixing Eventual and Strong Consistency: Acute Cloud Types | 23
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting | 23
High Performance OpenCL-Based GEMM Kernel Auto-Tuned by Bayesian Optimization | 22
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs | 22
RPCE: Dynamic Data Replicas Placement Management by Cloud and Edge Collaboration | 22
Scaling Poisson Solvers on Many Cores via MMEwald | 22
Gamora: Learning-Based Buffer-Aware Preloading for Adaptive Short Video Streaming | 22
Floating Point Calculation of the Cube Function on FPGAs | 22
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From ShanghaiTech University | 22
NDP: Network Division Positioning for Irregular Multi-Hop Networks | 22
Guest Editorial: Special Section on SC22 Student Cluster Competition | 22
Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution | 21
RLPTO: A Reinforcement Learning-Based Performance-Time Optimized Task and Resource Scheduling Mechanism for Distributed Machine Learning | 21
DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore | 21
GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes | 21
Auto-GNAS: A Parallel Graph Neural Architecture Search Framework | 21
Cache Partition Management for Improving Fairness and I/O Responsiveness in NVMe SSDs | 20
IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing | 20
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments | 20
Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading | 20
Reliability-Aware Multi-Objective Memetic Algorithm for Workflow Scheduling Problem in Multi-Cloud System | 20
FedTune-SGM: A Stackelberg-Driven Personalized Federated Learning Strategy for Edge Networks | 20
LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge | 20
Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters | 19
MUCVR: Edge Computing-Enabled High-Quality Multi-User Collaboration for Interactive MVR | 19
CNNPC: End-Edge-Cloud Collaborative CNN Inference With Joint Model Partition and Compression | 19
APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators | 19
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation | 19
Adaptive Vertical Federated Learning on Unbalanced Features | 19
Cost-Effective Empirical Performance Modeling | 18
m2LLM: A Multi-Dimensional Optimization Framework for LLM Inference on Mobile Devices | 18
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From ETH Zürich | 18
AdaptChain: Adaptive Scaling Blockchain With Transaction Deduplication | 18
Improved Methods of Task Assignment and Resource Allocation With Preemption in Edge Computing Systems | 18
An Unequal Caching Strategy for Shared-Memory Graph Analytics | 18
Frequency-Domain Inference Acceleration for Convolutional Neural Networks Using ReRAMs | 18
SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems | 18
Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump | 18
Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining | 18
Multi-Tier GPU Virtualization for Deep Learning in Cloud-Edge Systems | 18
FLUPS - A Flexible and Performant Massively Parallel Fourier Transform Library | 17
Near-Lossless MPI Tracing and Proxy Application Autogeneration | 17
Privacy Preserving Task Push in Spatial Crowdsourcing With Unknown Popularity | 17
CPLNS: Cooperative Parallel Large Neighborhood Search for Large-Scale Multi-Agent Path Finding | 17
Beyond Belady to Attain a Seemingly Unattainable Byte Miss Ratio for Content Delivery Networks | 17
PhaST: Hierarchical Concurrent Log-Free Skip List for Persistent Memory | 17
MemTunnel: A CXL-Based Rack-Scale Host Memory Pooling Architecture for Cloud Service | 17
Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value | 17
Harnessing the Potential of Function-Reuse in Multimedia Cloud Systems | 17
Redundancy-Free and Load-Balanced TGNN Training With Hierarchical Pipeline Parallelism | 17
Shuffle Differential Private Data Aggregation for Random Population | 17
The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge | 17
Mobility-Aware Offloading and Resource Allocation for Distributed Services Collaboration | 17
Online Elastic Resource Provisioning With QoS Guarantee in Container-Based Cloud Computing | 17
Synergistically Rebalancing the EDP of Container-Based Parallel Applications | 17
Loci: Federated Continual Learning of Heterogeneous Tasks at Edge | 16
TODG: Distributed Task Offloading With Delay Guarantees for Edge Computing | 16
Accelerating Half-Precision Seismic Simulation on Neural Processing Unit | 16
A Distributed Network-Based Runtime Verification of Full Regular Temporal Properties | 16
Accelerating Communication-Efficient Federated Multi-Task Learning With Personalization and Fairness | 16
FedMDS: An Efficient Model Discrepancy-Aware Semi-Asynchronous Clustered Federated Learning Framework | 16
Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers | 16
Scheduling Fork-Joins With Communication Delays and Equal Processing Times on Heterogeneous Processors | 16
Rethinking Virtual Machines Live Migration for Memory Disaggregation | 16
Guest Editorial | 16
A Non-Intrusive Multi-objective Task Scheduling Method for JointCloud Environment | 16
GAP-DCCS: A Generic Acceleration Paradigm for Data-Intensive Applications With Efficient Data Compression and Caching Strategy Over CPU-GPU Clusters | 16
Accelerating Restarted GMRES With Mixed Precision Arithmetic | 16
SLO-Aware Function Placement for Serverless Workflows With Layer-Wise Memory Sharing | 15
MoltDB: Accelerating Blockchain via Ancient State Segregation | 15
FEUAGame: Fairness-Aware Edge User Allocation for App Vendors | 15
Towards Revenue-Driven Multi-User Online Task Offloading in Edge Computing | 15
ProScale: Proactive Autoscaling for Microservice With Time-Varying Workload at the Edge | 15
An Efficient Speculative Federated Tree Learning System With a Lightweight NN-Based Predictor | 15
Hypergraph-Based Numerical Neural-Like P Systems for Medical Image Segmentation | 15
A Native Tensor–Vector Multiplication Algorithm for High Performance Computing | 15
Fast Post-Hoc Normalization for Brain Inspired Sparse Coding on a Neuromorphic Device | 14
ABSE: Adaptive Baseline Score-Based Election for Leader-Based BFT Systems | 14
Task Placement and Resource Allocation for Edge Machine Learning: A GNN-Based Multi-Agent Reinforcement Learning Paradigm | 14
Real-Time Scheduling of Parallel Task Graphs With Critical Sections Across Different Vertices | 14
Coordinated Batching and DVFS for DNN Inference on GPU Accelerators | 14
Landlord: Coordinating Dynamic Software Environments to Reduce Container Sprawl | 14
μBench: An Open-Source Factory of Benchmark Microservice Applications | 14