IEEE Transactions on Parallel and Distributed Systems

Papers
(The TQCC of IEEE Transactions on Parallel and Distributed Systems is 13. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Internet Traffic Privacy Enhancement with Masking: Optimization and Trade-Offs | 291
2020 Reviewers List | 241
Guest Editorial: Special Section on SC19 Student Cluster Competition | 212
Spread+: Scalable Model Aggregation in Federated Learning With Non-IID Data | 195
SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services | 190
HiFlash: Communication-Efficient Hierarchical Federated Learning With Adaptive Staleness Control and Heterogeneity-Aware Client-Edge Association | 155
Dynamic Controller/Switch Mapping: A Service Oriented Assignment Approach | 148
CloudSentry: Two-Stage Heavy Hitter Detection for Cloud-Scale Gateway Overload Protection | 142
Communication Optimization Algorithms for Distributed Deep Learning Systems: A Survey | 127
Chasing Common Knowledge: Joint Large Model Selection and Pulling in MEC With Parameter Sharing | 123
Floating Point Calculation of the Cube Function on FPGAs | 123
ViTeGNN: Towards Versatile Inference of Temporal Graph Neural Networks on FPGA | 111
GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes | 101
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds | 100
A Parallel Jacobi-Embedded Gauss-Seidel Method | 96
eHotSnap: An Efficient and Hot Distributed Snapshots System for Virtual Machine Cluster | 92
FedTune-SGM: A Stackelberg-Driven Personalized Federated Learning Strategy for Edge Networks | 88
Federated Learning With Nesterov Accelerated Gradient | 80
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32 | 80
SketchINT: Empowering INT With TowerSketch for Per-Flow Per-Switch Measurement | 79
Accurate Differentially Private Deep Learning on the Edge | 78
A Memory-Constraint-Aware List Scheduling Algorithm for Memory-Constraint Heterogeneous Muti-Processor System | 77
Faber: A Hardware/SoftWare Toolchain for Image Registration | 77
Availability-Aware Revenue-Effective Application Deployment in Multi-Access Edge Computing | 77
VeriML: Enabling Integrity Assurances and Fair Payments for Machine Learning as a Service | 76
FAST: Enhancing Federated Learning Through Adaptive Data Sampling and Local Training | 75
OptZConfig: Efficient Parallel Optimization of Lossy Compression Configuration | 73
Adaptive Resource Efficient Microservice Deployment in Cloud-Edge Continuum | 73
Batch Crowdsourcing for Complex Tasks Based on Distributed Team Formation in E-Markets | 70
Jdebug: A Fast, Non-Intrusive and Scalable Fault Locating Tool for Ten-Million-Scale Parallel Applications | 69
Communicational and Computational Efficient Federated Domain Adaptation | 67
Design and Implementation of 2D Convolution on x86/x64 Processors | 65
QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing | 65
Online Orchestration of Collaborative Caching for Multi-Bitrate Videos in Edge Computing | 64
TriangleKV: Reducing Write Stalls and Write Amplification in LSM-Tree Based KV Stores With Triangle Container in NVM | 61
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From ShanghaiTech University | 60
SconeKV: A Scalable, Strongly Consistent Key-Value Store | 60
Enabling Large Scale Simulations for Particle Accelerators | 60
Theoretical Analysis of an Adaptive Periodic Multi Installment Scheduling With Result Retrieval for SAR Image Processing | 60
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments | 58
PayDebt: Reduce Buffer Occupancy Under Bursty Traffic on Large Clusters | 58
A Novel Compute-Efficient Tridiagonal Solver for Many-Core Architectures | 58
Enabling In-Network Floating-Point Arithmetic for Efficient Computation Offloading | 56
Replicated Versioned Data Structures for Wide-Area Distributed Systems | 56
Sentinels and Twins: Effective Integrity Assessment for Distributed Computation | 55
Blockchain-Based P2P Content Delivery With Monetary Incentivization and Fairness Guarantee | 55
mpi4py.futures: MPI-Based Asynchronous Task Execution for Python | 54
DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore | 53
A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization | 53
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks | 52
The Tiny-Tasks Granularity Trade-Off: Balancing Overhead Versus Performance in Parallel Systems | 52
Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation | 51
Improving the Efficiency of Deadlock Detection in MPI Programs Through Trace Compression | 50
Towards Correlated Data Trading for High-Dimensional Private Data | 50
Editorial | 48
SEIZE: Runtime Inspection for Parallel Dataflow Systems | 47
On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement | 46
SmartTuning: Selecting Hyper-Parameters of a ConvNet System for Fast Training and Small Working Memory | 46
Synapse Compression for Event-Based Convolutional-Neural-Network Accelerators | 46
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Peking University | 44
Improving Effectiveness of Simulation-Based Inference in the Massively Parallel Regime | 44
LAS: Locality-Aware Scheduling for GEMM-Accelerated Convolutions in GPUs | 43
A General Approach to Generate Test Packets With Network Configurations | 43
Enabling Balanced Data Deduplication in Mobile Edge Computing | 43
Scheduling Algorithms for Federated Learning With Minimal Energy Consumption | 43
TurboMGNN: Improving Concurrent GNN Training Tasks on GPU With Fine-Grained Kernel Fusion | 43
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training | 42
Exploring Fine-Grained In-Memory Database Performance for Modern CPUs | 41
Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue Balancer | 41
AdaptChain: Adaptive Scaling Blockchain With Transaction Deduplication | 41
A Multi-GPU Aggregation-Based AMG Preconditioner for Iterative Linear Solvers | 40
Falcon: Fair and Efficient Online File Transfer Optimization | 40
Energy-Aware, Device-to-Device Assisted Federated Learning in Edge Computing | 40
IO-Sets: Simple and Efficient Approaches for I/O Bandwidth Management | 39
Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump | 39
RLPTO: A Reinforcement Learning-Based Performance-Time Optimized Task and Resource Scheduling Mechanism for Distributed Machine Learning | 39
APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators | 39
On-Line Network Traffic Anomaly Detection Based on Tensor Sketch | 39
US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning | 38
Online Learning Algorithms for Context-Aware Video Caching in D2D Edge Networks | 36
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information | 36
EiC Editorial – Advancing Reproducibility in Parallel and Distributed Systems Research | 35
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation | 35
Design and Implementation of Deep Learning 2D Convolutions on Modern CPUs | 35
GraphOpt: Constrained-Optimization-Based Parallelization of Irregular Graphs | 34
NITI: Training Integer Neural Networks Using Integer-Only Arithmetic | 33
AsyncFedGAN: An Efficient and Staleness-Aware Asynchronous Federated Learning Framework for Generative Adversarial Networks | 32
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods | 31
Efficient Virtual Network Embedding of Cloud-Based Data Center Networks into Optical Networks | 31
Retargeting Tensor Accelerators for Epistasis Detection | 31
Joint SFC Deployment and Resource Management in Heterogeneous Edge for Latency Minimization | 31
Endurance-Aware Mapping of Spiking Neural Networks to Neuromorphic Hardware | 31
Enabling Scalable and Extensible Memory-Mapped Datastores in Userspace | 30
A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures | 30
libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations | 30
Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution | 30
VPIC 2.0: Next Generation Particle-in-Cell Simulations | 30
Efficient and Accurate Flow Record Collection With HashFlow | 30
DIESEL+: Accelerating Distributed Deep Learning Tasks on Image Datasets | 29
DS-ADMM++: A Novel Distributed Quantized ADMM to Speed up Differentially Private Matrix Factorization | 29
Online Reconfiguration of IoT Applications in the Fog: The Information-Coordination Trade-Off | 29
LoomIO: Object-Level Coordination in Distributed File Systems | 28
Pistis: Issuing Trusted and Authorized Certificates With Distributed Ledger and TEE | 28
TridentKV: A Read-Optimized LSM-Tree Based KV Store via Adaptive Indexing and Space-Efficient Partitioning | 28
Reproducibility: Performance Evaluation of MemXCT on Azure CycleCloud Platform | 28
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Tsinghua University | 28
Scaling Poisson Solvers on Many Cores via MMEwald | 28
SPRINT: A High-Performance, Energy-Efficient, and Scalable Chiplet-Based Accelerator With Photonic Interconnects for CNN Inference | 27
SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems | 27
CSEdge: Enabling Collaborative Edge Storage for Multi-Access Edge Computing Based on Blockchain | 27
LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge | 27
Auto-GNAS: A Parallel Graph Neural Architecture Search Framework | 26
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores | 26
Graft: Efficient Inference Serving for Hybrid Deep Learning With SLO Guarantees via DNN Re-Alignment | 26
DMA-Assisted I/O for Persistent Memory | 26
Building High-throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine | 26
EdgeTB: A Hybrid Testbed for Distributed Machine Learning at the Edge With High Fidelity | 26
TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations | 26
KLNK: Expanding Page Boundaries in a Distributed Shared Memory System | 25
A 3D Hybrid Optical-Electrical NoC Using Novel Mapping Strategy Based DCNN Dataflow Acceleration | 25
An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays | 25
Swift: Expedited Failure Recovery for Large-Scale DNN Training | 25
AdaptChain: Adaptive Data Sharing and Synchronization for NFV Systems on Heterogeneous Architectures | 25
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations | 25
STT-RAM-Based Hierarchical in-Memory Computing | 25
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs | 25
Sampling-Based Multi-Job Placement for Heterogeneous Deep Learning Clusters | 25
RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface | 24
GeoDeploy: Geo-Distributed Application Deployment Using Benchmarking | 24
IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing | 24
Cost-Effective and Robust Service Provisioning in Multi-Access Edge Computing | 24
Gamora: Learning-Based Buffer-Aware Preloading for Adaptive Short Video Streaming | 24
Algorithms for Data Sharing-Aware Task Allocation in Edge Computing Systems | 23
H5Intent: Autotuning HDF5 With User Intent | 23
Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications | 23
VisionAGILE: A Versatile Domain-Specific Accelerator for Computer Vision Tasks | 23
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach | 22
FedProf: Selective Federated Learning based on Distributional Representation Profiling | 22
Federated Ensemble Model-Based Reinforcement Learning in Edge Computing | 22
TrieKV: A High-Performance Key-Value Store Design With Memory as Its First-Class Citizen | 22
Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure | 22
Securing Distributed SGD Against Gradient Leakage Threats | 22
Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis | 22
TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU | 22
Efficient Forwarding Anomaly Detection in Software-Defined Networks | 22
CAMIG: Concurrency-Aware Live Migration Management of Multiple Virtual Machines in SDN-Enabled Clouds | 21
An Efficient Parallel Secure Machine Learning Framework on GPUs | 21
Multi-Agent Deep Reinforcement Learning Framework for Renewable Energy-Aware Workflow Scheduling on Distributed Cloud Data Centers | 21
Min-Max Cost Optimization for Efficient Hierarchical Federated Learning in Wireless Edge Networks | 21
Votes-as-a-Proof (VaaP): Permissioned Blockchain Consensus Protocol Made Simple | 21
Mechanisms for Resource Allocation and Pricing in Mobile Edge Computing Systems | 21
Incentive Mechanism Design for Joint Resource Allocation in Blockchain-Based Federated Learning | 20
Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters | 20
CNNPC: End-Edge-Cloud Collaborative CNN Inference With Joint Model Partition and Compression | 20
Towards Efficient and Stable K-Asynchronous Federated Learning With Unbounded Stale Gradients on Non-IID Data | 20
Multi-Job Intelligent Scheduling With Cross-Device Federated Learning | 20
RHDOFS: A Distributed Online Algorithm Towards Scalable Streaming Feature Selection | 20
DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters | 20
IPPTS: An Efficient Algorithm for Scientific Workflow Scheduling in Heterogeneous Computing Systems | 20
Adaptive Vertical Federated Learning on Unbalanced Features | 19
A Comparative Study of Sampling Methods With Cross-Validation in the FedHome Framework | 19
Topology-Aware Neural Model for Highly Accurate QoS Prediction | 19
Personalized Edge Intelligence via Federated Self-Knowledge Distillation | 19
Offloading Tasks With Dependency and Service Caching in Mobile Edge Computing | 19
Fine-Grained Multi-Query Stream Processing on Integrated Architectures | 19
Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization | 19
Auction-Based Cluster Federated Learning in Mobile Edge Computing Systems | 19
Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading | 19
Real Relative Encoding Genetic Algorithm for Workflow Scheduling in Heterogeneous Distributed Computing Systems | 19
Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms | 18
CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory | 18
An Unequal Caching Strategy for Shared-Memory Graph Analytics | 18
Astrea:Auto-Serverless Analytics Towards Cost-Efficiency and QoS-Awareness | 18
Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management | 18
A Game-Based Approach for Cost-Aware Task Assignment With QoS Constraint in Collaborative Edge and Cloud Environments | 18
Alleviating the Impact of Abnormal Events Through Multi-Constrained VM Placement | 18
Reliability-Aware Multi-Objective Memetic Algorithm for Workflow Scheduling Problem in Multi-Cloud System | 18
A Decentralized Federated Learning Framework via Committee Mechanism With Convergence Guarantee | 18
Data-Centric Client Selection for Federated Learning Over Distributed Edge Networks | 18
Joint Task Scheduling and Containerizing for Efficient Edge Computing | 18
A Quantum Approach Towards the Adaptive Prediction of Cloud Workloads | 18
Leveraging Deep Reinforcement Learning With Attention Mechanism for Virtual Network Function Placement and Routing | 17
IEEE Special Issue on Innovative R&D Toward the Exascale Era | 17
Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter Scheduling | 17
Coflow Scheduling in Data Centers: Routing and Bandwidth Allocation | 17
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning | 17
IEEE Quantum Week Register 2023 | 17
Elastic and Reliable Bandwidth Reservation Based on Distributed Traffic Monitoring and Control | 17
FHE4DMM: A Low-Latency Distributed Matrix Multiplication With Fully Homomorphic Encryption | 17
Energy-Efficient Cache-Aware Scheduling on Heterogeneous Multicore Systems | 17
FRuDA: Framework for Distributed Adversarial Domain Adaptation | 17
Accelerating Distributed GNN Training by Codes | 17
COOPER-SCHED: A Cooperative Scheduling Framework for Mobile Edge Computing with Expected Deadline Guarantee | 17
PS+: A Simple yet Effective Framework for Fast Training on Parameter Server | 16
Architectural Adaptation and Performance-Energy Optimization for CFD Application on AMD EPYC Rome | 16
PhaST: Hierarchical Concurrent Log-Free Skip List for Persistent Memory | 16
AccTFM: An Effective Intra-Layer Model Parallelization Strategy for Training Large-Scale Transformer-Based Models | 16
PushBox: Making Use of Every Bit of Time to Accelerate Completion of Data-Parallel Jobs | 16
A Bifactor Approximation Algorithm for Cloudlet Placement in Edge Computing | 16
Level-Based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication | 16
Accelerating Tensor Swapping in GPUs With Self-Tuning Compression | 16
Redesigning and Optimizing UCSF DOCK3.7 on Sunway TaihuLight | 16
Max-Tree Computation on GPUs | 16
Busy-Time Scheduling on Heterogeneous Machines: Algorithms and Analysis | 16
BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription | 16
swMPAS-A: Scaling MPAS-A to 39 Million Heterogeneous Cores on the New Generation Sunway Supercomputer | 15
Parallel Dynamic Sparse Approximate Inverse Preconditioning Algorithm on GPU | 15
Loop-the-Loops: Fragmented Learning Over Networks for Constrained IoT Devices | 15
A relative coordinate based distributed sparse-preserving matrix factorization approach towards self-stabilizing network location service - Withdrawn | 15
Building Trust in Earth Science Findings through Data Traceability and Results Explainability | 15
HierFedML: Aggregator Placement and UE Assignment for Hierarchical Federated Learning in Mobile Edge Computing | 15
Tag-Sharer-Fusion Directory: A Scalable Coherence Directory With Flexible Entry Formats | 15
AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems | 15
Securing Fine-Grained Data Sharing and Erasure in Outsourced Storage Systems | 15
Privacy Preserving n-Party Scalar Product Protocol | 15
Bio-ESMD: A Data Centric Implementation for Large-Scale Biological System Simulation on Sunway TaihuLight Supercomputer | 15
Resettable Encoded Vector Clock for Causality Analysis With an Application to Dynamic Race Detection | 15
Outperforming Sequential Full-Word Long Addition With Parallelization and Vectorization | 15
Near-Lossless MPI Tracing and Proxy Application Autogeneration | 15
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From Peking University | 15
Neighbor Graph Based Tensor Recovery For Accurate Internet Anomaly Detection | 15
Topology-Aware Scheduling Framework for Microservice Applications in Cloud | 15
FedMDS: An Efficient Model Discrepancy-Aware Semi-Asynchronous Clustered Federated Learning Framework | 15
Offloading Algorithms for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System | 14
FLUPS - A Flexible and Performant Massively Parallel Fourier Transform Library | 14
Partitioning-Based Scheduling of OpenMP Task Systems With Tied Tasks | 14
Privacy-Preserving Similarity Search With Efficient Updates in Distributed Key-Value Stores | 14
Hone: Mitigating Stragglers in Distributed Stream Processing With Tuple Scheduling | 14
Expediting Distributed DNN Training With Device Topology-Aware Graph Deployment | 14
A Case for Pricing Bandwidth: Sharing Datacenter Networks With Cost Dominant Fairness | 14
Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility | 14
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From University of Washington | 14
Shuffle Differential Private Data Aggregation for Random Population | 14
Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms | 14
Preventive Priority Setting Against Multiple Controller Failures in Software Defined Networks | 13
Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network | 13
Frequency-Domain Inference Acceleration for Convolutional Neural Networks Using ReRAMs | 13
TDTA: Topology-Based Real-Time DAG Task Allocation on Identical Multiprocessor Platforms | 13
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration | 13
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms | 13
Joint Caching and Routing in Cache Networks With Arbitrary Topology | 13
The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge | 13
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering | 13
OfpCNN: On-Demand Fine-Grained Partitioning for CNN Inference Acceleration in Heterogeneous Devices | 13
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency | 13
Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU Platforms | 13
Back to Homogeneous Computing: A Tightly-Coupled Neuromorphic Processor With Neuromorphic ISA | 13
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From National Tsing Hua University | 13
Consistent Low Latency Scheduler for Distributed Key-Value Stores | 13
Understanding the Impact of Arbitration in MZI-Based Beneš Switching Fabrics | 13