Parallel Computing

Papers
(The median citation count of Parallel Computing is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article	Citations
Parallel multi-view HEVC for heterogeneously embedded cluster system	150
Editorial Board	45
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format	36
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers	29
Integrating FPGA-based hardware acceleration with relational databases	28
A parallel non-convex approximation framework for risk parity portfolio design	24
Editorial Board	18
LSHDP: Locally sharded heterogeneous data parallel for distributed deep learning	16
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach	14
Octopus-DF: Unified DataFrame-based cross-platform data analytic system	12
Editorial on Advances in High Performance Programming	11
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning	11
Task graph-based performance analysis of parallel-in-time methods	11
Adaptively parallel runtime verification based on distributed network for temporal properties	11
Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix	11
C-Lop: Accurate contention-based modeling of MPI concurrent communication	10
EESF: Energy-efficient scheduling framework for deadline-constrained workflows with computation speed estimation method in cloud	10
New YARN sharing GPU based on graphics memory granularity scheduling	8
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture	8
Optimizing convolutional neural networks on multi-core vector accelerator	7
ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition	7
Special issue of Selected Papers from EuroMPI/USA 2020	7
Editorial Board	7
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers	7
Using Java to create and analyze models of parallel computing systems	6
Editorial Board	6
Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator	6
Efficient parallel reduction of bandwidth for symmetric matrices	6
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS	6
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters	6
Byzantine-tolerant detection of causality: There is no holy grail	5
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing	5
Spatial- and time- division multiplexing in CNN accelerator	5
A sleek lock-free hash map in an ERA of safe memory reclamation methods	5
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA	5
GPU acceleration of Levenshtein distance computation between long strings	5
Accelerating the scheduling of the network resources of the next-generation optical data centers	4
Distributed software defined network-based fog to fog collaboration scheme	4
Editorial Board	4
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters	4
Analyzing the impact of CUDA versions on GPU applications	4
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks	4
Editorial Board	4
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms	4
A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems	4
Editorial Board	3
FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning	3
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation	3
Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval	3
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance	3
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU	3
Reconfiguration algorithms for synchronous communication on switch based degradable arrays	3
LSAF: A load-balancing SpGEMM acceleration framework with dynamic package and static partition for multi-core systolic arrays	3
Editorial Board	3
Butterfly factorization for vision transformers on multi-IPU systems	3
Editorial Board	3
NekRS, a GPU-accelerated spectral element Navier–Stokes solver	3
Editorial for parallel computing	3
Enable cross-iteration parallelism for PIM-based graph processing with vertex-level synchronization	3
Parallel Pattern Compiler for Automatic Global Optimizations	3
HRPF: A parallel programming framework for recursive algorithms on heterogeneous CPU–GPU systems	2
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments	2
High performance sparse multifrontal solvers on modern GPUs	2
Benchmark of classical disk array and software-defined storage on near-identical hardware	2
An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network	2
SGPM: A coroutine framework for transaction processing	2
Analysis of the impact of NUMA node configuration on the performance of offloading computations to GPUs	2
Towards scaling community detection on distributed-memory heterogeneous systems	2
Cache partitioning for sparse matrix–vector multiplication on the A64FX	2
Accelerating communication for parallel programming models on GPU systems	2
Metall: A persistent memory allocator for data-centric analytics	2
Optimizing massively parallel sparse matrix computing on ARM many-core processor	2
GPU/CUDA-Accelerated gradient growth optimizer for efficient complex numerical global optimization	1
Multi-level parallel multi-layer block reproducible summation algorithm	1
Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer	1
ALBBA: An efficient ALgebraic Bypass BFS Algorithm on long vector architectures	1
An evaluation of fast segmented sorting implementations on GPUs	1
Editorial Board	1
A dependency-aware task offloading in IoT-based edge computing system using an optimized deep learning approach	1
parGeMSLR: A parallel multilevel Schur complement low-rank preconditioning and solution package for general sparse matrices	1
OpenACC + Athread collaborative optimization of Silicon-Crystal application on Sunway TaihuLight	1
Task-parallel tiled direct solver for dense symmetric indefinite systems	1
WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel	1
Fast data-dependence profiling through prior static analysis	1
Editorial Board	1
Seesaw: A 4096-bit vector processor for accelerating Kyber based on RISC-V ISA extensions	1
FPGA-based accelerator for YOLOv5 object detection with optimized computation and data access for edge deployment	1
A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem	1
An approach for low-power heterogeneous parallel implementation of ALC-PSO algorithm using OmpSs and CUDA	1
Performance and accuracy predictions of approximation methods for shortest-path algorithms on GPUs	1
Big data BPMN workflow resource optimization in the cloud	1
Ginkgo—A math library designed for platform portability	1
Fast calculation of isostatic compensation correction using the GPU-parallel prism method	1
PROAD: Boosting Caffe Training via improving LevelDB I/O performance with Parallel Read, Out-of-Order Optimization, and Adaptive Design	1
Low-synch Gram–Schmidt with delayed reorthogonalization for Krylov solvers	1
Lowering entry barriers to developing custom simulators of distributed applications and platforms with SimGrid	1
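The selection rule in the note at the top of this page (restrict to a date window, compute the median citation count, keep papers meeting that threshold, list at most 250 in descending order) can be illustrated with a small sketch. This is not the tool that produced this listing; the `Paper` record, the `select_papers` function and its parameters are hypothetical names chosen for illustration. The sketch assumes the window includes its start date and excludes its end date, and it keeps papers at or above the median, because the listing includes entries with exactly 1 citation.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Paper:
    # Hypothetical record type: title, CrossRef-style citation count, publication date.
    title: str
    citations: int
    published: date


def select_papers(papers, start, end, max_papers=250):
    """Hypothetical sketch of a median-threshold citation listing."""
    # Assumption: the window is half-open, start <= published < end.
    window = [p for p in papers if start <= p.published < end]
    if not window:
        return []
    # Median citation count over every paper in the window.
    threshold = median(p.citations for p in window)
    # Keep papers at or above the median (the listing includes median-count entries).
    kept = [p for p in window if p.citations >= threshold]
    # Highest citation count first, then cap the list length.
    kept.sort(key=lambda p: p.citations, reverse=True)
    return kept[:max_papers]


if __name__ == "__main__":
    # Made-up example data, not taken from the table above.
    sample = [
        Paper("Paper A", 10, date(2023, 1, 1)),
        Paper("Paper B", 1, date(2023, 6, 1)),
        Paper("Paper C", 0, date(2024, 1, 1)),
        Paper("Paper D", 5, date(2021, 1, 1)),  # outside the window
    ]
    for p in select_papers(sample, date(2022, 5, 1), date(2026, 5, 1)):
        print(p.title, p.citations)
```

With an even number of papers in the window, `statistics.median` averages the two middle values, so the threshold can be fractional; the `>=` comparison handles that case without extra code.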