Parallel Computing

Papers
(The median citation count of Parallel Computing is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article	Citations
Parallel multi-view HEVC for heterogeneously embedded cluster system	96
Porting hypre to heterogeneous computer architectures: Strategies and experiences	42
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format	39
Editorial Board	34
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT	30
Enabling GPU accelerated computing in the SUNDIALS time integration library	29
Integrating FPGA-based hardware acceleration with relational databases	28
Computational records with aging hardware: Controlling half the output of SHA-256	25
Measurement and analysis of GPU-accelerated applications with HPCToolkit	24
Toward performance-portable PETSc for GPU-based exascale systems	23
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers	22
A parallel non-convex approximation framework for risk parity portfolio design	17
Using long vector extensions for MPI reductions	16
Octopus-DF: Unified DataFrame-based cross-platform data analytic system	16
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach	16
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach	15
Editorial Board	15
Implementation and evaluation of MPI 4.0 partitioned communication libraries	13
Editorial on Advances in High Performance Programming	12
C-Lop: Accurate contention-based modeling of MPI concurrent communication	11
Evaluating MPI resource usage summary statistics	11
Adaptively parallel runtime verification based on distributed network for temporal properties	10
Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix	9
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning	8
EESF: Energy-efficient scheduling framework for deadline-constrained workflows with computation speed estimation method in cloud	8
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture	8
Task graph-based performance analysis of parallel-in-time methods	8
New YARN sharing GPU based on graphics memory granularity scheduling	8
Editorial Board	7
Editorial Board	7
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers	7
ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition	7
Optimization of DNS code and visualization of entrainment and mixing phenomena at cloud edges	7
Optimizing convolutional neural networks on multi-core vector accelerator	6
Efficient parallel reduction of bandwidth for symmetric matrices	6
The BondMachine, a moldable computer architecture	6
Special issue of Selected Papers from EuroMPI/USA 2020	6
Editorial Board	6
Editorial Board	6
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS	5
Using Java to create and analyze models of parallel computing systems	5
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters	5
Spatial- and time- division multiplexing in CNN accelerator	5
GPU acceleration of Levenshtein distance computation between long strings	5
Byzantine-tolerant detection of causality: There is no holy grail	5
Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator	5
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA	4
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks	4
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters	4
Editorial Board	4
Parallel Pattern Compiler for Automatic Global Optimizations	4
Accelerating the scheduling of the network resources of the next-generation optical data centers	4
Distributed software defined network-based fog to fog collaboration scheme	4
A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems	4
MPI collective communication through a single set of interfaces: A case for orthogonality	4
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms	4
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing	4
Editorial Board	4
Analyzing the impact of CUDA versions on GPU applications	4
Accelerating domain propagation: An efficient GPU-parallel algorithm over sparse matrices	4
Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval	4
Editorial Board	3
NekRS, a GPU-accelerated spectral element Navier–Stokes solver	3
Reconfiguration algorithms for synchronous communication on switch based degradable arrays	3
A novel hybrid heuristic-based list scheduling algorithm in heterogeneous cloud computing environment for makespan optimization	3
Optimal task scheduling for partially heterogeneous systems	3
Editorial Board	3
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance	3
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation	3
FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning	3
Enable cross-iteration parallelism for PIM-based graph processing with vertex-level synchronization	3
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU	3
Editorial for parallel computing	3
Achieving performance portability in Gaussian basis set density functional theory on accelerator based architectures in NWChemEx	3
Editorial Board	2
GPU algorithms for Efficient Exascale Discretizations	2
An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network	2
High performance sparse multifrontal solvers on modern GPUs	2
Optimizing massively parallel sparse matrix computing on ARM many-core processor	2
Task-parallel tiled direct solver for dense symmetric indefinite systems	2
Accelerating communication for parallel programming models on GPU systems	2
SGPM: A coroutine framework for transaction processing	2
Benchmarking the performance of irregular computations in AutoDock-GPU molecular docking	2
Metall: A persistent memory allocator for data-centric analytics	2
Editorial Board	2
Lowering entry barriers to developing custom simulators of distributed applications and platforms with SimGrid	2
Design-time performance modeling of compositional parallel programs	2
An international survey on MPI users	2
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments	2
Towards scaling community detection on distributed-memory heterogeneous systems	2
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems	2