Parallel Computing

Papers
(The median citation count of Parallel Computing is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-07-01 to 2025-07-01.)
Article | Citations
Parallel multi-view HEVC for heterogeneously embedded cluster system | 88
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 41
Porting hypre to heterogeneous computer architectures: Strategies and experiences | 35
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format | 32
Editorial Board | 29
A scalable algorithm for the optimization of neural network architectures | 29
A parallel non-convex approximation framework for risk parity portfolio design | 26
Enabling GPU accelerated computing in the SUNDIALS time integration library | 25
Integrating FPGA-based hardware acceleration with relational databases | 23
Evaluating adaptive and predictive power management strategies for optimizing visualization performance on supercomputers | 23
Measurement and analysis of GPU-accelerated applications with HPCToolkit | 22
Computational records with aging hardware: Controlling half the output of SHA-256 | 16
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers | 16
Using long vector extensions for MPI reductions | 15
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 15
Octopus-DF: Unified DataFrame-based cross-platform data analytic system | 14
Toward performance-portable PETSc for GPU-based exascale systems | 14
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach | 13
Editorial Board | 11
Implementation and evaluation of MPI 4.0 partitioned communication libraries | 10
Editorial on Advances in High Performance Programming | 10
Evaluating MPI resource usage summary statistics | 9
Editorial Board | 9
A computational-graph partitioning method for training memory-constrained DNNs | 8
C-Lop: Accurate contention-based modeling of MPI concurrent communication | 8
Adaptively parallel runtime verification based on distributed network for temporal properties | 8
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning | 7
New YARN sharing GPU based on graphics memory granularity scheduling | 7
EESF: Energy-efficient scheduling framework for deadline-constrained workflows with computation speed estimation method in cloud | 7
HySet: A hybrid framework for exact set similarity join using a GPU | 7
Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix | 7
Task graph-based performance analysis of parallel-in-time methods | 7
Optimization of DNS code and visualization of entrainment and mixing phenomena at cloud edges | 6
Editorial Board | 6
ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition | 6
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture | 6
Editorial Board | 6
Editorial Board | 5
The BondMachine, a moldable computer architecture | 5
Special issue of Selected Papers from EuroMPI/USA 2020 | 5
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers | 5
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters | 5
Efficient parallel reduction of bandwidth for symmetric matrices | 5
Optimizing convolutional neural networks on multi-core vector accelerator | 5
Editorial Board | 5
Distributed software defined network-based fog to fog collaboration scheme | 4
Accelerating the scheduling of the network resources of the next-generation optical data centers | 4
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing | 4
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS | 4
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms | 4
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters | 4
A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems | 4
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA | 4
Analyzing the impact of CUDA versions on GPU applications | 4
Byzantine-tolerant detection of causality: There is no holy grail | 4
Editorial Board | 4
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks | 4
Accelerating domain propagation: An efficient GPU-parallel algorithm over sparse matrices | 4
Spatial- and time- division multiplexing in CNN accelerator | 4
GPU acceleration of Levenshtein distance computation between long strings | 4
Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator | 4
Editorial Board | 4
Parallel Pattern Compiler for Automatic Global Optimizations | 3
Improving the I/O of large geophysical models using PnetCDF and BeeGFS | 3
Achieving performance portability in Gaussian basis set density functional theory on accelerator based architectures in NWChemEx | 3
Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval | 3
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance | 3
Optimal task scheduling for partially heterogeneous systems | 3
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU | 3
FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning | 3
MPI collective communication through a single set of interfaces: A case for orthogonality | 3
Editorial Board | 3
NekRS, a GPU-accelerated spectral element Navier–Stokes solver | 3
Editorial Board | 3
Lowering entry barriers to developing custom simulators of distributed applications and platforms with SimGrid | 2
An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network | 2
Towards scaling community detection on distributed-memory heterogeneous systems | 2
High performance sparse multifrontal solvers on modern GPUs | 2
Editorial for parallel computing | 2
Optimizing massively parallel sparse matrix computing on ARM many-core processor | 2
GPU algorithms for Efficient Exascale Discretizations | 2
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems | 2
Benchmarking the performance of irregular computations in AutoDock-GPU molecular docking | 2
SGPM: A coroutine framework for transaction processing | 2
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation | 2
A novel hybrid heuristic-based list scheduling algorithm in heterogeneous cloud computing environment for makespan optimization | 2
Design-time performance modeling of compositional parallel programs | 2
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments | 2
Accelerating communication for parallel programming models on GPU systems | 2
Editorial Board | 2
Metall: A persistent memory allocator for data-centric analytics | 2
Reconfiguration algorithms for synchronous communication on switch based degradable arrays | 2
An international survey on MPI users | 2