Parallel Computing

Papers
(The TQCC of Parallel Computing is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
| Article | Citations |
| --- | --- |
| Parallel multi-view HEVC for heterogeneously embedded cluster system | 96 |
| Porting hypre to heterogeneous computer architectures: Strategies and experiences | 42 |
| Heterogeneous sparse matrix–vector multiplication via compressed sparse row format | 39 |
| Editorial Board | 34 |
| GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 30 |
| Enabling GPU accelerated computing in the SUNDIALS time integration library | 29 |
| Integrating FPGA-based hardware acceleration with relational databases | 28 |
| Computational records with aging hardware: Controlling half the output of SHA-256 | 25 |
| Measurement and analysis of GPU-accelerated applications with HPCToolkit | 24 |
| Toward performance-portable PETSc for GPU-based exascale systems | 23 |
| NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers | 22 |
| A parallel non-convex approximation framework for risk parity portfolio design | 17 |
| On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 16 |
| Using long vector extensions for MPI reductions | 16 |
| Octopus-DF: Unified DataFrame-based cross-platform data analytic system | 16 |
| Editorial Board | 15 |
| Mobilizing underutilized storage nodes via job path: A job-aware file striping approach | 15 |
| Implementation and evaluation of MPI 4.0 partitioned communication libraries | 13 |
| Editorial on Advances in High Performance Programming | 12 |
| Evaluating MPI resource usage summary statistics | 11 |
| C-Lop: Accurate contention-based modeling of MPI concurrent communication | 11 |
| Adaptively parallel runtime verification based on distributed network for temporal properties | 10 |
| Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix | 9 |
| New YARN sharing GPU based on graphics memory granularity scheduling | 8 |
| OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning | 8 |
| EESF: Energy-efficient scheduling framework for deadline-constrained workflows with computation speed estimation method in cloud | 8 |
| Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture | 8 |
| Task graph-based performance analysis of parallel-in-time methods | 8 |
| ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition | 7 |
| Optimization of DNS code and visualization of entrainment and mixing phenomena at cloud edges | 7 |
| Editorial Board | 7 |
| Editorial Board | 7 |
| Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers | 7 |
| Editorial Board | 6 |
| Editorial Board | 6 |
| Optimizing convolutional neural networks on multi-core vector accelerator | 6 |
| Efficient parallel reduction of bandwidth for symmetric matrices | 6 |
| The BondMachine, a moldable computer architecture | 6 |
| Special issue of Selected Papers from EuroMPI/USA 2020 | 6 |
| Byzantine-tolerant detection of causality: There is no holy grail | 5 |
| Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator | 5 |
| Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS | 5 |
| Using Java to create and analyze models of parallel computing systems | 5 |
| PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters | 5 |
| Spatial- and time- division multiplexing in CNN accelerator | 5 |
| GPU acceleration of Levenshtein distance computation between long strings | 5 |