Parallel Computing

Papers
(The TQCC of Parallel Computing is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Parallel multi-view HEVC for heterogeneously embedded cluster system | 78
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format | 37
Porting hypre to heterogeneous computer architectures: Strategies and experiences | 32
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 31
Computational records with aging hardware: Controlling half the output of SHA-256 | 26
A scalable algorithm for the optimization of neural network architectures | 26
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 25
A parallel non-convex approximation framework for risk parity portfolio design | 25
Using long vector extensions for MPI reductions | 23
Evaluating adaptive and predictive power management strategies for optimizing visualization performance on supercomputers | 22
Enabling GPU accelerated computing in the SUNDIALS time integration library | 22
Block red–black MILU(0) preconditioner with relaxation on GPU | 21
Integrating FPGA-based hardware acceleration with relational databases | 16
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers | 15
Measurement and analysis of GPU-accelerated applications with HPCToolkit | 14
Implementation and evaluation of MPI 4.0 partitioned communication libraries | 13
Toward performance-portable PETSc for GPU-based exascale systems | 13
Octopus-DF: Unified DataFrame-based cross-platform data analytic system | 13
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach | 11
Editorial Board | 10
Adaptively parallel runtime verification based on distributed network for temporal properties | 10
Editorial on Advances in High Performance Programming | 9
Task graph-based performance analysis of parallel-in-time methods | 8
A computational-graph partitioning method for training memory-constrained DNNs | 8
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning | 8
C-Lop: Accurate contention-based modeling of MPI concurrent communication | 8
Editorial Board | 8
Evaluating MPI resource usage summary statistics | 8
New YARN sharing GPU based on graphics memory granularity scheduling | 7
HySet: A hybrid framework for exact set similarity join using a GPU | 7
Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix | 7
OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA | 7
Optimization of DNS code and visualization of entrainment and mixing phenomena at cloud edges | 6
Editorial Board | 6
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture | 6
Editorial Board | 6
Efficient parallel reduction of bandwidth for symmetric matrices | 5
Editorial Board | 5
Special issue of Selected Papers from EuroMPI/USA 2020 | 5
ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition | 5
Optimizing convolutional neural networks on multi-core vector accelerator | 5
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers | 5
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS | 5
Spatial- and time- division multiplexing in CNN accelerator | 4
Distributed software defined network-based fog to fog collaboration scheme | 4
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms | 4
Accelerating domain propagation: An efficient GPU-parallel algorithm over sparse matrices | 4
Editorial Board | 4
GPU acceleration of Levenshtein distance computation between long strings | 4
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA | 4
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks | 4
Accelerating the scheduling of the network resources of the next-generation optical data centers | 4
Guest editorial: Virtual special issue on parallel matrix algorithms and applications (PMAA’18) | 4
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters | 4
Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator | 4
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing | 4
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters | 4
Editorial Board | 4
Analyzing the impact of CUDA versions on GPU applications | 4
The BondMachine, a moldable computer architecture | 4