International Journal of High Performance Computing Applications

Papers
(The median citation count of International Journal of High Performance Computing Applications is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
| Article | Citations |
| --- | --- |
| Visualization at exascale: Making it all work with VTK-m | 54 |
| Dynamic spawning of MPI processes applied to malleability | 53 |
| Enhancing scalability of a matrix-free eigensolver for studying many-body localization | 49 |
| Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants | 39 |
| Automatizing the creation of specialized high-performance computing containers | 35 |
| Compressed basis GMRES on high-performance graphics processing units | 28 |
| Accelerating atmospheric physics parameterizations using graphics processing units | 26 |
| Large-scale ab initio simulation of light–matter interaction at the atomic scale in Fugaku | 20 |
| Refining HPCToolkit for application performance analysis at exascale | 16 |
| HPC I/O innovations in the exascale era | 14 |
| Modeling, evaluating, and orchestrating heterogeneous environmental leverages for large-scale data center management | 14 |
| Orchestration of materials science workflows for heterogeneous resources at large scale | 14 |
| HDF5 in the exascale era: Delivering efficient and scalable parallel I/O for exascale applications | 13 |
| Julia versus C++ Kokkos for performance portable Cartesian CFD solvers on heterogeneous architectures | 12 |
| Ginkgo - A math library designed to accelerate Exascale Computing Project science applications | 10 |
| A tale of two codes: CUDA vs OpenACC for mass-zero constrained dynamics | 9 |
| General framework for re-assuring numerical reliability in parallel Krylov solvers: A case of bi-conjugate gradient stabilized methods | 9 |
| Preparing MPICH for exascale | 9 |
| An elastic framework for ensemble-based large-scale data assimilation | 8 |
| Hypergraph-based locality-enhancing methods for graph operations in Big Data applications | 7 |
| Integrating ytopt and libEnsemble to autotune OpenMC | 7 |
| Special issue introduction | 7 |
| Performance of explicit and IMEX MRI multirate methods on complex reactive flow problems within modern parallel adaptive structured grid frameworks | 7 |
| GPU-based molecular dynamics of fluid flows: Reaching for turbulence | 7 |
| Retraction Notice | 7 |
| Accelerated dynamic data reduction using spatial and temporal properties | 6 |
| Massively parallel nodal discontinous Galerkin finite element method simulator for room acoustics | 6 |
| PeleC: An adaptive mesh refinement solver for compressible reacting flows | 5 |
| Cache blocking of distributed-memory parallel matrix power kernels | 5 |
| Special issue: Introduction | 5 |
| Preparing the TAU performance system for exascale and beyond | 5 |
| A study on the performance of distributed training of data-driven CFD simulations | 5 |
| Fast truncated SVD of sparse and dense matrices on graphics processors | 5 |
| Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications | 5 |
| Bricks: A high-performance portability layer for computations on block-structured grids | 4 |
| An integrated three-dimensional aeromechanical analysis for the prediction of stresses on modern coaxial rotors | 4 |
| Understanding power and energy utilization in large scale production physics simulation codes | 4 |
| Clacc: OpenACC for C/C++ in Clang | 4 |
| Semi-Lagrangian 4d, 5d, and 6d kinetic plasma simulation on large-scale GPU-equipped supercomputers | 4 |
| Feynman and computation: From Los Alamos to quantum computers | 4 |
| A population data-driven workflow for COVID-19 modeling and learning | 4 |
| Data-driven scalable pipeline using national agent-based models for real-time pandemic response and decision support | 4 |
| Accelerating cluster dynamics simulation of fission gas behavior in nuclear fuel on deep computing unit–based heterogeneous architecture supercomputer | 4 |
| Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials | 4 |
| Advances in ArborX to support exascale applications | 4 |
| NUMA-aware parallel sparse LU factorization for SPICE-based circuit simulators on ARM multi-core processors | 4 |
| Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors | 4 |
| UMap: An application-oriented user level memory mapping library | 3 |
| MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures | 3 |
| PaRSEC: Scalability, flexibility, and hybrid architecture support for task-based applications in ECP | 3 |
| Performance analysis of relaxation Runge–Kutta methods | 3 |
| Exploiting mesh structure to improve multigrid performance for saddle-point problems | 3 |
| Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations | 3 |
| Cache-optimized and low-overhead implementations of additive Schwarz methods for high-order FEM multigrid computations | 3 |
| IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy | 3 |
| Guest editors note: Special issue on clusters, clouds, and data for scientific computing | 3 |
| Simulation-based machine learning for real-time assessment of side-branch hemodynamics in coronary bifurcation lesions | 2 |
| Efficient solution of batched band linear systems on GPUs | 2 |
| End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations | 2 |
| Performance portability in a real world application: PHAST applied to Caffe | 2 |
| High performance computing seismic redatuming by inversion with algebraic compression and multiple precisions | 2 |
| The ECP ALPINE project: In situ and post hoc visualization infrastructure and analysis capabilities for exascale | 2 |
| SWARM: Reimagining scientific workflow management systems in a distributed world | 2 |
| Mixed precision LU factorization on GPU tensor cores: reducing data movement and memory footprint | 2 |
| Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP | 2 |
| A fine-grained parallelization of the immersed boundary method | 2 |
| PETSc/TAO developments for GPU-based early exascale systems | 2 |
| Evolution of the SLATE linear algebra library | 2 |
| Black-box statistical prediction of lossy compression ratios for scientific data | 2 |
| Detecting interference between applications and improving the scheduling using malleable application clones | 2 |
| Corrigendum to large-scale direct numerical simulations of turbulence using GPUs and modern Fortran | 2 |
| #COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol | 2 |
| Task-parallel in situ temporal compression of large-scale computational fluid dynamics data | 2 |
| An implicit barotropic mode solver for MPAS-ocean using a modern Fortran solver interface | 2 |
| ECP libraries and tools: An overview | 2 |
| A compilation-based approach to performant reduction and redistribution collective communication algorithms | 1 |
| Batched sparse direct solver design and evaluation in SuperLU_DIST | 1 |
| Checkpointing fine-tuning for accelerating seismic applications in GPUs | 1 |
| Parthenon—a performance portable block-structured adaptive mesh refinement framework | 1 |
| Finding the forest in the trees: Enabling performance optimization on heterogeneous architectures through data science analysis of ensemble performance data | 1 |
| Performance comparison of the A-grid and C-grid shallow-water models on icosahedral grids | 1 |
| Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs | 1 |
| Resiliency in numerical algorithm design for extreme scale simulations | 1 |
| Towards exascale for wind energy simulations | 1 |
| Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics | 1 |
| Myths and legends in high-performance computing | 1 |
| A two-level GPU-accelerated incomplete LU preconditioner for general sparse linear systems | 1 |
| Result-scalability: Following the evolution of selected social impact of HPC | 1 |
| HipBone: A performance-portable graphics processing unit-accelerated C++ version of the NekBone benchmark | 1 |
| Experience and analysis of scalable high-fidelity computational fluid dynamics on modular supercomputing architectures | 1 |
| ZFP: A compressed array representation for numerical computations | 1 |
| Exploiting temporal data reuse and asynchrony in the reverse time migration | 1 |
| Performance enhancement of the Ozaki Scheme on integer matrix multiplication unit | 1 |
| Modeling and implementing an earthquake and tsunami event-triggered, time-constrained impact assessment workflow | 1 |
| Portable, heterogeneous ensemble workflows at scale using libEnsemble | 1 |
| Efficiency and scalability of fully-resolved fluid-particle simulations on heterogeneous CPU-GPU architectures | 1 |
| Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload | 1 |