International Journal of Parallel Programming

Papers
(The median citation count of the International Journal of Parallel Programming is 0. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
Accelerating OCaml Programs on FPGA | 12
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU | 12
Special Issue on SAMOS 2022 | 8
Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020) | 7
Meerkat: A Framework for Dynamic Graph Algorithms on GPUs | 7
Erasure-Coded Hybrid Writes Based on Data Delta | 7
SGgraph: A Scalable GPU-Based Edge-Centric Graph Processing Framework | 5
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation | 5
Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design | 5
Scaling the Maximum Flow Computation on GPUs | 5
A Scalable Similarity Join Algorithm Based on MapReduce and LSH | 4
Declarative Data Flow in a Graph-Based Distributed Memory Runtime System | 4
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL) | 4
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices | 4
Using Machine Learning Hardware to Solve Linear Partial Differential Equations with Finite Difference Methods | 3
Generic Exact Combinatorial Search at HPC Scale | 3
Advancing Interactive Parallelization: iCetus | 3
Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments | 3
K*-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries | 3
Design and Performance Evaluation of a Novel High-Speed Hardware Architecture for Keccak Crypto Coprocessor | 3
Automatic Heterogeneous Runtime Using Signal Processing Domain-Specific and Parallel Patterns | 3
Fine-Grained Power Modeling of Multicore Processors Using FFNNs | 2
RMOWOA: A Revamped Multi-Objective Whale Optimization Algorithm for Maximizing the Lifetime of a Network in Wireless Sensor Networks | 2
Generating Sparse Matrices for Large-Scale Spectral Clustering on a Single GPU | 2
Larger-Than-Memory Stateful Stream Processing with WindFlow | 2
A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads | 2
Programming Parallelism on FPGAs with Eclat | 2
Self-Adaptive Micro-Batching for Low-Latency GPU-Accelerated Stream Processing | 2
Accelerating Computation of Steiner Trees on GPUs | 1
The Celerity High-level API: C++20 for Accelerator Clusters | 1
Retraction Note: QoS and QoE Enhanced Resource Allocation for Wireless Video Sensor Networks Using Hybrid Optimization Algorithm | 1
Giraph-Based Distributed Algorithms for Coloring Large-Scale Graphs | 1
A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC | 1
SMSG: Profiling-Free Parallelism Modeling for Distributed Training of DNN | 1
Yet Another Lock-Free Atom Table Design for Scalable Symbol Management in Prolog | 1
CAPIO-CL: The CAPIO Coordination Language | 1
Thread and Data Mapping in Software Transactional Memory: an Overview | 1
High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns | 1
Intelligent Page Migration on Heterogeneous Memory by Using Transformer | 1
AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators | 1
A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems | 0
DSParLib: A C++ Template Library for Distributed Stream Parallelism | 0
Fast Parallel CPU-GPU Approximate Spectral Clustering for Transcriptomics Data | 0
A Methodology for Efficient Tile Size Selection for Affine Loop Kernels | 0
GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution | 0
DyG-DPCD: A Distributed Parallel Community Detection Algorithm for Large-Scale Dynamic Graphs | 0
Simulation-Based Parameter Optimization for Self-adaptive HPL on Parallel Systems | 0
DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses | 0
pi-par: A Dependently-Typed Parallel Language with Algorithmic Skeletons | 0
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems | 0
Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks | 0
Parallelization of Swarm Intelligence Algorithms: Literature Review | 0
Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs | 0
An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs* | 0
Retraction Note: Designing a Framework for Communal Software: Based on the Assessment Using Relation Modelling | 0
Split’n’Cover: ISO 26262 Hardware Safety Analysis with SystemC | 0
Analysis of Model Parallelism for AI Applications on a 64-core RV64 Server CPU | 0
PragFormer: Data-Driven Parallel Source Code Classification with Transformers | 0
Efficient High-Level Programming in Plain Java | 0
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments | 0
Correction: Split’n’Cover: ISO 26,262 Hardware Safety Analysis with SystemC | 0
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks | 0
A Hybrid Machine Learning Model for Code Optimization | 0
Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes | 0
Distributed-Memory FastFlow Building Blocks | 0
FIPLib: An Image Processing Library for FPGAs Using High-Level Synthesis | 0
Celerity-RSim: Porting Light Propagation Simulation to Accelerator Clusters Using a High-Level API | 0
A High-Level API for Dynamic Load Balancing in Large-Scale Parameter Sweeps | 0
LSH SimilarityJoin Pattern in FastFlow | 0
Code Rejuvenation: From Vector Compiler Intrinsics to Portable Standardized SIMD | 0
Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method | 0
High Throughput Instruction-Data Level Parallelism Based Arithmetic Hardware Accelerator | 0
Parallelizing RNA-Seq Analysis with BioSkel: A FastFlow Based Prototype | 0
Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures | 0
Performance Characterization of Python Runtimes for Multi-device Task Parallel Programming | 0
Stencil Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments | 0
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems | 0
Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics | 0
GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis | 0
Guest Editor’s Note: High-Level Parallel Programming 2021 | 0