ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
SMAUG | 33
IR2Vec | 32
Domain-Specific Multi-Level IR Rewriting for GPU | 25
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures | 21
Grus | 20
ArmorAll | 20
A Black-box Monitoring Approach to Measure Microservices Runtime Performance | 19
Compiler Support for Sparse Tensor Computations in MLIR | 18
LLOV | 17
PERI | 17
PAVER | 17
Dynamic Precision Autotuning with TAFFO | 16
A Case For Intra-rack Resource Disaggregation in HPC | 13
Inter-kernel Reuse-aware Thread Block Scheduling | 13
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms | 13
Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks | 12
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators | 12
Securing Branch Predictors with Two-Level Encryption | 11
Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF) | 11
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications | 11
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels | 11
AsynGraph | 10
OD-SGD | 10
Gem5-X | 10
KernelFaRer | 10
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM | 10
Exploiting Parallelism Opportunities with Deep Learning Frameworks | 10
EchoBay | 9
GEVO | 9
PolyDL | 9
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 9
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 8
On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond | 8
Performance Evaluation of Intel Optane Memory for Managed Workloads | 8
Schedule Synthesis for Halide Pipelines on GPUs | 8
GRAM | 8
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory | 8
Low-precision Logarithmic Number Systems | 8
Architecting Optically Controlled Phase Change Memory | 8
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems | 8
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization | 7
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 7
Bayesian Optimization for Efficient Accelerator Synthesis | 7
A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs | 7
Scale-out Systolic Arrays | 6
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints | 6
Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model | 6
ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer | 6
HeapCheck: Low-cost Hardware Support for Memory Safety | 5
Understanding Cache Compression | 5
FastPath_MP | 5
Gretch | 5
Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System | 5
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 | 5
Autotuning Convolutions Is Easier Than You Think | 5
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration | 5
Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs | 5
LargeGraph | 5
SIMT-X | 5
E-BATCH: Energy-Efficient and High-Throughput RNN Batching | 5
GraphPEG | 5
MC-DeF | 5
Refresh Triggered Computation | 5
Energy-efficient In-Memory Address Calculation | 5
Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache | 4
GraphAttack | 4
On Architectural Support for Instruction Set Randomization | 4
Practical Software-Based Shadow Stacks on x86-64 | 4
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 4
Performance and Power Prediction for Concurrent Execution on GPUs | 4
MemSZ | 4
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks | 4
Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators | 4
SPX64 | 4
GPU Domain Specialization via Composable On-Package Architecture | 4
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation | 4
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 4
Zeroploit | 4