ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 3. The table below lists the papers above that threshold based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
Article | Citations
ASM: An Adaptive Secure Multicore for Co-located Mutually Distrusting Processes | 38
Spiking Neural Networks in Spintronic Computational RAM | 37
Object Intersection Captures on Interactive Apps to Drive a Crowd-sourced Replay-based Compiler Optimization | 35
An Intelligent Scheduling Approach on Mobile OS for Optimizing UI Smoothness and Power | 26
TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency | 23
Highly Efficient Self-checking Matrix Multiplication on Tiled AMX Accelerators | 21
ModNEF: An Open Source Modular Neuromorphic Emulator for FPGA for Low-Power In-Edge Artificial Intelligence | 20
TransCL: An Automatic CUDA-to-OpenCL Programs Transformation Framework | 19
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 19
Performance, Energy and NVM Lifetime-Aware Data Structure Refinement and Placement for Heterogeneous Memory Systems | 17
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs | 16
DCMA: Accelerating Parallel DMA Transfers with a Multi-Port Direct Cached Memory Access in a Massive-Parallel Vector Processor | 16
COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign | 15
SIMD-Matcher: A SIMD-based Arbitrary Matching Framework | 14
Tiaozhuan: A General and Efficient Indirect Branch Optimization for Binary Translation | 14
A Concise Concurrent B+-Tree for Persistent Memory | 13
Source Matching and Rewriting for MLIR Using String-Based Automata | 13
Domain-Specific Multi-Level IR Rewriting for GPU | 12
iSwap: A New Memory Page Swap Mechanism for Reducing Ineffective I/O Operations in Cloud Environments | 12
Building a Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage | 12
Accelerating Video Captioning on Heterogeneous System Architectures | 11
Locality-Aware CTA Scheduling for Gaming Applications | 10
Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product | 10
A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler | 10
DeepZoning: Re-accelerate CNN Inference with Zoning Graph for Heterogeneous Edge Cluster | 10
GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core Systems | 9
ODGS: Dependency-Aware Scheduling for High-Level Synthesis with Graph Neural Network and Reinforcement Learning | 9
SnsBooster: Enhancing Sampling-based μArch Evaluation Efficiency through Online Performance Sensitivity Analysis | 9
AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs | 9
Accelerating Nearest Neighbor Search in 3D Point Cloud Registration on GPUs | 9
Quantifying Resource Contention of Co-located Workloads with the System-level Entropy | 8
BridgeGC: An Efficient Cross-Level Garbage Collector for Big Data Frameworks | 8
Flexible and Effective Object Tiering for Heterogeneous Memory Systems | 8
Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture | 8
COX: Exposing CUDA Warp-level Functions to CPUs | 8
An FPGA Overlay for CNN Inference with Fine-grained Flexible Parallelism | 8
Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions | 7
NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks | 7
Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming | 7
Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping | 7
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks | 7
Towards High Performance QNNs via Distribution-Based CNOT Gate Reduction | 6
DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching | 6
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 6
Multi-objective Hardware-aware Neural Architecture Search with Pareto Rank-preserving Surrogate Models | 6
EXPERTISE: An Effective Software-level Redundant Multithreading Scheme against Hardware Faults | 6
Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization | 6
RT-GNN: Accelerating Sparse Graph Neural Networks by Tensor-CUDA Kernel Fusion | 6
Low-power Near-data Instruction Execution Leveraging Opcode-based Timing Analysis | 6
Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies | 6
TPRepair: Tree-based Pipelined Repair in Clustered Storage Systems | 6
RaNAS: Resource-Aware Neural Architecture Search for Edge Computing | 6
HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy | 6
System-level Early-stage Modeling and Evaluation of IVR-assisted Processor Power Delivery System | 6
EDAS: Enabling Fast Data Loading for GPU Serverless Computing | 5
MemoriaNova: Optimizing Memory-Aware Model Inference for Edge Computing | 5
gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography | 5
Towards Optimizing Learned Index for High Performance, Memory Efficiency and NUMA Awareness | 5
PowerMorph: QoS-Aware Server Power Reshaping for Data Center Regulation Service | 5
A Stable Idle Time Detection Platform for Real I/O Workloads | 5
SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis | 5
HEngine: A High Performance Optimization Framework on a GPU for Homomorphic Encryption | 5
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 5
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals | 5
CGCGraph: Efficient CPU-GPU Co-execution for Concurrent Dynamic Graph Processing | 4
RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors | 4
Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs | 4
Byte-Select Compression | 4
xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage | 4
GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing | 4
Improving Utilization of Dataflow Unit for Multi-Batch Processing | 4
FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations | 4
Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs | 4
WIPE: A Write-Optimized Learned Index for Persistent Memory | 4
Mobile-3DCNN: An Acceleration Framework for Ultra-Real-Time Execution of Large 3D CNNs on Mobile Devices | 4
OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs | 4
Stripe-schedule Aware Repair in Erasure-coded Clusters with Heterogeneous Star Networks | 4
BullsEye: Scalable and Accurate Approximation Framework for Cache Miss Calculation | 3
Koala: Efficient Pipeline Training through Automated Schedule Searching on Domain-Specific Language | 3
Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access | 3
CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling | 3
High-performance Deterministic Concurrency Using Lingua Franca | 3
TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling | 3
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 3
Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources | 3
Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators | 3
PANDA: Adaptive Prefetching and Decentralized Scheduling for Dataflow Architectures | 3
MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications | 3
SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs | 3
JiuJITsu: Removing Gadgets with Safe Register Allocation for JIT Code Generation | 3
Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters | 3
Architecting Optically Controlled Phase Change Memory | 3
3D GNLM: Efficient 3D Non-Local Means Kernel with Nested Reuse Strategies for Embedded GPUs | 3
TSN Cache: Exploiting Data Localities in Graph Computing Applications | 3
CoNST: Code Generator for Sparse Tensor Networks | 3
Iterating Pointers: Enabling Static Analysis for Loop-based Pointers | 3
MetaEC: An Efficient and Resilient Erasure-Coded KV Store on Disaggregated Memory | 3
Shift-CIM: In-SRAM Alignment To Support General-Purpose Bit-level Sparsity Exploration in SRAM Multiplication | 3
SAL: Optimizing the Dataflow of Spin-based Architectures for Lightweight Neural Networks | 3
Scale-out Systolic Arrays | 3
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 3
CASHT: Contention Analysis in Shared Hierarchies with Thefts | 3
FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler | 3
Abakus: Accelerating k-mer Counting with Storage Technology | 3
An Example of Parallel Merkle Tree Traversal: Post-Quantum Leighton-Micali Signature on the GPU | 3
Consequence-based Clustered Architecture | 3