IEEE Computer Architecture Letters

Papers
(The median citation count of IEEE Computer Architecture Letters is 0. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The publications cover those that have been published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network | 26
SPAM: Streamlined Prefetcher-Aware Multi-Threaded Cache Covert-Channel Attack | 16
Accelerating Page Migrations in Operating Systems With Intel DSA | 14
GPU-Centric Memory Tiering for LLM Serving With NVIDIA Grace Hopper Superchip | 11
Structured Combinators for Efficient Graph Reduction | 11
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs | 11
An Area Efficient Architecture of a Novel Chaotic System for High Randomness Security in e-Health | 10
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead | 10
GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing | 10
Characterization and Analysis of Text-to-Image Diffusion Models | 9
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 9
A Case for Hardware Memoization in Server CPUs | 9
FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix-Vector Multiplication on General Purpose CPUs | 9
Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture | 8
Analysis of Data Transfer Bottlenecks in Commercial PIM Systems: A Study With UPMEM-PIM | 8
Baobab Merkle Tree for Efficient Secure Memory | 8
IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs | 7
Efficient Implementation of Knuth Yao Sampler on Reconfigurable Hardware | 7
Hashing ATD Tags for Low-Overhead Safe Contention Monitoring | 7
Proactive Embedding on Cold Data for Deep Learning Recommendation Model Training | 6
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management | 6
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 5
Approximate Multiplier Design With LFSR-Based Stochastic Sequence Generators for Edge AI | 5
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 5
Guessing Outputs of Dynamically Pruned CNNs Using Memory Access Patterns | 5
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture | 5
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 4
Revisiting Browser Performance Benchmarking From an Architectural Perspective | 4
SmartIndex: Learning to Index Caches to Improve Performance | 4
Runtime Support for Accelerating CNN Models on Digital DRAM Processing-in-Memory Hardware | 4
Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories | 4
Speculative Multi-Level Access in LSM Tree-Based KV Store | 4
Open-Source Hardware Memory Protection Engine Integrated With NVMM Simulator | 4
Toward Practical 128-Bit General Purpose Microarchitectures | 4
A Model for Scalable and Balanced Accelerators for Graph Processing | 4
On Variable Strength Quantum ECC | 4
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 4
BTB-X: A Storage-Effective BTB Organization | 4
Intelligent SSD Firmware for Zero-Overhead Journaling | 3
Learned Performance Model for SSD | 3
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 3
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 3
Redundant Array of Independent Memory Devices | 3
The Importance of Generalizability in Machine Learning for Systems | 3
Data-Aware Compression of Neural Networks | 2
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 2
UDIR: Towards a Unified Compiler Framework for Reconfigurable Dataflow Architectures | 2
Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency | 2
GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance | 2
DPWatch: a Framework for Hardware-Based Differential Privacy Guarantees | 2
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication | 2
Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers | 2
SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information | 2
Exploring PIM Architecture for High-Performance Graph Pattern Mining | 2
Managing Prefetchers With Deep Reinforcement Learning | 2
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models | 2
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization | 2
Cost-Effective Extension of DRAM-PIM for Group-Wise LLM Quantization | 2
Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures | 2
SPGPU: Spatially Programmed GPU | 2
Empirical Architectural Analysis on Performance Scalability of Petascale All-Flash Storage Systems | 2
BayesTuner: Leveraging Bayesian Optimization For DNN Inference Configuration Selection | 2
Hardware-Implemented Lightweight Accelerator for Large Integer Polynomial Multiplication | 2
Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching | 2
A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit | 2
Editorial: A Letter From the Editor-in-Chief of IEEE Computer Architecture Letters | 2
A Quantitative Analysis of State Space Model-Based Large Language Model: Study of Hungry Hungry Hippos | 2
Characterizing Machine Learning-Based Runtime Prefetcher Selection | 2
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD | 1
CoreNap: Energy Efficient Core Allocation for Latency-Critical Workloads | 1
DAMARU: A Denial-of-Service Attack on Randomized Last-Level Caches | 1
Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting | 1
Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems | 1
A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications | 1
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 1
HBM3 RAS: Enhancing Resilience at Scale | 1
Containerized In-Storage Processing Model and Hardware Acceleration for Fully-Flexible Computational SSDs | 1
The Jaseci Programming Paradigm and Runtime Stack: Building Scale-Out Production Applications Easy and Fast | 1
LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation | 1
SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering | 1
By-Software Branch Prediction in Loops | 1
Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators | 1
2021 Index IEEE Computer Architecture Letters Vol. 20 | 1
Scale-Model Simulation | 1
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 1
Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling | 1
Characterization and Analysis of the 3D Gaussian Splatting Rendering Pipeline | 1
An Intermediate Language for General Sparse Format Customization | 1
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 1
Design of a High-Performance, High-Endurance Key-Value SSD for Large-Key Workloads | 1
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training | 1
Towards Improved Power Management in Cloud GPUs | 1
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 1
Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists | 1
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 0
MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures | 0
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 0
Accelerating Control Flow on CGRAs Via Speculative Iteration Execution | 0
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 0
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 0
Cooperative Memory Deduplication With Intel Data Streaming Accelerator | 0
Estimating CPI Stacks from Multiplexed Performance Counter Data Using Machine Learning | 0
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 0
The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE | 0
Comprehensive Design Space Exploration for Graph Neural Network Aggregation on GPUs | 0
Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions | 0
Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics | 0
LTE: Lightweight and Time-Efficient Hardware Encoder for Post-Quantum Scheme HQC | 0
Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity | 0
Energy-Efficient Bayesian Inference Using Bitstream Computing | 0
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 0
Octopus: A Cycle-Accurate Cache System Simulator | 0
2024 Reviewers List | 0
Kobold: Simplified Cache Coherence for Cache-Attached Accelerators | 0
Guard Cache: Creating Noisy Side-Channels | 0
Exploring the Latency Sensitivity of Cache Replacement Policies | 0
Achieving Forward Progress Guarantee in Small Hardware Transactions | 0
Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference | 0
ONNXim: A Fast, Cycle-Level Multi-Core NPU Simulator | 0
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 0
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 0
Improving Energy-Efficiency of Capsule Networks on Modern GPUs | 0
Direct-Coding DNA With Multilevel Parallelism | 0
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing | 0
DVFaaS: Leveraging DVFS for FaaS Workflows | 0
SLO-Aware GPU DVFS for Energy-Efficient LLM Inference Serving | 0
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 0
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator | 0
The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory | 0
Characterizing and Understanding HGNNs on GPUs | 0
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 0
Hardware-Accelerated Kernel-Space Memory Compression Using Intel QAT | 0
A Data Prefetcher-Based 1000-Core RISC-V Processor for Efficient Processing of Graph Neural Networks | 0
Stride Equality Prediction for Value Speculation | 0
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 0
WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture | 0
X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands | 0
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 0
Tulip: Turn-Free Low-Power Network-on-Chip | 0
PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors | 0
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 0
Toward Scalable RDMA Through Resource Prefetching | 0
SmartQuant: CXL-Based AI Model Store in Support of Runtime Configurable Weight Quantization | 0
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks | 0
TeleVM: A Lightweight Virtual Machine for RISC-V Architecture | 0
Exploiting Direct Memory Operands in GPU Instructions | 0
Modeling Periodic Energy-Harvesting Computing Systems | 0
Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference | 0
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing | 0
A DSP-Based Precision-Scalable MAC With Hybrid Dataflow for Arbitrary-Basis-Quantization CNN Accelerator | 0
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators | 0
TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA | 0
Fast Performance Prediction for Efficient Distributed DNN Training | 0
Architectural Implications of GNN Aggregation Programming Abstractions | 0
Accelerating Graph Processing With Lightweight Learning-Based Data Reordering | 0
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 0
Quantum Assertion Scheme for Assuring Qudit Robustness | 0
Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines | 0
A First-Order Model to Assess Computer Architecture Sustainability | 0
Ensuring Data Confidentiality in eADR-Based NVM Systems | 0
DRAMA: Commodity DRAM Based Content Addressable Memory | 0
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 0
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs | 0
XLA-NDP: Efficient Scheduling and Code Generation for Deep Learning Model Training on Near-Data Processing Memory | 0
MajorK: Majority Based kmer Matching in Commodity DRAM | 0
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator | 0
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems | 0
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | 0
Characterizing and Understanding Distributed GNN Training on GPUs | 0
Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU | 0
Data-Pattern-Driven LUT for Efficient In-Cache Computing in CNNs Acceleration | 0
Exploiting Intel AMX Power Gating | 0
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 0
Security Helper Chiplets: A New Paradigm for Secure Hardware Monitoring | 0
SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads | 0
Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles | 0
SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs | 0
Cycle-Oriented Dynamic Approximation: Architectural Framework to Meet Performance Requirements | 0
RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models | 0
LMT: Accurate and Resource-Scalable Slowdown Prediction | 0
X-PPR: Post Package Repair for CXL Memory | 0
Dramaton: A Near-DRAM Accelerator for Large Number Theoretic Transforms | 0
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 0
LINAC: A Spatially Linear Accelerator for Convolutional Neural Networks | 0
JANM-IK: Jacobian Argumented Nelder-Mead Algorithm for Inverse Kinematics and its Hardware Acceleration | 0
In-Memory Versioning (IMV) | 0
Accelerators & Security: The Socket Approach | 0
A Quantum Computer Trusted Execution Environment | 0
MQSim-E: An Enterprise SSD Simulator | 0
Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack | 0
Characterizing and Understanding Defense Methods for GNNs on GPUs | 0
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 0
Architectural Security Regulation | 0
Modeling DRAM Timing in Parallel Simulators With Immediate-Response Memory Model | 0
SoCurity: A Design Approach for Enhancing SoC Security | 0
Architecting Compatible PIM Protocol for CPU-PIM Collaboration | 0
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models | 0
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 0
Instruction Criticality Based Energy-Efficient Hardware Data Prefetching | 0
Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory | 0
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 0
Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices | 0
ADT: Aggressive Demotion and Promotion for Tiered Memory | 0
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs | 0
DNA Pre-Alignment Filter Using Processing Near Racetrack Memory | 0
Thor: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX | 0
Accelerating Vector Permutation Instruction Execution via Controllable Bitonic Network | 0