IEEE Computer Architecture Letters

Papers
(The median citation count of IEEE Computer Architecture Letters is 0. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
Article | Citations
Speculative Multi-Level Access in LSM Tree-Based KV Store | 37
Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture | 32
A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit | 15
Characterization and Analysis of Text-to-Image Diffusion Models | 15
The Architectural Sustainability Indicator | 15
Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc | 13
Toward Practical 128-Bit General Purpose Microarchitectures | 13
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication | 12
Time Series Machine Learning Models for Precise SSD Access Latency Prediction | 11
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs | 10
SoCurity: A Design Approach for Enhancing SoC Security | 10
2021 Index IEEE Computer Architecture Letters Vol. 20 | 10
Improving Energy-Efficiency of Capsule Networks on Modern GPUs | 9
In-Memory Versioning (IMV) | 8
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 8
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 8
OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems | 8
Exploring the DIMM PIM Architecture for Accelerating Time Series Analysis | 7
Security Helper Chiplets: A New Paradigm for Secure Hardware Monitoring | 7
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing | 7
Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference | 7
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | 6
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 6
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM | 6
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 6
Managing Prefetchers With Deep Reinforcement Learning | 5
pNet-gem5: Full-System Simulation With High-Performance Networking Enabled by Parallel Network Packet Processing | 5
SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information | 5
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network | 5
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 5
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 5
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 5
Memory-Centric MCM-GPU Architecture | 4
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency | 4
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 4
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 4
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 4
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems | 4
Architectural Implications of GNN Aggregation Programming Abstractions | 3
Accelerators & Security: The Socket Approach | 3
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 3
Camulator: a Lightweight and Extensible Trace-Driven Cache Simulator for Embedded Multicore SoCs | 3
Direct-Coding DNA With Multilevel Parallelism | 3
Fast Performance Prediction for Efficient Distributed DNN Training | 3
Exploring Volatile FPGAs Potential for Accelerating Energy-Harvesting IoT Applications | 3
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 3
A Quantum Computer Trusted Execution Environment | 3
Guard Cache: Creating Noisy Side-Channels | 3
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 3
PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors | 2
A First-Order Model to Assess Computer Architecture Sustainability | 2
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 2
FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix-Vector Multiplication on General Purpose CPUs | 2
Accelerating Page Migrations in Operating Systems With Intel DSA | 2
Analyzing and Exploiting Memory Hierarchy Parallelism With MLP Stacks | 2
SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs | 2
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 2
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 2
Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads | 2
IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs | 2
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead | 2
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 2
Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack | 2
Energy-Efficient Bayesian Inference Using Bitstream Computing | 2
Minimal Counters, Maximum Insight: Simplifying System Performance With HPC Clusters for Optimized Monitoring | 2
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 2
EgDiff: An Enhanced Global Load Value Predictor | 2
Exploiting Intel AMX Power Gating | 1
Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices | 1
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 1
Amethyst: Reducing Data Center Emissions With Dynamic Autotuning and VM Management | 1
Characterization and Analysis of the 3D Gaussian Splatting Rendering Pipeline | 1
MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage | 1
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 1
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models | 1
Redundant Array of Independent Memory Devices | 1
Architectural Security Regulation | 1
TeleVM: A Lightweight Virtual Machine for RISC-V Architecture | 1
MQSim-E: An Enterprise SSD Simulator | 1
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 1
LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation | 1
Halis: A Hardware-Software Co-designed Near-Cache Accelerator for Graph Pattern Mining | 1
Cost-Effective Extension of DRAM-PIM for Group-Wise LLM Quantization | 1
On Internally-Tagged Instruction Set Architectures | 1
Characterizing and Understanding HGNNs on GPUs | 1
Approximate Multiplier Design With LFSR-Based Stochastic Sequence Generators for Edge AI | 1
X-PPR: Post Package Repair for CXL Memory | 1
An Intermediate Language for General Sparse Format Customization | 1
HINT: A Hardware Platform for Intra-host NIC Traffic and SmartNIC Emulation | 1
Exploiting Direct Memory Operands in GPU Instructions | 1
A Data Prefetcher-Based 1000-Core RISC-V Processor for Efficient Processing of Graph Neural Networks | 1
Tulip: Turn-Free Low-Power Network-on-Chip | 1
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models | 1
A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications | 1
Characterizing and Understanding Distributed GNN Training on GPUs | 1
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 1
MajorK: Majority Based kmer Matching in Commodity DRAM | 1
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 1
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 1
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture | 1
Pyramid: Accelerating LLM Inference With Cross-Level Processing-in-Memory | 1
Runtime Support for Accelerating CNN Models on Digital DRAM Processing-in-Memory Hardware | 0
Proactive Embedding on Cold Data for Deep Learning Recommendation Model Training | 0
Accelerating Control Flow on CGRAs via Speculative Iteration Execution | 0
Revisiting Browser Performance Benchmarking From an Architectural Perspective | 0
Stardust: Scalable and Transferable Workload Mapping for Large AI on Multi-Chiplet Systems | 0
Comprehensive Design Space Exploration for Graph Neural Network Aggregation on GPUs | 0
Hardware-Implemented Lightweight Accelerator for Large Integer Polynomial Multiplication | 0
Empirical Architectural Analysis on Performance Scalability of Petascale All-Flash Storage Systems | 0
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 0
A Quantitative Analysis of State Space Model-Based Large Language Model: Study of Hungry Hungry Hippos | 0
Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories | 0
Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency | 0
Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines | 0
Editorial: A Letter From the Editor-in-Chief of IEEE Computer Architecture Letters | 0
Segin: Synergistically Enabling Fine-Grained Multi-Tenant and Resource Optimized SpMV | 0
RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs | 0
Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling | 0
Cooperative Memory Deduplication With Intel Data Streaming Accelerator | 0
Approximate SFQ-based Computing Architecture Modeling with Device-level Guidelines | 0
Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD | 0
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 0
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks | 0
A Case for Hardware Memoization in Server CPUs | 0
Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching | 0
CoreNap: Energy Efficient Core Allocation for Latency-Critical Workloads | 0
JANM-IK: Jacobian Argumented Nelder-Mead Algorithm for Inverse Kinematics and its Hardware Acceleration | 0
On Variable Strength Quantum ECC | 0
Quantum Assertion Scheme for Assuring Qudit Robustness | 0
Cycle-Oriented Dynamic Approximation: Architectural Framework to Meet Performance Requirements | 0
GPU-Centric Memory Tiering for LLM Serving With NVIDIA Grace Hopper Superchip | 0
2024 Reviewers List | 0
SmartQuant: CXL-Based AI Model Store in Support of Runtime Configurable Weight Quantization | 0
DVFaaS: Leveraging DVFS for FaaS Workflows | 0
Cache and Near-Data Co-Design for Chiplets | 0
Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems | 0
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 0
SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads | 0
Octopus: A Cycle-Accurate Cache System Simulator | 0
Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU | 0
Analysis of Data Transfer Bottlenecks in Commercial PIM Systems: A Study With UPMEM-PIM | 0
Ensuring Data Confidentiality in eADR-Based NVM Systems | 0
Efficient Implementation of Knuth Yao Sampler on Reconfigurable Hardware | 0
XLA-NDP: Efficient Scheduling and Code Generation for Deep Learning Model Training on Near-Data Processing Memory | 0
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 0
UDIR: Towards a Unified Compiler Framework for Reconfigurable Dataflow Architectures | 0
Stride Equality Prediction for Value Speculation | 0
Towards Improved Power Management in Cloud GPUs | 0
Baobab Merkle Tree for Efficient Secure Memory | 0
SPGPU: Spatially Programmed GPU | 0
LINAC: A Spatially Linear Accelerator for Convolutional Neural Networks | 0
DPWatch: A Framework for Hardware-Based Differential Privacy Guarantees | 0
Toward Scalable RDMA Through Resource Prefetching | 0
Characterizing Machine Learning-Based Runtime Prefetcher Selection | 0
Exploring the Latency Sensitivity of Cache Replacement Policies | 0
A Model for Scalable and Balanced Accelerators for Graph Processing | 0
ONNXim: A Fast, Cycle-Level Multi-Core NPU Simulator | 0
By-Software Branch Prediction in Loops | 0
Achieving Forward Progress Guarantee in Small Hardware Transactions | 0
Thor: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX | 0
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator | 0
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization | 0
Design of a High-Performance, High-Endurance Key-Value SSD for Large-Key Workloads | 0
Data-Pattern-Driven LUT for Efficient In-Cache Computing in CNNs Acceleration | 0
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management | 0
X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands | 0
Containerized In-Storage Processing Model and Hardware Acceleration for Fully-Flexible Computational SSDs | 0
ADT: Aggressive Demotion and Promotion for Tiered Memory | 0
Intelligent SSD Firmware for Zero-Overhead Journaling | 0
A DSP-Based Precision-Scalable MAC With Hybrid Dataflow for Arbitrary-Basis-Quantization CNN Accelerator | 0
Characterizing and Understanding Defense Methods for GNNs on GPUs | 0
SPAM: Streamlined Prefetcher-Aware Multi-Threaded Cache Covert-Channel Attack | 0
Estimating CPI Stacks From Multiplexed Performance Counter Data Using Machine Learning | 0
RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models | 0
DRAMA: Commodity DRAM Based Content Addressable Memory | 0
LMT: Accurate and Resource-Scalable Slowdown Prediction | 0
TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA | 0
MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures | 0
Kobold: Simplified Cache Coherence for Cache-Attached Accelerators | 0
In-Memory Computing Accelerator for Iterative Linear Algebra Solvers | 0
GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing | 0
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 0
SmartIndex: Learning to Index Caches to Improve Performance | 0
LTE: Lightweight and Time-Efficient Hardware Encoder for Post-Quantum Scheme HQC | 0
An Area Efficient Architecture of a Novel Chaotic System for High Randomness Security in e-Health | 0
Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists | 0
DNA Pre-Alignment Filter Using Processing Near Racetrack Memory | 0
Correct Wrong Path | 0
DynaFlow: An ML Framework for Dynamic Dataflow Selection in SpGEMM Accelerators | 0
HPN-SpGEMM: Hybrid PIM-NMP for SpGEMM | 0
Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs | 0
Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures | 0
The Importance of Generalizability in Machine Learning for Systems | 0
GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance | 0
srNAND: A Novel NAND Flash Organization for Enhanced Small Read Throughput in SSDs | 0
Open-Source Hardware Memory Protection Engine Integrated With NVMM Simulator | 0
MixDiT: Accelerating Image Diffusion Transformer Inference With Mixed-Precision MX Quantization | 0
SAFE: Sharing-Aware Prefetching for Efficient GPU Memory Management With Unified Virtual Memory | 0
Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory | 0
Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity | 0
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 0
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 0
Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers | 0
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator | 0
Structured Combinators for Efficient Graph Reduction | 0
Architecting Compatible PIM Protocol for CPU-PIM Collaboration | 0
The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE | 0
Hashing ATD Tags for Low-Overhead Safe Contention Monitoring | 0
Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search | 0
The Jaseci Programming Paradigm and Runtime Stack: Building Scale-Out Production Applications Easy and Fast | 0
Dramaton: A Near-DRAM Accelerator for Large Number Theoretic Transforms | 0
Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting | 0
SLO-Aware GPU DVFS for Energy-Efficient LLM Inference Serving | 0
Accelerating Vector Permutation Instruction Execution via Controllable Bitonic Network | 0
L-DTC: Load-based Dynamic Throughput Control for Guaranteed I/O Performance in Virtualized Environments | 0
Hardware-Accelerated Kernel-Space Memory Compression Using Intel QAT | 0
Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference | 0
WoperTM: Got Nacks? Use Them! | 0
Accelerating Graph Processing With Lightweight Learning-Based Data Reordering | 0