IEEE Micro

Papers
(The TQCC of IEEE Micro is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
NVIDIA A100 Tensor Core GPU: Performance and Innovation | 133
Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs | 124
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings | 82
The Design Process for Google's Training Chips: TPUv2 and TPUv3 | 48
PEFL: Deep Privacy-Encoding-Based Federated Learning Framework for Smart Agriculture | 48
FerroElectronics for Edge Intelligence | 47
Accelerating Genome Analysis: A Primer on an Ongoing Journey | 45
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications | 38
BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs | 37
Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM | 35
Chasing Carbon: The Elusive Environmental Footprint of Computing | 32
PyMTL3: A Python Framework for Open-Source Hardware Modeling, Generation, Simulation, and Verification | 31
Accelerating Chip Design With Machine Learning | 31
MHADBOR: AI-Enabled Administrative-Distance-Based Opportunistic Load Balancing Scheme for an Agriculture Internet of Things Network | 29
A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC | 29
Kunpeng 920: The First 7-nm Chiplet-Based 64-Core ARM SoC for Cloud Services | 28
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing | 26
Evolution of the Graphics Processing Unit (GPU) | 25
OpenFPGA: An Open-Source Framework for Agile Prototyping Customizable FPGAs | 25
SymbiFlow and VPR: An Open-Source Design Flow for Commercial and Novel FPGAs | 23
ReLeQ : A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks | 22
Intel Alder Lake CPU Architectures | 21
Quantum Computers for High-Performance Computing | 20
The Path to Successful Wafer-Scale Integration: The Cerebras Story | 19
Klessydra-T: Designing Vector Coprocessors for Multithreaded Edge-Computing Cores | 19
Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators | 18
Superconductor Computing for Neural Networks | 18
Extending the Frontier of Quantum Computers With Qutrits | 17
TSA-NoC: Learning-Based Threat Detection and Mitigation for Secure Network-on-Chip Architecture | 17
Quantum Computing—From NISQ to PISQ | 16
ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator | 15
PCI Express 6.0 Specification: A Low-Latency, High-Bandwidth, High-Reliability, and Cost-Effective Interconnect With 64.0 GT/s PAM-4 Signaling | 15
IBM's POWER10 Processor | 14
Data Centers on Wheels: Emissions From Computing Onboard Autonomous Vehicles | 14
Artificial Intelligence Best Practices in Smart Agriculture | 14
CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development | 14
FPGA-Accelerated Quantum Computing Emulation and Quantum Key Distillation | 13
Challenges and Opportunities for Autonomous Micro-UAVs in Precision Agriculture | 13
Agile Hardware Development and Instrumentation With PyRTL | 13
Generating Systolic Array Accelerators With Reusable Blocks | 13
On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array | 13
Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion | 12
NVIDIA Hopper H100 GPU: Scaling Performance | 12
Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer | 12
Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications | 11
Accelerator Integration for Open-Source SoC Design | 11
Temporal Computing With Superconductors | 11
An Open Inter-Chiplet Communication Link: Bunch of Wires (BoW) | 11
Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms | 10
AIDA: Associative In-Memory Deep Learning Accelerator | 10
The AMD Next-Generation “Zen 3” Core | 10
Co-Design and System for the Supercomputer “Fugaku” | 10
Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Systems | 10
Compute Substrate for Software 2.0 | 10
Quantum Codesign | 9
OpenPiton at 5: A Nexus for Open and Agile Hardware Design | 9
Architecting Noisy Intermediate-Scale Quantum Computers: A Real-System Study | 9
A Taxonomy of ML for Systems Problems | 8
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud | 8
UAV–Assisted Joint Wireless Power Transfer and Data Collection Mechanism for Sustainable Precision Agriculture in 5G | 8
ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor | 8
A Programmable Approach to Neural Network Compression | 8
Configurable Network Protocol Accelerator (COPA) | 8
Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives | 8
Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data | 7
A Next-Generation Cryogenic Processor Architecture | 7
Unveiling the Hardware and Software Implications of Microservices in Cloud and Edge Systems | 7
Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning | 7
Temperature-Resilient RRAM-Based In-Memory Computing for DNN Inference | 7
LiveHD: A Productive Live Hardware Development Flow | 7
A Case for Accelerating Software RTL Simulation | 7
On Double Full-Stack Communication-Enabled Architectures for Multicore Quantum Computers | 7
History of IBM Z Mainframe Processors | 6
Bridging Python to Silicon: The SODA Toolchain | 6
TinyIREE: An ML Execution Environment for Embedded Systems From Compilation to Deployment | 6
System on a Package Innovations With Universal Chiplet Interconnect Express (UCIe) Interconnect | 6
Democratizing Data-Driven Agriculture Using Affordable Hardware | 6
A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s With PAM-4 Signaling | 6
Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 | 6
Rome to Milan, AMD Continues Its Tour of Italy | 6
High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA | 6
A Single-Shot Generalized Device Placement for Large Dataflow Graphs | 6
Memory Pooling With CXL | 6
Three-Dimensional Stacked Neural Network Accelerator Architectures for AR/VR Applications | 6
Power Side-Channel Attacks in Negative Capacitance Transistor | 6
PurpleDrop: A Digital Microfluidics-Based Platform for Hybrid Molecular-Electronics Applications | 6
MicroScope: Enabling Microarchitectural Replay Attacks | 6
Accelerating ML Recommendation With Over 1,000 RISC-V/Tensor Processors on Esperanto's ET-SoC-1 Chip | 6
Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V | 6
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers | 5
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale | 5
Marvell ThunderX3: Next-Generation Arm-Based Server Processor | 5
The Open Domain-Specific Architecture | 5
Balancing Specialized Versus Flexible Computation in Brain–Computer Interfaces | 5
Hidden Potential Within Video Game Consoles | 5
Agile and Open-Source Hardware | 5
The Vision Behind MLPerf: Understanding AI Inference Performance | 5
Meet the FaM1ly | 5
RadioML Meets FINN: Enabling Future RF Applications With FPGA Streaming Architectures | 5
Advances in Microprocessor Cache Architectures Over the Last 25 Years | 5
History of Microcontrollers: First 50 Years | 5
ILLIXR: An Open Testbed to Enable Extended Reality Systems Research | 5
Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | 5
uGEMM: Unary Computing for GEMM Applications | 5
Compiling for the IBM Matrix Engine for Enterprise Workloads | 4
Energy-Efficient Video Processing for Virtual Reality | 4
ExHero: Execution History-Aware Error-Rate Estimation in Pipelined Designs | 4
Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy | 4
Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim | 4
Soil Fertility Monitoring With Internet of Underground Things: A Survey | 4
CXL-Enabled Enhanced Memory Functions | 4
Emerging Technologies for Quantum Computing | 4
The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System | 4
Accelerating Allreduce With In-Network Reduction on Intel PIUMA | 4
Kaya for Computer Architects: Toward Sustainable Computer Systems | 4
The AMD 400-G Adaptive SmartNIC System on Chip: A Technology Preview | 4
Tydi: An Open Specification for Complex Data Structures Over Hardware Streams | 3
Shortages of Integrated Circuits | 3
Pensando Distributed Services Architecture | 3
Universal Graph-Based Scheduling for Quantum Systems | 3
Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge | 3
Unifying Spatial Accelerator Compilation With Idiomatic and Modular Transformations | 3
FPGA Computing | 3
Failure Tolerant Training With Persistent Memory Disaggregation Over CXL | 3
Efficient Language-Guided Reinforcement Learning for Resource-Constrained Autonomous Systems | 3
The Apollo Guidance Computer | 3
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs | 3
Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework | 3
Enhancing Model Parallelism in Neural Architecture Search for Multidevice System | 3
LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration | 3
A Parallel and Updatable Architecture for FPGA-Based Packet Classification With Large-Scale Rule Sets | 3
Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool | 3
Photonic Network-on-Wafer for Multichiplet GPUs | 3
Countering Load-to-Use Stalls in the NVIDIA Turing GPU | 3
LastLayer: Toward Hardware and Software Continuous Integration | 2
Combining Multiple tinyML Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | 2
ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library | 2
SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander | 2
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs | 2
Compiling for Vector Extensions With Stream-Based Specialization | 2
The Origin of Intel's Micro-Ops | 2
Towards General-Purpose Acceleration: Finding Structure in Irregularity | 2
HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces | 2
Microprocessor Advances and the Mainframe Legacy | 2
Hardware Specialization: From Cell to Heterogeneous Microprocessors Everywhere | 2
PCs Take a Page From Xbox With Pluton | 2
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References | 2
A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation | 2
Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility | 2
A Binary Translation Framework for Automated Hardware Generation | 2
Quantum Computing and the Design of the Ultimate Accelerator | 2
Accelerating Phylogenetics Using FPGAs in the Cloud | 2
HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems | 2
Sustainable AI Processing at the Edge | 2
Overclocking in Immersion-Cooled Datacenters | 2
Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications | 2
Machine Learning for Systems | 2
SpecHLS: Speculative Accelerator Design Using High-Level Synthesis | 2
Distributed Deep Learning With GPU-FPGA Heterogeneous Computing | 2
Virtual Logical Qubits: A Compact Architecture for Fault-Tolerant Quantum Computing | 2
Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite | 2
Creating Foundations for Secure Microarchitectures With Data-Oblivious ISA Extensions | 2
Understanding Acceleration Opportunities at Hyperscale | 2
The Intel Programmable and Integrated Unified Memory Architecture Graph Analytics Processor | 2
Enterprise-Class Multilevel Cache Design: Low Latency, Huge Capacity, and High Reliability | 1
Special Issue on Artificial Intelligence at the Edge | 1
On-Device Tiny Machine Learning for Anomaly Detection Based on the Extreme Values Theory | 1
Z80—The 1970s Microprocessor Still Alive | 1
DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks | 1
The 50 Year History of the Microprocessor as Five Technology Eras | 1
Special Issue on Hot Interconnects | 1
Reliable and Time-Efficient Virtualized Function Placement | 1
Characterizing and Modeling Nonvolatile Memory Systems | 1
Vector Runahead for Indirect Memory Accesses | 1
Early History of Texas Instrument's Digital Signal Processor | 1
Masthead | 1
XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays | 1
Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads | 1
The Economics of Confrontational Conversation | 1
The Microarchitecture of DOJO, Tesla’s Exa-Scale Computer | 1
Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency | 1
Acceleration of a Classic McEliece Postquantum Cryptosystem With Cache Processing | 1
Biology and Systems Interactions | 1
Pod-racing: bulk-bitwise to floating-point compute in racetrack memory for machine learning at the edge | 1
Toward Developing High-Performance RISC-V Processors Using Agile Methodology | 1
TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology | 1
Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation | 1
Varifocal Storage: Dynamic Multiresolution Data Storage | 1
A Hardware/Software Co-Design Vision for Deep Learning at the Edge | 1
IEEE Computer Society | 1
I-DVFS: Instantaneous Frequency Switch During Dynamic Voltage and Frequency Scaling | 1
AI and Memory Wall | 1
The Fox and Shepherd Problem | 1
The Xbox Series X System Architecture | 1
Monitoring InfiniBand Networks to React Efficiently to Congestion | 1
A 10.7-µJ/Frame 88% Accuracy CIFAR-10 Single-Chip Neuromorphic Field-Programmable Gate Array Processor Featuring Various Nonlinear Functions of Dendrites in the Human Cerebrum | 1
Navigating the Seismic Shift of Post-Moore Computer Systems Design | 1
Data Movement Accelerator Engines on a Prototype Power10 Processor | 1
A Brief History of Warehouse-Scale Computing | 1
Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation | 1
speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores | 1
A Compressed Spiking Neural Network Onto a Memcapacitive In-Memory Computing Array | 1
Economic Dependencies in Integrated Circuits | 1
Special Issue on In-Memory Computing | 1
IEEE Computer Society: Volunteer Service Awards | 1
Characterizing and Mitigating Soft Errors in GPU DRAM | 1
Warehouse-Scale Video Acceleration | 1
Leaking Secrets Through Compressed Caches | 1
Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration | 1
BabelFish: Fusing Address Translations for Containers | 1
Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture | 1
Special Issue on Artificial Intelligence, Edge, and Internet of Things for Smart Agriculture | 1
Online Code Layout Optimizations via OCOLOS | 1