Teaching Assistants
Pullabhatla Smriti
Peijing Li
Vyom Garg
Jake Cheng
Office hours for the instructors and TAs are listed on Canvas.
This class meets Tuesdays and Thursdays, 10:30-11:50 AM, in CODA B60.
Course Description
This course explores the design, programming, and performance of modern AI accelerators. It covers architectural techniques, dataflow, tensor processing, memory hierarchies, compilation for accelerators, and emerging trends in AI computing. The course covers modern AI/ML algorithms such as convolutional neural networks and Transformer-based models, including large language models (LLMs). We will consider both training and inference for these models and discuss how parameters such as batch size, precision, sparsity, and compression affect model accuracy. Students will become familiar with hardware implementation techniques that exploit parallelism, locality, and low precision to implement the core computational kernels used in ML, and will develop the intuition needed to make system-level trade-offs when designing energy-efficient AI accelerators. Students will apply these concepts by implementing block-level designs in C++/SystemC, synthesizing them via high-level synthesis (HLS), and then instrumenting and evaluating the resulting systems on cloud FPGAs. Prerequisites: CS 149 or EE 180. CS 229 is ideal but not required.