FPGA-Powered AI Runs Karpathy's microGPT at 50,000 Tokens Per Second on DE1-SoC Hardware
Summary
TALOS-V2 brings AI inference to FPGA hardware, running Karpathy's microGPT at over 50,000 tokens per second on a DE1-SoC Cyclone V board using fixed-point arithmetic and SystemVerilog RTL, complete with real-time switch/LED controls and token output on the board's HEX displays.
Key Points
- TALOS-V2 is a hardware implementation of Karpathy's microGPT language model running on a DE1-SoC Cyclone V FPGA, achieving over 50,000 tokens per second using fixed-point arithmetic in SystemVerilog RTL.
- The project includes a full inference pipeline with RTL model ROM hex files, an RTL-based token sampler, ModelSim testbenches for simulation, Python host utilities for JTAG inference, and Quartus build scripts for programming the FPGA board.
- Board-level controls via switches and LEDs allow users to enable, reset, and monitor inference status in real time, with token output and state information displayed on the HEX displays of the DE1-SoC.
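The fixed-point arithmetic mentioned above can be modeled on the host side before committing to RTL. A minimal sketch, assuming a Q4.12 format (16-bit values with 12 fractional bits); the actual word width and fraction split used by TALOS-V2 are not stated in the source:

```python
# Host-side model of Q4.12 fixed-point math, as an FPGA design like
# TALOS-V2 might use. The Q4.12 split is an assumption for illustration.
FRAC_BITS = 12
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize a float to a Q4.12 integer (round to nearest)."""
    return int(round(x * SCALE))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q4.12 values; the double-width product is rescaled
    back by an arithmetic right shift, as a DSP block's output would be."""
    return (a * b) >> FRAC_BITS

def to_float(x: int) -> float:
    """Convert a Q4.12 integer back to a float for checking results."""
    return x / SCALE

# 1.5 * -0.25 = -0.375, exactly representable in Q4.12
print(to_float(fixed_mul(to_fixed(1.5), to_fixed(-0.25))))  # -0.375
```

Verifying quantized matmuls against a float reference this way is a common step before generating the model ROM hex files for synthesis.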
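An RTL token sampler is typically a cumulative-sum walk over fixed-point probabilities against a random threshold (an LFSR output in hardware). A hedged Python sketch of that idea; the function and parameter names are hypothetical, not taken from the TALOS-V2 sources:

```python
def sample_token(probs_fixed: list[int], rand_bits: int) -> int:
    """Hardware-style sampler: accumulate fixed-point probabilities and
    return the first token whose running sum exceeds a random threshold.

    probs_fixed: per-token probabilities as Q0.12 integers summing to ~4096.
    rand_bits:   a 12-bit random integer (an LFSR would supply this in RTL).
    """
    acc = 0
    for tok, p in enumerate(probs_fixed):
        acc += p
        if rand_bits < acc:
            return tok
    # Guard against quantization shortfall in the probability sum.
    return len(probs_fixed) - 1

# Two equally likely tokens: thresholds below 2048 pick token 0,
# thresholds at or above 2048 pick token 1.
print(sample_token([2048, 2048], 100))   # 0
print(sample_token([2048, 2048], 3000))  # 1
```

In RTL this maps to a small FSM that streams probabilities out of a ROM one per cycle, which keeps the sampler cheap compared to a full divide-based approach.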