Full GPT Transformer Runs Entirely in Verilog on FPGA, Hits 69,200 Tokens Per Second After Overcoming Critical Synthesis Bugs
A complete GPT transformer written entirely in RTL Verilog is now running on a Xilinx Virtex-5 FPGA, generating 69,200 tokens per second at an 80 MHz clock. Getting there meant overcoming two critical synthesis bugs, one that silently zeroed ROM arrays and one that folded live registers, which together left the board hanging even though the design passed simulation.
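For a sense of scale, the two headline figures imply an average cycle budget per token. The following is a minimal sketch of that arithmetic in Python, assuming the 69,200 tokens per second is sustained throughput at the 80 MHz clock; the function name is illustrative and does not come from the project.

```python
def cycles_per_token(clock_hz: float, tokens_per_second: float) -> float:
    """Average number of clock cycles available per generated token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return clock_hz / tokens_per_second


# Figures from the article: 80 MHz clock, 69,200 tokens per second.
budget = cycles_per_token(80e6, 69_200)
microseconds = 1e6 / 69_200  # average wall-clock time per token

print(f"{budget:.0f} cycles per token")
print(f"{microseconds:.2f} microseconds per token")
```

That works out to roughly 1,156 cycles, or about 14.5 microseconds, per token on average. The figure is only an average derived from the two reported numbers; it says nothing about how the design actually spends those cycles.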