Floating-Point Math Quirk and Batch Variability Affect LLM Outputs
Floating-point arithmetic is not associative, and the order of reductions inside large language model (LLM) inference kernels can change with the server's batch size, so identical requests can yield inconsistent outputs. Deterministic inference can be achieved by using batch-invariant kernels for key operations such as RMSNorm, matrix multiplication, and attention, which keep the reduction order fixed regardless of batch size.
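As a minimal illustration (a sketch, not code from the original article), the Python snippet below shows the two ingredients of the problem: floating-point addition depends on grouping, and a reduction split into different chunk sizes, much as a kernel might tile its work differently for different batch sizes, can produce sums that are not bit-identical. The `chunked_sum` helper is hypothetical and exists only for this demonstration.

```python
import random

# Floating-point addition is not associative: grouping changes the result.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# A reduction split into different chunk sizes (a stand-in for a kernel that
# tiles its reduction differently depending on batch size) typically gives
# results that differ in the last few bits.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def chunked_sum(values, chunk):
    """Sum fixed-size chunks, then sum the partial results."""
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

print(chunked_sum(xs, 128))
print(chunked_sum(xs, 1024))   # usually not bit-identical to the line above
```

A batch-invariant kernel sidesteps this by committing to one reduction order no matter how many requests are batched together, which is what makes the RMSNorm, matrix multiplication, and attention outputs reproducible from run to run.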