SakanaAI Launches Open-Source Sparse LLM Kernels Promising Faster Inference and Lower Memory on H100 GPUs
SakanaAI has released 'sparser-faster-llms,' an open-source repository of custom CUDA kernels for NVIDIA H100 GPUs that exploit sparsity to dramatically speed up LLM inference and reduce memory usage. Pretrained models ranging from 0.5B to 2B parameters are now available on the Hugging Face Hub.