Natural Language

1048 articles found

Quantization Slashes AI Model Size By 75% With Minimal Quality Loss, But 2-Bit Compression Causes Near-Total Collapse

Mar 26, 2026
ngrok blog

Quantization can slash AI model sizes by 75% with minimal quality loss at 8-bit and 4-bit precision, but pushing compression to 2-bit causes near-total collapse, with 97% of benchmark questions going unanswered and responses devolving into incoherent loops, according to new testing on Qwen3.5 9B.
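The 75% figure follows directly from storage width: int8 uses one byte per weight where fp32 uses four. A minimal sketch of symmetric 8-bit quantization (illustrative only, not the article's test setup) shows both the size reduction and the small reconstruction error:

```python
import numpy as np

# Hypothetical illustration: symmetric per-tensor quantization of fp32
# weights to int8, the kind of 8-bit compression described above.
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 stores 1 byte per weight vs 4 bytes for fp32: a 75% size reduction.
print(f"size reduction: {1 - q.nbytes / w.nbytes:.0%}")   # prints 75%
print(f"max abs error:  {np.abs(w - w_hat).max():.4f}")
```

At 2 bits there are only four representable levels per weight, which is why quality collapses far more sharply than at 8 or 4 bits.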

Google's TurboQuant Slashes LLM Memory by 5x and Boosts Speed 8x With No Accuracy Loss

Mar 25, 2026
MarkTechPost

Google's TurboQuant slashes large language model memory usage by more than 5x and boosts inference speed by up to 8x with no accuracy loss. Its data-oblivious quantization algorithm requires no dataset-specific tuning and maintained perfect retrieval accuracy across 104,000 tokens in benchmark tests.
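"Data-oblivious" means the quantizer needs no calibration dataset. One common trick in such schemes is to apply a fixed random rotation before rounding, so outlier values are spread across coordinates and a single per-vector scale suffices. The sketch below is a hedged illustration of that general idea, not TurboQuant's actual algorithm:

```python
import numpy as np

# Illustrative sketch (assumed technique, not TurboQuant itself):
# rotate by a fixed random orthogonal matrix, quantize to 4 bits,
# then un-rotate. No calibration data is involved at any step.
rng = np.random.default_rng(0)
d = 64
# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=d)
x[3] = 25.0  # inject an outlier that would dominate a naive absmax scale

def quantize_4bit(v):
    scale = np.abs(v).max() / 7.0  # symmetric int4 range [-7, 7]
    return np.clip(np.round(v / scale), -7, 7), scale

# Quantize directly vs. after rotation (rotate, quantize, un-rotate).
q_plain, s_plain = quantize_4bit(x)
x_plain = q_plain * s_plain

q_rot, s_rot = quantize_4bit(Q @ x)
x_rot = Q.T @ (q_rot * s_rot)

print("error without rotation:", np.linalg.norm(x - x_plain))
print("error with rotation:   ", np.linalg.norm(x - x_rot))
```

Because the rotation dilutes the outlier, the rotated path reconstructs the vector with noticeably lower error, all without ever looking at a dataset.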
