Research

600 articles found

Quantization Slashes AI Model Size By 75% With Minimal Quality Loss, But 2-Bit Compression Causes Near-Total Collapse

Mar 26, 2026
ngrok blog

Quantization can slash AI model sizes by 75% with minimal quality loss at 8-bit and 4-bit precision, but pushing compression to 2-bit causes near-total collapse, with 97% of benchmark questions going unanswered and responses devolving into incoherent loops, according to new testing on Qwen3.5 9B.
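The 75% figure follows directly from the arithmetic of 8-bit quantization: each 4-byte fp32 weight becomes a single byte. A minimal sketch of symmetric uniform quantization (a generic scheme, not necessarily the exact method used in the article's tests) shows both the size cut and how round-trip error balloons as the bit width drops toward 2:

```python
# Generic symmetric uniform quantization sketch (illustrative only, not
# the article's exact method): one shared scale maps floats onto a signed
# `bits`-bit integer grid, and error grows sharply at low bit widths.
import struct

def quantize(weights, bits):
    """Map floats onto a signed `bits`-bit integer grid with one scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # toy stand-in for a weight tensor
for bits in (8, 4, 2):
    q, scale = quantize(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))
    print(f"{bits}-bit max round-trip error: {err:.3f}")

# fp32 -> int8: 4 bytes per weight down to 1 byte, a 75% reduction
fp32_bytes = len(weights) * struct.calcsize("f")
print(f"int8 size cut: {1 - len(weights) / fp32_bytes:.0%}")  # 75%
```

At 2 bits only four grid levels remain, so most weights collapse onto the same value, which is consistent with the near-total quality collapse the testing reports.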

Google's TurboQuant Slashes LLM Memory by 5x and Boosts Speed 8x With No Accuracy Loss

Mar 25, 2026
MarkTechPost

Google's TurboQuant cuts large language model memory usage by more than 5x and boosts speed by up to 8x with no reported accuracy loss. Its data-oblivious quantization algorithm requires no dataset-specific tuning, and it maintained perfect retrieval accuracy across 104,000 tokens in benchmark tests.
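The blurb does not describe TurboQuant's internals. As a hedged illustration of what "data-oblivious" can mean in this family of methods: some quantization schemes apply a data-independent orthogonal transform, such as a randomized Hadamard rotation, to flatten outliers before quantizing, so no calibration dataset is ever needed. This is a generic sketch of that idea, not TurboQuant itself:

```python
# Randomized Hadamard rotation sketch (generic technique, NOT TurboQuant's
# published algorithm): an orthogonal transform built from random signs
# alone spreads a single large outlier across all coordinates.
import math
import random

def fwht(x):
    """In-place fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def rotate(x, signs):
    """Orthogonal y = (1/sqrt(n)) * H * D * x; needs no data to construct."""
    y = fwht([v * s for v, s in zip(x, signs)])
    return [v / math.sqrt(len(x)) for v in y]

def unrotate(y, signs):
    z = fwht(list(y))
    return [v / math.sqrt(len(y)) * s for v, s in zip(z, signs)]

random.seed(0)
x = [10.0] + [0.1] * 15                       # one outlier dominates
signs = [random.choice((-1, 1)) for _ in x]   # the only "randomness" used
y = rotate(x, signs)

print(f"max |coord| before: {max(map(abs, x)):.2f}")
print(f"max |coord| after:  {max(map(abs, y)):.2f}")
back = unrotate(y, signs)                     # exact inverse: lossless
```

Because the rotated coordinates share a much smaller dynamic range, a uniform integer grid wastes fewer levels on a lone outlier, and since the rotation is exactly invertible it costs no accuracy by itself.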

Base LLMs Show Strong Semantic Confidence Accuracy, But Fine-Tuning and Chain-of-Thought Reasoning Destroy It

Mar 25, 2026
Apple Machine Learning Research

New research reveals that base large language models possess strong semantic confidence accuracy, but popular techniques like fine-tuning and chain-of-thought reasoning actively destroy this calibration, raising urgent questions about the reliability of widely deployed AI systems.
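"Calibration" here means a model's stated confidence should match its empirical accuracy. One standard way to measure it is expected calibration error (ECE); this is a minimal sketch of that common metric, not necessarily the measure the paper itself uses:

```python
# Expected calibration error (ECE) sketch: a standard calibration metric,
# not necessarily the one used in the Apple study.
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the size-weighted gap
    between mean confidence and empirical accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / total * abs(avg_conf - accuracy)
    return ece

# Well calibrated: claims 80% confidence, is right 80% of the time
print(expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2))

# Overconfident: claims 90% confidence, is right only half the time
print(expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5))
```

A well-calibrated model scores near zero; the overconfident case scores about 0.4, the kind of gap the research attributes to fine-tuning and chain-of-thought reasoning.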

Page 11 of 60
Showing 101 - 110 of 600 articles