- cross-posted to:
- technology@lemmy.zip
NVIDIA just trained a 12B-parameter language model on 10 trillion tokens entirely in 4-bit precision.
Here’s why this matters:
- NVFP4 delivers 2–3× faster math throughput and 50% less memory vs FP8
- Accuracy? Practically identical. (MMLU-Pro: FP8 = 62.62%, NVFP4 = 62.58%)
- Stability issues have been solved using Random Hadamard transforms, stochastic rounding, and 2D scaling (see the rounding sketch below)
This is the first successful demonstration of large-scale 4-bit pretraining without losing accuracy.
The next generation of frontier models will be faster and cheaper, without compromise.
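For anyone curious what stochastic rounding actually buys here, this is a minimal NumPy sketch that rounds values onto the standard E2M1 (FP4) value grid. It is illustrative only: NVIDIA's actual recipe wraps this in Random Hadamard transforms and block-wise 2D scaling, both omitted here, and `stochastic_round_fp4` is a made-up helper name, not their API.

```python
import numpy as np

# The 8 non-negative values representable in E2M1 (FP4), mirrored into a
# symmetric 15-point grid. Real NVFP4 also carries per-block scale factors.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[:0:-1], FP4_GRID])

def stochastic_round_fp4(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Round each value up or down to a neighboring grid point with
    probability proportional to proximity, so the rounding error is zero
    in expectation and tiny gradient updates survive on average instead
    of always being rounded away to zero."""
    x = np.clip(x, FP4_GRID[0], FP4_GRID[-1])
    hi_idx = np.clip(np.searchsorted(FP4_GRID, x), 1, len(FP4_GRID) - 1)
    lo, hi = FP4_GRID[hi_idx - 1], FP4_GRID[hi_idx]
    p_up = (x - lo) / (hi - lo)  # closer to the upper point => round up more often
    return np.where(rng.random(x.shape) < p_up, hi, lo)

rng = np.random.default_rng(0)
vals = rng.normal(0.0, 2.0, size=100_000)
quantized = stochastic_round_fp4(vals, rng)
print("mean rounding error:", (quantized - np.clip(vals, -6, 6)).mean())  # ~0
```

Nearest-value rounding would make that mean error systematically nonzero, which is exactly the kind of bias that compounds over trillions of training tokens.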
- I thought FP4 was for quantization only. Is it for training now too? - Looks like it, and supposedly without loss of quality.
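Here's a rough illustration of that distinction (the names and the simplified nearest-rounding quantizer are mine, not NVIDIA's): post-training quantization converts the finished weights once, while FP4 *training* keeps high-precision master weights and quantizes a fresh copy for every forward pass, updating the master copy with the gradient.

```python
import numpy as np

GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-GRID[:0:-1], GRID])

def quantize_fp4(w: np.ndarray) -> np.ndarray:
    """Simplified FP4 quantizer: scale into range, snap to nearest grid point."""
    scale = max(float(np.abs(w).max()) / 6.0, 1e-12)
    idx = np.abs((w / scale)[..., None] - GRID).argmin(axis=-1)
    return GRID[idx] * scale

# Post-training quantization: quantize once, after training finishes.
w_deployed = quantize_fp4(np.random.randn(4, 4))

# Quantized training: the master copy stays in high precision; every step
# quantizes it for the matmul, then the gradient updates the master copy
# (straight-through style, shown here with a toy regression loss).
w_master = np.random.randn(4, 4)
x, target = np.random.randn(8, 4), np.ones((8, 4))
for _ in range(100):
    y = x @ quantize_fp4(w_master)                   # forward pass in "FP4"
    w_master -= 0.05 * x.T @ (y - target) / len(x)   # update in full precision
print("loss:", float(((x @ quantize_fp4(w_master) - target) ** 2).mean()))
```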
 
- The math is 62% accurate? Is that what that’s saying? - In this context, accuracy is a metric measuring the percentage of questions the model answered correctly on the MMLU-Pro benchmark. So it’s not the math specifically being 62% accurate, but the model’s overall rate of arriving at the correct answer.
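In other words, the metric is just correct answers over total questions; the helper below is a hypothetical sketch of that arithmetic. The point of the numbers in the post is that NVFP4 matches FP8 (62.58% vs. 62.62%), not that 62% is high in absolute terms.

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of benchmark questions answered correctly."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

# Toy example: 2 of 3 multiple-choice answers match the key.
print(accuracy(["B", "C", "A"], ["B", "D", "A"]))  # 0.666...
```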
 
- next generation of frontier models - lol. Too much grifter speak for me. Slow down on that kool aid. - People building their whole identity around hating LLM tech will never stop being hilarious. - iTs jUsT a PaTtErN mAcHiNe 