NVIDIA just trained a 12B-parameter language model on 10 trillion tokens entirely in 4-bit precision.

Here’s why this matters:

  • NVFP4 delivers 2–3× faster math throughput and 50% less memory vs FP8
  • Accuracy? Practically identical. (MMLU-Pro: FP8 = 62.62%, NVFP4 = 62.58%)
  • Stability issues have been solved using Random Hadamard transforms, stochastic rounding, and 2D scaling (rough sketch below)

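The core recipe is cheap to illustrate. Below is a minimal NumPy sketch of block-scaled 4-bit (E2M1) quantization with stochastic rounding; the function name and the plain float block scale are simplifications of mine (the real NVFP4 format stores an FP8 scale per 16-value block, and the Hadamard transform step is omitted here), so treat it as a toy model, not NVIDIA's code:

```python
import numpy as np

# Representable magnitudes of the 4-bit E2M1 format (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x, block=16, seed=0):
    """Quantize a 1-D array in 16-element blocks with stochastic rounding."""
    rng = np.random.default_rng(seed)
    x = x.reshape(-1, block)
    # Per-block scale maps the block's max magnitude onto the grid max (6.0).
    scale = np.abs(x).max(axis=1, keepdims=True) / E2M1_GRID[-1]
    scale[scale == 0] = 1.0  # all-zero block: avoid divide-by-zero
    y = x / scale
    sign, mag = np.sign(y), np.abs(y)
    # Locate the two grid points bracketing each magnitude.
    hi_idx = np.searchsorted(E2M1_GRID, mag).clip(1, len(E2M1_GRID) - 1)
    lo, hi = E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]
    # Stochastic rounding: round up with probability proportional to how far
    # the value sits past the lower grid point, so the rounding error is
    # zero in expectation.
    p_up = (mag - lo) / (hi - lo)
    rounded = np.where(rng.random(mag.shape) < p_up, hi, lo)
    return (sign * rounded * scale).reshape(-1)

x = np.random.default_rng(1).standard_normal(64).astype(np.float32)
xq = quantize_nvfp4_block(x)
print("mean abs error:", float(np.abs(x - xq).mean()))
```

Stochastic rounding matters here because round-to-nearest systematically erases small gradient updates; rounding probabilistically keeps the quantization error zero-mean, which is part of what keeps training stable at 4 bits.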
This is the first successful public demonstration of large-scale 4-bit pretraining without a meaningful loss of accuracy.

The next generation of frontier models will be faster and cheaper, with no compromise on quality.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP), 6 days ago:

      In this context, accuracy measures the percentage of questions the model answered correctly on the MMLU-Pro benchmark. So it’s not that the model’s math is 62% accurate; the figure reflects its overall ability to produce the correct answer.
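A toy illustration of the metric the comment describes (hypothetical predictions, not real benchmark data):

```python
# Accuracy = fraction of benchmark questions answered correctly.
def accuracy(predictions, answers):
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# A hypothetical model run over 5 multiple-choice questions:
print(accuracy(list("BDACB"), list("BDCCB")))  # 80.0
```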