A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.
Also includes outtakes on the ‘reasoning’ models.
Gemini 3 (Fast) got it right for me; it said that unless I wanna carry my car there it’s better to drive, and it suggested that I could use the car to carry cleaning supplies, too.
You never know. The car wash may be out of order and you might need to wash your car by hand.
Well, it is a 9B model after all. Self-hosted models become minimally “intelligent” at 16B parameters. For context, the models run on Google's servers are close to 300B-parameter models.
Not sure how we’re quantifying intelligence here. Benchmarks?
Qwen3-4B 2507 Instruct (4B) outperforms GPT-4.1 nano (7B) on all stated benchmarks. It outperforms GPT-4.1 mini (~27B according to scuttlebutt) on mathematical and logical reasoning benchmarks, but loses (barely) on instruction-following and knowledge benchmarks. It outperforms GPT-4o (~200B) on a few specific domains (math, creative writing), but loses overall (because of course it would). The abliterated cooks of it are stronger yet in a few specific areas too.
https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF
https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF
So, in that instance, a 4B > 7B (globally), 27B (significantly) and 200-500B(?) situationally. I’m pretty sure there are other SLMs that achieve this too now (IBM Granite series, Nanbeige, Nemotron, etc.).
It's sort of wild to think that 2024 SOTA is ~ a ‘strong’ 4-12B these days.
I think (believe) that we’re sort of getting to the point where the next step forward is going to be “densification” and/or an architecture shift (maybe M$ can finally pull their finger out and release the promised 1.58-bit next-step architectures).
ICBW / IANAE
Any source for that info? Seems important to know in order to assess the quality, no?
Here:
https://www.sitepoint.com/local-llms-complete-guide/
https://www.hardware-corner.net/running-llms-locally-introduction/
https://travis.media/blog/ai-model-parameters-explained/
https://claude.ai/public/artifacts/0ecdfb83-807b-4481-8456-8605d48a356c
https://labelyourdata.com/articles/llm-fine-tuning/llm-model-size
https://medium.com/@prashantramnyc/understanding-parameters-context-size-tokens-temperature-shots-cot-prompts-gsm8k-mmlu-4bafa9566652
Finding them only required a DuckDuckGo web search with queries like “local llm parameters” and “number of params of cloud models”.
Edit: formatting
Appreciated. Very much appreciated!