

You’re comparing apples and oranges. Qwen3.5 27B is a dense model. Gemma4 26B is a mixture-of-experts model with only 4B parameters active per token. The fair comparison would be Gemma4 31B, which is the dense Gemma4 model.
Both dense models are EXTREMELY good. In my testing, they can code and work agentically about as well as the cutting-edge models (Gemini, ChatGPT, Claude Sonnet) that came out 4-5 months before them. Gemma models are usually better at prose, while the Qwen model scores a little better on coding and logic tests. Because these models are dense, they need more compute and memory bandwidth per token than mixture-of-experts (MoE) models, so they’re slower or more expensive to run.
If you’re strictly comparing the models you originally listed, the Qwen model will crush Gemma4 26B, but it will run at a quarter of the speed. :)
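For a rough sense of where the speed gap comes from: when generating one token at a time, inference is usually limited by memory bandwidth, so speed scales roughly with how many weight bytes have to be read per token. A dense model reads all of its weights; an MoE model only reads its active experts. The sketch below assumes a hypothetical 1000 GB/s of bandwidth and roughly 4-bit weights (0.5 bytes per parameter), and `tokens_per_second` is a made-up helper. It is a back-of-envelope upper bound, not a benchmark.

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# Assumption: each generated token reads every *active* weight once,
# so tokens/s ~= bandwidth / (active_params * bytes_per_param).
# Ignores KV-cache reads, attention compute, expert routing, and batching.

def tokens_per_second(active_params_billion: float,
                      bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Idealized upper bound on decode speed in tokens/s (hypothetical helper)."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 1000.0  # hypothetical GPU memory bandwidth
BYTES_PER_PARAM = 0.5    # roughly a 4-bit quant (assumption)

# Qwen3.5 27B: dense, so all 27B weights are read per token.
dense = tokens_per_second(27, BYTES_PER_PARAM, BANDWIDTH_GB_S)
# Gemma4 26B: MoE with 4B active parameters per token.
moe = tokens_per_second(4, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense: {dense:.1f} tok/s, MoE: {moe:.1f} tok/s, ratio: {moe / dense:.2f}x")
```

The idealized ratio is just 27 / 4 = 6.75x and doesn't depend on the bandwidth figure. Real MoE inference tends to land below that ideal because of attention, KV-cache traffic, and routing overhead, which fits with a rough "quarter of the speed" in practice. Note that the MoE model still has to hold all 26B weights in memory; it just reads fewer of them per token.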
That’s a cool write-up! I suspect a lot of this comes down to quantization: how much quality gets lost, and how each model behaves under different quants.