I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa it's not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

  • scruiser@awful.systems · 5 points · 7 hours ago

    Have they fixed it as in it genuinely uses Python completely reliably, or "fixed" it, as in they tweaked the prompt and now it uses Python 95% of the time instead of 50/50? I'm betting on the latter.

    • diz@awful.systems (OP) · 2 points · 3 hours ago

      Yeah, I'd also bet on the latter. They also added a fold-out button that shows you the code it wrote (folded by default), but you have to unfold it, or notice that it is absent.

    • aramova@infosec.pub · 4 points · 6 hours ago

      Non-deterministic LLMs will always have randomness in their output. The best they can hope for is layers of sanity checks, slowing things down and costing more.

      • scruiser@awful.systems · 5 points · 6 hours ago

        If you wire the LLM directly into a proof-checker (like with AlphaGeometry) or evaluation function (like with AlphaEvolve) and the raw LLM outputs aren’t allowed to do anything on their own, you can get reliability. So you can hope for better, it just requires a narrow domain and a much more thorough approach than slapping some extra firm instructions in an unholy blend of markup languages in the prompt.
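        The pattern is roughly this (a toy sketch, not AlphaGeometry's actual code; `llm_propose` and `proof_checker` are hypothetical stand-ins for the sampler and the formal checker):

```python
# Sketch of the "LLM + checker" pattern: the model only *proposes*
# candidates; a deterministic verifier decides what counts as an answer.
# Both functions below are toy stand-ins, not real APIs.

def llm_propose(problem, attempt):
    # Stand-in for one sampled LLM output; here just a toy guess.
    return problem + attempt

def proof_checker(problem, candidate):
    # Stand-in for a formal checker: deterministic accept/reject.
    return candidate % 7 == 0  # toy acceptance criterion

def solve(problem, max_attempts=100):
    for attempt in range(max_attempts):
        candidate = llm_propose(problem, attempt)
        if proof_checker(problem, candidate):
            return candidate  # only verified outputs escape the loop
    return None  # raw LLM output is never trusted on its own
```

        The reliability comes entirely from the checker: however random the proposals are, nothing unverified ever reaches the user.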

        In this case, solving math problems is actually something Google search could previously do (before dumping AI into it) and Wolfram Alpha can do, so it really seems like Google should be able to offer a product that does math problems right. Of course, this solution would probably involve bypassing the LLM altogether through preprocessing and post-processing.
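        A minimal sketch of what that preprocessing could look like (my own illustration, not anything Google ships): if the query parses as pure arithmetic, evaluate it exactly and never touch the model.

```python
# Preprocessing sketch: route pure-arithmetic queries around the LLM
# entirely, using Python's ast module for exact, safe evaluation.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("not pure arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def answer(query):
    try:
        return str(safe_eval(query))  # exact answer, no LLM involved
    except (ValueError, SyntaxError):
        return None  # fall through to the model (not sketched here)
```

        Anything the parser rejects falls through to whatever the model does; anything it accepts is answered exactly, every time.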

        Also, btw, LLMs can be (technically speaking) deterministic if the temperature is set all the way down to zero; it's just that this doesn't actually improve their performance at math or anything else. And it would still be "random" in the sense that minor variations in the prompt or previous context can induce seemingly arbitrary changes in output.
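        In toy form (my own illustration of the sampling step, not any vendor's implementation): at temperature zero, sampling collapses to argmax, so the same logits always yield the same token.

```python
# Toy token sampler: temperature 0 means greedy argmax (deterministic),
# any positive temperature means weighted random choice over softmax.
import math
import random

def sample_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: no randomness left, always the top logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(logits) - 1
```

        Note the determinism is per-logits: change the prompt even slightly and the logits change, so the "same question, different wording" failure mode survives temperature 0 intact.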