Google's Gemini 2.5 Pro is out of beta.

diz@awful.systems · 7 months ago

Google's Gemini 2.5 Pro is out of beta.

diz@awful.systems · 7 months ago

Thing is, it has tool integration. Half of the time it uses Python to calculate it. If it uses a tool, that means it writes a string that isn’t shown to the user, which runs the tool, and the tool results are appended to the stream.
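To make that mechanism concrete, here is a rough sketch of such a loop. The `<tool>` / `<tool_result>` markers, the tiny calculator, and all function names are made up for illustration; this is not Gemini's actual format, just the general shape of "hidden string runs a tool, result gets appended to the stream".

```python
import ast
import operator
import re

# Hypothetical marker format, invented for this sketch (not Gemini's real one).
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.S)
HIDDEN = re.compile(r"<tool>.*?</tool>|<tool_result>.*?</tool_result>", re.S)

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def run_calculator(expr: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def append_tool_results(stream: str) -> str:
    """Run every hidden tool call found in the stream and append its result."""
    for call in TOOL_CALL.findall(stream):
        stream += f"<tool_result>{run_calculator(call.strip())}</tool_result>"
    return stream

def visible_text(stream: str) -> str:
    """Strip tool calls and tool results: roughly what the user would see."""
    return HIDDEN.sub("", stream).strip()
```

The point of the sketch is that the tool call and its result both live in the stream, so the model can condition on them later, while the user only ever sees what `visible_text` leaves behind.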

What is curious is that the chain you would expect doesn't happen: a request for precision (or just any request to do math) causing it to use the tool, and then the presence of the tool tokens causing it to claim that a tool was used. Instead, requests for precision cause it to claim that a tool was used, directly.
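If you had access to the raw stream, the distinction could be checked mechanically. A minimal sketch, again with invented markers and a deliberately crude phrase list (both are assumptions, not anything a real API exposes in this form):

```python
import re

# Hypothetical markers and phrases, chosen for illustration only.
TOOL_RESULT = re.compile(r"<tool_result>.*?</tool_result>", re.S)
CLAIM = re.compile(r"\b(used|ran|using) (a |the )?(calculator|python|tool)\b", re.I)

def unbacked_tool_claim(stream: str) -> bool:
    """True if the text claims tool use but the stream holds no tool result."""
    # Look for the claim only in the prose, not inside tool output.
    claims_tool = CLAIM.search(TOOL_RESULT.sub("", stream)) is not None
    has_result = TOOL_RESULT.search(stream) is not None
    return claims_tool and not has_result
```

A stream like "I used a calculator, the answer is 42." with no tool result in it would be flagged, which is the case described above.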

Also, all of it is highly unnatural text, so it is coming either from fine-tuning or from training data contamination.

bitofhope@awful.systems · 7 months ago

A tool uses an LLM, the LLM uses a tool. What a beautiful ouroboros.

HedyL@awful.systems · 7 months ago

Also, if the LLM had reasoning capabilities that even remotely resembled those of an actual human, let alone someone who would be able to replace office workers, wouldn’t it use the best tool it had available for every task (especially in a case as clear-cut as this)? After all, almost all humans (even children) would automatically reach for their pocket calculators here, I assume.

diz@awful.systems · 7 months ago

Well, it did reach for “I double checked it, I’m totally sure now” language.

From the perspective of trying to convince the top brass that they are making good progress towards creating an artificial psychopath - not just an artificial human - it’s pretty good.