Apple Machine Learning Research has a new preprint: “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity.” [Apple, PDF] “Lar…
It would have to be more than just river crossings, yeah.
Although I’m also dubious that their LLM is good enough for universal river crossing puzzle solving using a tool. It’s not that simple, the constraints have to be translated into the format that the tool understands, and the answer translated back. I got told that o3 solves my river crossing variant but the chat log they gave had incorrect code being run and then a correct answer magically appearing, so I think it wasn’t anything quite as general as that.
Latter test fails if they write a specific bit of code to put out the ‘llms fail the river crossing’ fire btw. Still a good test.
It would have to be more than just river crossings, yeah.
Although I’m also dubious that their LLM is good enough for universal river crossing puzzle solving using a tool. It’s not that simple, the constraints have to be translated into the format that the tool understands, and the answer translated back. I got told that o3 solves my river crossing variant but the chat log they gave had incorrect code being run and then a correct answer magically appearing, so I think it wasn’t anything quite as general as that.