Apple: ‘Reasoning’ AIs fail hard if they actually have to think

David Gerard@awful.systems · 3 months ago


diz@awful.systems · 3 months ago

I’d just write the list, then assign randomly. Or perhaps pseudorandomly: sort by hash, then split in two.
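A rough sketch of the sort-by-hash split, assuming each puzzle is identified by a short text string. The function name `split_by_hash` and the choice of SHA-256 are illustrative assumptions, not anything specified in the thread; any stable hash would do.

```python
import hashlib


def split_by_hash(puzzles):
    """Split puzzle names into two halves, ordered by the SHA-256 of each name.

    The ordering depends only on the names, so the split is reproducible
    and nobody picks by hand which puzzle lands in which set.
    """
    ordered = sorted(
        puzzles,
        key=lambda name: hashlib.sha256(name.encode("utf-8")).hexdigest(),
    )
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


if __name__ == "__main__":
    # Placeholder names; a real list would hold the actual puzzles.
    names = [f"puzzle-{i}" for i in range(20)]
    cheating_set, held_back_set = split_by_hash(names)
    print(len(cheating_set), len(held_back_set))
```

Because the key is a hash of the name rather than a call to a random number generator, anyone with the same list gets the same two sets and can check the assignment.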

One problem is that it is hard to come up with 20 or more completely unrelated puzzles.

That said, I don’t think we need a large number for statistical significance here, if the result is something like 8/10 solved in the cheating set and 2/10 in the held-back set.
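To put a number on that, here is a minimal sketch of a one-sided Fisher exact test on the hypothetical 8/10 versus 2/10 figures above, using only the standard library. Fisher's test is my choice of test here, not something named in the thread, and the function name `fisher_one_sided` is made up for illustration.

```python
from math import comb


def fisher_one_sided(solved_a, failed_a, solved_b, failed_b):
    """One-sided Fisher exact test for a 2x2 table.

    Returns the probability, under the null hypothesis that set membership
    has no effect, of set A containing at least `solved_a` of the solved
    puzzles, with all row and column totals held fixed.
    """
    size_a = solved_a + failed_a
    total_solved = solved_a + solved_b
    total = size_a + solved_b + failed_b
    denominator = comb(total, size_a)
    p_value = 0.0
    # Sum hypergeometric probabilities for outcomes at least this extreme.
    for k in range(solved_a, min(size_a, total_solved) + 1):
        p_value += comb(total_solved, k) * comb(total - total_solved, size_a - k) / denominator
    return p_value


if __name__ == "__main__":
    # Hypothetical numbers from the comment: 8/10 solved vs 2/10 solved.
    print(fisher_one_sided(8, 2, 2, 8))
```

For these numbers the one-sided p-value is 2126/184756, roughly 0.012, so a gap that large would be unlikely to come from chance even with only ten puzzles per set.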