David Gerard@awful.systems to TechTakes@awful.systems · English · 2 months ago

How to pass an AI coding benchmark: train on the questions

pivot-to-ai.com

SWE-Bench Verified by OpenAI tests how well a model can solve real bugs in real Python code from GitHub. These bugs are all public information — so the AI models have almost certainly trained on th…

podcast version
video version


YourNetworkIsHaunted@awful.systems · 2 months ago
This isn’t studying possible questions; this is memorizing the answer key to the test. It's being able to identify that the answer to question 5 is “17” but not being able to actually answer it when they change the numbers slightly.
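
To make the answer-key point concrete, here is a toy sketch in Python. Everything in it is hypothetical and made up for illustration (the question text, `ANSWER_KEY`, and both solver functions); it is not how SWE-Bench Verified or any real model works. It only shows how a lookup of memorized answers scores perfectly on the original question and fails once the numbers change.

```python
from typing import Optional

# Hypothetical "benchmark": question 5 and its published answer.
ANSWER_KEY = {"What is 8 + 9?": "17"}


def memorizing_solver(question: str) -> Optional[str]:
    """Return the memorized answer only if this exact question was seen."""
    return ANSWER_KEY.get(question)


def reasoning_solver(question: str) -> str:
    """Parse a question of the form 'What is A + B?' and compute the sum."""
    body = question[len("What is "):].rstrip("?")
    a, b = body.split(" + ")
    return str(int(a) + int(b))


original = "What is 8 + 9?"
perturbed = "What is 8 + 10?"  # same task, numbers changed slightly

for name, solver in [("memorizer", memorizing_solver),
                     ("reasoner", reasoning_solver)]:
    print(name, "| original:", solver(original),
          "| perturbed:", solver(perturbed))
```

Both solvers look identical if you only score them on the original question; the difference shows up only on the perturbed one, which is why a benchmark built from public data can't tell the two apart.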
