Want to wade into the sandy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.

Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Credit and/or blame to David Gerard for starting this.)

  • samvines@awful.systems · edited · 14 hours ago

    Soon, at each new AI model along the current capability curve, you will start to see large discrete jumps in ability in economically important areas, because the previous AI ability level in some aspect of the job just wasn’t good enough and bottlenecked progress. When bottlenecks are released, it looks like a leap forward. It is going to look like unexpected gains in AI capacity, and indeed there is no sign that the current exponential ability curve is slowing down so far. It is going to be like what happened in coding: as soon as models crossed a certain threshold with Opus 4.5, GPT-5.2, and Gemini 3, suddenly Claude Code & Codex were viable. Before that, it was all about coding assistance; afterwards, it was all about agents, despite relatively small gains in model ability.

    There is just something so inherently smug and annoying about Mollick. He is one of those low information boosters whose posts sound intellectual until you really think about them.

    Tell me more about how the pile of cursed spaghetti that is Claude Code is now viable due to model breakthroughs. All I see are hype men saying “the new model is a team of PhDs in your pocket” and then releasing disappointing updates, or saying “the new model is too dangerous” because they have some vaporware powered by human crowdsourcing.

    Also, coding is not like other areas - you can test for hallucinations by compiling the output, printing results, and running tests.
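A minimal sketch of that point: unlike prose, generated code can be checked mechanically against concrete cases. The function names here are hypothetical stand-ins, not anything from a real model API.

```python
def llm_generated_add(a, b):
    # Pretend this body came back from a model.
    return a + b

def passes_tests(candidate):
    """Run a few concrete cases against a generated function."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-5, 5), 0)]
    return all(candidate(*args) == expected for args, expected in cases)

print(passes_tests(llm_generated_add))  # True for this stub; a hallucinated body fails loudly
```

A model that hallucinated the wrong operator would fail these checks immediately, which is exactly the feedback loop prose output lacks.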

    I guess my first mistake this morning was opening linkedin

    • YourNetworkIsHaunted@awful.systems · 6 hours ago

      I’ve never understood how these things are simultaneously gaining their abilities from statistical analysis of all kinds of random writing online (social media, fanfic, Reddit, etc.) yet are also supposed to end up as experts rather than a much faster and more agreeable dumbass. Sure, the training data may include all the great works of literature, all the scrapable scientific studies and textbooks they could steal, and so on. But it also included every moron who ever shared conspiracy theories on Twitter, every confident-sounding business idiot on LinkedIn, and every stupid word that Scott or Yud ever wrote. Surely the bullshit has to exceed the expertise by raw volume, and if they took the time and energy to curate it out the way they would need to in order to correct that, they wouldn’t be left with a large enough sample to actually scale off of.

      Basically, either I’m dramatically misunderstanding something or the best we can hope for is the Average Joe on Reddit, who may not be a complete dumbass but definitely isn’t a team of PhDs.

      • scruiser@awful.systems · edited · 4 hours ago

        LLMs generate the next most probable token given the previous context of tokens they have (not an average of the entire internet). And post-training shifts the odds a bit further in a relatively useful direction. So given the right context the LLM will mostly consistently regurgitate content stolen from PhDs and academic papers, maybe even managing to shuffle it around in a novel way that is marginally useful.
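The next-token framing above can be illustrated with a toy conditional distribution. Real models learn these scores over a huge vocabulary; the tokens and numbers below are invented purely for illustration.

```python
import math
import random

# Hand-made "logits" for the next token after some context.
# Invented values; real models produce these from learned weights.
logits = {"cheese": 2.0, "basil": 1.5, "glue": 0.1}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
# Sampling picks "cheese" most of the time, but "glue" keeps a nonzero
# probability -- the failure mode the comment describes.
token = random.choices(list(probs), weights=list(probs.values()))[0]
print(token in probs)  # True
```

Note that nothing in the sampling step distinguishes expertise from junk: the junk token just gets a lower, never zero, probability.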

        Of course, that is only the general trend given the right™ prompt. Even with a prompt that looks mostly right, one seemingly innocuous word in the wrong place might nudge the odds and you get the answer of an /r/hypotheticalphysics moron in response to a physics question. Or asking for a recipe gets you Elmer’s glue on your mozzarella pizza, from a Reddit joke answer.

        if they took the time and energy to curate it out the way they would need to to correct that they wouldn’t be left with a large enough sample to actually scale off of

        They do steps like training the model generally on the desired languages, random internet bullshit and all, and then fine-tuning it on the actually curated stuff. That shifts the odds, but again, not enough to actually guarantee anything.
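That "shifts the odds but guarantees nothing" dynamic can be sketched as a reweighting of a toy output distribution. All the outcomes and numbers below are invented for illustration; real post-training operates on model weights, not on a distribution directly.

```python
# Invented pre-training distribution over possible answers.
pretrained = {"textbook_answer": 0.40, "reddit_joke": 0.35, "conspiracy": 0.25}

def finetune(dist, boosts):
    """Scale up preferred outcomes, then renormalize -- a crude stand-in
    for how fine-tuning shifts the output distribution."""
    scaled = {tok: p * boosts.get(tok, 1.0) for tok, p in dist.items()}
    total = sum(scaled.values())
    return {tok: p / total for tok, p in scaled.items()}

tuned = finetune(pretrained, {"textbook_answer": 3.0})
print(tuned["textbook_answer"] > pretrained["textbook_answer"])  # True
print(tuned["reddit_joke"] > 0)                                  # True: still possible
```

The curated answer becomes much more likely, but the joke answer never reaches zero, which is the whole point of the comment.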

        So, tl;dr: you’re right, but since it is possible to get somewhat better than average internet junk with pre-training and prompting, LLM boosters and labs have convinced themselves they are just a few more iterations of training approaches and prompting techniques away from entirely eliminating the problem, when the best they can do is make it less likely.