

Hey, you’re selling them short: there are also ReLU and softmax activation functions thrown around here and there. Clankers aren’t just linear transformations! /j


I am a computer science PhD so I can give some opinion on exactly what is being solved.
First of all, the problem is very contrived. I cannot think of what the motivation or significance of this problem is, and Knuth literally says that it is a planned homework exercise. It’s not a problem that many people have thought about before.
Second, I think this problem is easy (by research standards). The problem is of the form: “Within this object X of size m, find any example of Y.” The problem is very limited (the only thing that varies is how large m is), and you only need to find one example of Y for each m, even if there are many such examples. In fact, Filip found that for small values of m, there were tons of examples of Y. In this scenario, my strategy would be “random bullshit go”: there are likely so many ways to solve the problem that a good idea is literally just trying stuff and seeing what sticks (a sketch of what I mean follows below). Knuth did say the problem was open for several weeks, but several weeks is not long by research standards.
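To make “random bullshit go” concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `find_witness`, `random_candidate`, and `is_example_of_Y` are illustrative stand-ins, and the toy Pythagorean-triple instance is mine, not Knuth’s actual problem.

```python
import random

def find_witness(random_candidate, is_example_of_Y, tries=100_000):
    """Randomized witness search: keep sampling candidates until one
    satisfies Y. Only a single example is needed, so any hit ends
    the search immediately."""
    for _ in range(tries):
        c = random_candidate()
        if is_example_of_Y(c):
            return c
    return None  # no hit: sample more, or build a smarter generator

# Toy instance: inside {1, ..., m}^3, find any Pythagorean triple.
m = 100
witness = find_witness(
    lambda: tuple(random.randint(1, m) for _ in range(3)),
    lambda t: t[0] ** 2 + t[1] ** 2 == t[2] ** 2,
)
print(witness)  # e.g. (30, 40, 50); None only if we got very unlucky
```

The point is that when examples of Y are dense, this brain-dead loop finds one quickly, which is exactly why the density of examples matters for judging how hard the problem really was.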
I guess “random bullshit go” is served well by a random bullshit machine, but you still need an expert who actually understands the problem to read the tea leaves and evaluate whether you got something useful. Knuth’s narrative is also not very transparent about how much hand-holding Filip did for the AI.
I think the main danger of this (putting aside the severe societal costs of AI) is not that doing this is faster or slower than just thinking through the problem yourself. It’s that relying on AI atrophies your ability to think, and eventually even your ability to guard against the AI bullshitting you. The only way to retain a deep understanding is to constantly be in the weeds thinking things through. We’ve seen this story play out in software before.


I was pissed when my (non-academic) friends saw this and immediately started talking about how mathematicians and computer scientists need to use AI from now on.


scott jumpscare


Baldur Bjarnason’s essay remains evergreen.
Consider homeopathy. You might hear a friend talk about “water memory”, citing all sorts of scientific-sounding evidence. So, the next time you have a cold you try it.
And you feel better. It even feels like you got better faster, although you can’t prove it because you generally don’t document these things down to the hour.
“Maybe there is something to it.”
Something seemingly working is not evidence of it working.
Were you doing something else at the time which might have helped your body fight the cold?
Would your recovery have been any different had you not taken the homeopathic “remedy”?
Did your choosing of homeopathy over established medicine expose you to risks you weren’t aware of?
Even when looking at Knuth’s account of what happened, you can already tell that the AI is receiving far more credit than its actual contribution warrants. There is something about a nondeterministic slot machine that makes it feel far more miraculous when it succeeds, while reliable tools that always do their job are boring and stupid. The downsides of the slot machine never register in comparison to the rewards.
I feel like math research is particularly susceptible to this, because it is the default that almost all of one’s attempts do not succeed. So what if most of the AI’s attempts do not succeed? But if it is to be evaluated as a tool, we have to check if the benefits outweigh the costs. Did it give me more productive ideas, or did it actually waste more of my time leading me down blind alleys? More importantly, is the cognitive decline caused by relying on slot machines going to destroy my progress in the long term? I don’t think anyone is going to do proper experiments for this in math research, but we have already seen this story play out in software. So many people were impressed by superficial performances, and now we are seeing the dumpster fire of bloat, bugs, and security holes. No, I don’t think I want that.
And then there is the narrative of not evaluating AI as an objective tool based on what it can actually do, but instead as a tidal wave of Unending Progress that will one day sweep away those elitists with actual skills. This is where the AI hype comes from, and why people avoid, say, comparing AI with Mathematica. To them I say good luck. We have dumped hundreds of billions of dollars into this, and there are only so many more hundreds of billions of dollars left. Were these small positive results (and significant harms) worth hundreds of billions of dollars, or were there better uses for those resources?


Don’t worry, there’s always Effective Altruism if you ever feel guilty about causing the suffering of regular people. Just say you’re going to donate your money at some point eventually in the future. There you go, 40 trillion hypothetical lives saved!


This somehow makes things even funnier. If he had any understanding of modern math, he would know that representing a set of things as points in some geometric space is one of the most common techniques in math. (A basic example: a pair of numbers can be represented by a point in 2D space.) Also, a manifold is an extremely broad geometric concept: knowing that two things are manifolds does not mean that they are the same, or even remotely similar, without checking the details. There are tons of things you can model as a manifold if you try hard enough.
From what I see, Scoot read a paper modeling LLM inference with manifolds and thought “wow, cool!” Then he fished for neuroscience papers until he found one that modeled neurons using manifolds. Both of the papers have blah blah blah something something manifolds so there must be a deep connection!
(Maybe there is a deep connection! But the burden of proof is on him, and he needs to do a little more work than noticing that both papers use the word manifold.)
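To see how little “it’s a manifold” actually pins down, here is a standard trio of examples (textbook facts, nothing specific to either paper):

```latex
% The circle, the plane, and the torus are all smooth manifolds,
% yet no two of them are homeomorphic:
%   S^1: dimension 1, compact
%   \mathbb{R}^2: dimension 2, non-compact
%   T^2 = S^1 \times S^1: dimension 2, compact
\[
S^1, \qquad \mathbb{R}^2, \qquad T^2 = S^1 \times S^1
\]
% S^1 vs. the other two: wrong dimension.
% \mathbb{R}^2 vs. T^2: same dimension, but one is compact and one is not.
```

By itself, “both are manifolds” rules out essentially nothing.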


Kolmogorov complexity:
So we should see some proper definitions and basic results on Kolmogorov complexity, like in modern papers, right? We should at least see a Kt or a pKt thrown in there, right?
Understanding IS compression — extracting structure from data. Optimal compression is uncomputable. Understanding is therefore always provisional, always improvable, never verifiably complete. This kills “stochastic parrot” from a second independent direction: if LLMs were memorizing rather than understanding, they could not generalize to inputs not in their training data. But they do. Generalization to novel input IS compression — extracting structure, not regurgitating sequences.
Fuck!
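For the record, the definitions he can’t be bothered with are one-liners. Fixing a universal machine U (and leaving out pKt, the probabilistic refinement of Kt, which takes a couple more lines):

```latex
% Plain Kolmogorov complexity: the length of the shortest program
% that makes U print x.
\[
K(x) = \min\{\, |p| \;:\; U(p) = x \,\}
\]
% Levin's time-bounded variant, the Kt that shows up in modern papers:
\[
\mathrm{Kt}(x) = \min\{\, |p| + \log t \;:\; U(p) \text{ outputs } x \text{ within } t \text{ steps} \,\}
\]
% Basic result: K is uncomputable (by reduction from the halting
% problem), which is the one true fact buried in the quoted passage.
```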


Nonsensical analogies are always improved by adding a chart with colorful boxes and arrows going between them. Of course, the burden of proof is on you, dear reader, to explain why the analogy doesn’t make sense, not on the author to provide more justification than waving his hands really really hard.
Many of these analogies are about as bad as, I don’t know, “Denmark and North Korea are the same because they both have governments” or something. Humans and LLMs both produce sequences of words, where the next word depends in some way on the previous words, so they are basically the same (and you can call this “predicting” the next word as a rhetorical flourish). Yeah, what a revolutionary concept, knowing that both humans and LLMs follow the laws of time and causality. And as we know, evolution “optimizes” for reproduction, and that’s why there are only bacteria around (they can reproduce every 20 minutes). He has to be careful: these types of dumbass “optimization” interpretations of evolution, which arose in the late 1800s, led to horrible ideas about race science … wait a minute …
He isn’t even trying with the yellow and orange boxes. What the fuck do “high-D toroidal attractor manifolds” and “6D helical manifolds” have to do with anything? Why are they there? And he really thinks he can get away with nobody closely reading his charts, with the “(???, nothing)” business. Maybe I should throw in that box in my publications and see how that goes.
I feel like his arguments rely on the Barnum effect. He makes statements like “humans and LLMs predict the next word” and “evolution optimizes for reproduction” that are so vague that they can be assigned whatever meaning he wants. Because of this, you can’t easily dispel them (he just comes up with some different interpretation), and he can use them as carte blanche to justify whatever he wants.


Maybe I should apply to be a director of AI safety at Meta. I only know one safety measure that works: don’t use AI.


What’s next, are the crypto bros gonna make some dumb talking point about how traditional finance also uses so much energy … oh wait, they already did that.


For all the talk about these people being “highly agentic”, it is deeply ironic how all the shit they do has no meaning and purpose. I hear all this sound and fury about making millions off of ChatGPT wrappers, meeting senators in high school bathrooms, and sperm races (?), and I wonder what the point is. Silicon Valley hagiographies used to at least have a veneer that all of this was meaningful. Are we supposed to emulate anyone just because they happen to temporarily have a few million dollars?
Even though the material conditions of working in science are not good, I’d still rather do science than whatever the hell they’re doing. I would be sick at the prospect of being a “highly agentic” person in a “new and possibly permanent overclass”, where my only sense of direction is a vague voice in my head telling me that I should be optimizing my life in various random ways, and my only motivation is the belief that I have to win harder and score more points on the leaderboard. (In any case, I believe this “overclass” is a lot more fragile than the author seems to think.)


At first I read the article as if the author were trying to show how ridiculous these people are by just repeating what they say. I guess this is like some people reading Ayn Rand’s works under the impression that they’re satire.


This was not such an effective venture.


my current favorite trick for reducing “cognitive debt” (h/t @simonw ) is to ask the LLM to write two versions of the plan:
- The version for it (highly technical and detailed)
- The version for me (an entertaining essay designed to build my intuition)
I don’t know about them, but I would be offended if I were planning something with a collaborator and they decided to give me a dumbed-down, entertaining, children’s-storybook version of their plan while keeping all the technical details to themselves.
Also, this is absolutely not what “cognitive debt” means. I’ve heard technical debt refers to bad design decisions in software where one does something cheap and easy now but has to constantly deal with the maintenance headaches afterwards. But the very concept of working through technical details? That’s what we call “thinking”. These people want to avoid the burden of thinking.


This is why CCC being able to compile real C code at all is noteworthy. But it also explains why the output quality is far from what GCC produces. Building a compiler that parses C correctly is one thing. Building one that produces fast and efficient machine code is a completely different challenge.
Every single one of these failures is waved away because supposedly it’s impressive that the AI can do this at all. Do they not realize the obvious problem with this argument? The AI has been trained on all the source code that Anthropic could get their grubby hands on! This includes GCC and clang and everything remotely resembling a C compiler! If I took every C compiler in existence, shoved them in a blender, and spent $20k on electricity blending them until the resulting slurry passed my test cases, should I be surprised or impressed that I got a shitty C compiler? If an actual person wrote this code, they would be justifiably mocked (or they’re a student trying to learn by doing, and LLMs do not learn by doing). But AI gets a free pass because it’s impressive that the slop can come in larger quantities now, I guess. These Models Will Improve. These Issues Will Get Fixed.


Congratulations to the maker of a tool that charges you $20 to remind you to buy milk the next morning.


I thought I was sticking my neck out when I said that OpenAI was faking their claims in math, such as with the whole International Math Olympiad gold medal incident. Even many of my peers in my field are starting to become receptive to all of these rumors about how AI is supposedly getting good at math. Sometimes I wonder if I’m going crazy and sticking my head in the sand.
All I can really do is remember that AI developers act in bad faith (and scientists are actually bad at dealing with bad-faith tactics like flooding the zone with bullshit). If the boy has cried wolf 10 times already, pardon me if I just ignore him entirely when he does it for the 11th time.
I would not underestimate how much OpenAI and friends would go out of their way to cheat on math benchmarks. In the techbro sphere, math is placed on a pedestal to the point where Math = Intelligence.


It took a full eleven paragraphs before the article even mentioned AI. Before that, it was a bunch of stuff about how Wikipedia is conservative and Gen Z and Gen Alpha have no attention span. If the author has to bury the real point and force this particular rhetorical framing, I think the haters are winning. Well done, everyone.
These three controversies from Wikipedia’s past reveal how genuine conversations can achieve—after disagreements and controversy—compromise and evolution of Wikipedia’s features and formats. Reflexive vetoes of new experiments, as the Simple Summaries spat highlighted last summer, is not genuine conversation.
Supplementing Wikipedia’s Encyclopedia Britannica–style format with a small component that contains AI summaries is not a simple problem with a cut-and-dried answer, though neither were VisualEditor or Media Viewer.
Surely, AI summaries are exactly the same as stuff like VisualEditor and Media Viewer, which were tools that helped contributors improve articles. Please ignore my rhetorical sleight of hand. They’re exactly the same! Okay, I did mention AI hallucinations in one sentence, but let’s move on from that real quick.
A still deeper crisis haunts the online encyclopedia: the sustainability of unpaid labor. Wikipedia was built by volunteers who found meaning in collective knowledge creation. That model worked brilliantly when a generation of internet enthusiasts had time, energy, and idealism to spare. But the volunteer base is aging. A 2010 study found the average Wikipedia contributor was in their mid-twenties; today, many of those same editors are now in their forties or fifties.
Yeah, because Wikipedia editors are permanently static. Back in 2001, Jimmy Wales handpicked a bunch of teenagers to have the sacred title of Wikipedia Editor, and they are the only ones who will ever be allowed to edit Wikipedia. Oh wait, it doesn’t work like that. Older people retire and move on, and new people join all the time.
Meanwhile, the tech industry has discovered how to extract billions in value from their work. AI companies train their large language models on Wikipedia’s corpus. The Wikimedia Foundation recently noted it remains one of the highest-quality datasets in the world for AI development. Research confirms that when developers try to omit Wikipedia from training data, their models produce answers that are less accurate, less diverse, and less verifiable.
Now that we have all these golden eggs, who needs the goose anymore? Actually, it is Inevitable that the goose must be killed. It is progress. It is the advancement of technology. We just have to accept it.
The irony is stark. AI systems deliver answers derived from Wikipedia without sending users back to the source. Google’s AI Overviews, ChatGPT, and countless other tools have learned from Wikipedia’s volunteer-created content—then present that knowledge in ways that break the virtuous cycle Wikipedia depends on. Fewer readers visit the encyclopedia directly. Fewer visitors become editors. Fewer users donate. The pipeline that sustained Wikipedia for a quarter century is breaking down.
So AI is a parasite that takes from Wikipedia, contributes nothing in return, and in fact actively chokes it out? And you think the solution is for Wikipedia to just surrender and implement AI features? Do you keep forgetting what point you’re trying to make?
Meanwhile, AI systems should credit Wikipedia when drawing on its content, maintaining the transparency that builds public trust. Companies profiting from Wikipedia’s corpus should pay for access through legitimate channels like Wikimedia Enterprise, rather than scraping servers or relying on data dumps that strain infrastructure without contributing to maintenance.
Yeah, what a wonderful suggestion. The AI companies just never realized all this time that they could use legitimate channels and give back to the sources they use. It’s not like they are choosing to do this because they have no ethics and want the number to go up no matter the costs to themselves or to others.
Wikipedia has survived edit wars, vandalism campaigns, and countless predictions of its demise. It has patiently outlived the skeptics who dismissed it as unreliable. It has proven that strangers can collaborate to build something remarkable.
Wikipedia has survived countless predictions of its demise, but I’m sure this prediction of its demise is going to pan out. After all, AI is more important than electricity, probably.
The AI people are still infatuated with math. The Epoch AI staff, after being thoroughly embarrassed last year by the FrontierMath scandal, have now decided to make a new FrontierMath Open Problems benchmark, this time with problems that people might give a shit about!
I decided to look at one of the easiest “moderately interesting” problems and noticed that GPT-5.2 Pro managed to solve a warm-up version of the problem, i.e. a version that had been previously solved. Wow, these reasoning models sure are capable of math! So I was curious and looked at the reasoning trace, and it turns out that … the model just found an obscure website with the right answer and downloaded it. Well, I guess you could say it has some impressive reasoning as it figures out how to download and parse the data, maybe.