Stubsack: weekly thread for sneers not worth an entire post, week ending 8th February 2026

BlueMonday1984@awful.systems · edit-2 1 month ago

lagrangeinterpolator@awful.systems · edit-2 28 days ago

I wonder what actual experts in compilers think of this. There were some similar claims about vibe coding a browser from scratch that turned out to be a little overheated: https://pivot-to-ai.com/2026/01/27/cursor-lies-about-vibe-coding-a-web-browser-with-ai/

I do not believe that this demonstrates anything other than they kept making the AI brute force random shit until it happened to pass all the test cases. The only innovation was that they spent even more money than before. Also, it certainly doesn’t help that GCC is open source, and they have almost certainly trained the model on the GCC source code (which the model can regurgitate poorly into Rust). Hell, even their blog post talks about how half their shit doesn’t work and just calls GCC instead!

It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).

It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.

I wonder why this blog post was brazen enough to talk about these problems. Perhaps by throwing in a little humility, they can make the hype pill that much easier to swallow.

Sidenote: Rust seems to be the language of choice for a lot of these vibe coded “projects”, perhaps because they don’t want people immediately accusing them of plagiarism. But Rust syntax still reasonably follows languages like C. In most cases, blindly translating C code into Rust kinda works. Now, Rust does have the borrow checker which requires a lot of thinking to deal with, but I think this is not actually a disadvantage for the AI. Borrow checking is enforced by the compiler, so if you screw up in that department, your code won’t even compile. This is great for an AI that is just brute forcing random shit until it “works”.

V0ldek@awful.systems · edit-2 27 days ago

I wonder what actual experts in compilers think of this.

Anthropic doesn’t pay me and I’m not going to look over their pile of garbage for free, but just looking at the structure and READMEs it looks like a reasonable submission for an advanced student in a compilers course: lowering to IR, SSA representation, dominators, phi elimination, some passes like strength reduction. The register allocator is very bad though, I’d expect at least something based on colouring.
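For anyone unfamiliar with what "based on colouring" means here, a minimal sketch follows. It assumes the interference graph (which virtual registers are live at the same time) has already been computed, and the names `colour_registers`, `interference` and `num_regs` are made up for illustration. It is a plain greedy colouring, not the Chaitin-style simplify/spill/coalesce loop a production allocator would use, and it has nothing to do with the code in the repo.

```python
def colour_registers(interference, num_regs):
    """Greedy graph colouring over a precomputed interference graph.

    interference: dict mapping each virtual register to the set of
                  virtual registers that are live at the same time.
    num_regs:     number of physical registers available.
    Returns a dict mapping each virtual register to a physical register
    index, or None if it has to be spilled to memory.
    """
    assignment = {}
    # Visit the most constrained nodes (highest degree) first;
    # the name is only a tie-breaker to keep the order deterministic.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for vreg in order:
        # Physical registers already taken by coloured neighbours.
        taken = {
            assignment[n]
            for n in interference[vreg]
            if assignment.get(n) is not None
        }
        free = [r for r in range(num_regs) if r not in taken]
        assignment[vreg] = free[0] if free else None  # None means spill
    return assignment


# Hypothetical interference graph: a, b and c are all live together,
# d only overlaps with c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
alloc = colour_registers(graph, 3)
```

Two values that interfere never share a register, and anything that cannot be coloured is marked for spilling. A real allocator would also weigh spill costs and coalesce moves, which is where most of the actual work is.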

The READMEs are also really annoying to read. They are overlong and they don’t really explain what is going on in the module. There’s no high-level overview of the architecture of the compiler. A lot of it is just redundant. Like, what is this:

Ye dude, of course it doesn’t depend on the IR, because this is before IR is constructed. Are you just pretending to know how a compiler works? Wait, right, you are, you’re a bot. The last sentence is also hilarious, my brother in christ, what, why is this in the README.

Now this evaluation only makes sense if the compiler actually works - which it doesn’t. Looking at the filed issues there are glaring disqualifying problems (#177, #172, #171, #167, etc. etc. etc.). Like, those are not “oops, forgot something”, those are “the code responsible for this is broken”. Some of them look truly baffling, like how do you manage to get so many issues of the type “silently does something unexpected on error” when the code is IN RUST, which is explicitly designed to make those errors as hard as possible? Like I’m sorry, but the ones below? These are just “you did not even attempt to fulfill the assignment”.

It’s also not tested, it has no integration tests (even though the README says it does), which is plain unacceptable. And the unit tests that are there fail so lol, lmao.

It’s worse than existing industry compilers and it doesn’t offer anything interesting in terms of the implementation. If you’re introducing your own IR and passes you have to have a good enough reason to not just target LLVM. Cranelift is… not great, but they at least have interesting design choices and offer quick unoptimized compilation. This? The only reason you’d write this is you were indeed a student learning compilers, in which case it’d be a very good experience. You’d probably learn why testing is important for the rest of your life at least.

corbin@awful.systems · 28 days ago

I only sampled some of the docs and interesting-sounding modules. I did not carefully read anything.

First, the user-facing structure. The compiler is far too configurable; it has lots of options that surely haven’t been tested in combination. The idea of a pipeline is enticing but it’s not actually user-programmable. File headers are guessed using a combination of magic numbers and file extensions. The dog is wagged in the design decisions, which might be fair; anybody writing a new C compiler has to contend with old C code.

Next, I cannot state enough how generated the internals are. Every hunk of code tastes bland; even when it does things correctly and in a way which resembles a healthy style, the intent seems to be lacking. At best, I might say that the intent is cargo-culted from existing code without a deeper theory; more on that in a moment. Consider these two hunks. The first is generated code from my fork of META II:

while i < len(self.s) and self.clsWhitespace(ord(self.s[i])): i += 1

And the second is generated code from their C compiler:

while self.pos < self.input.len() && self.input[self.pos].is_ascii_whitespace() {
    self.pos += 1;
}

In general, the lexer looks generated, but in all seriousness, lexers might be too simple to fuck up relative to our collective understanding of what they do. There’s also a lot of code which is block-copied from one place to another within a single file, in lists of options or lists of identifiers or lists of operators, and Transformers are known to be good at that sort of copying.

The backend’s layering is really bad. There’s too much optimization during lowering and assembly. Additionally, there’s not enough optimization in the high-level IR. The result is enormous amounts of spaghetti. There’s a standard algorithm for new backends, NOLTIS, which is based on building mosaics from a collection of low-level tiles; there’s no indication that the assembler uses it.

The biggest issue is that the codebase is big. The second-biggest issue is that it doesn’t have a Naur-style theory underlying it. A Naur theory is how humans conceptualize the codebase. We care about not only what it does but why it does it. The docs are reasonably accurate descriptions of what’s in each Rust module, as if they were documents to summarize, but struggle to show why certain algorithms were chosen.

Choice sneer, credit to the late Jessica Walter for the intended reading: It’s one topological sort, implemented here. What could it cost? Ten lines?
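For a sense of scale, here is a minimal sketch of Kahn's algorithm for a topological sort. This is not the code from their compiler; the input format (a dict mapping each node to the nodes it depends on, with every dependency also present as a key) and the name `topo_sort` are assumptions made for the example.

```python
from collections import deque


def topo_sort(deps):
    """Kahn's algorithm.

    deps maps each node to the list of nodes it depends on. Every
    dependency is assumed to appear as a key of deps as well.
    Returns the nodes in an order where dependencies come first.
    """
    indegree = {n: len(ds) for n, ds in deps.items()}
    users = {n: [] for n in deps}  # reverse edges: who depends on n
    for n, ds in deps.items():
        for d in ds:
            users[d].append(n)

    ready = deque(n for n, k in indegree.items() if k == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for u in users[n]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)

    if len(order) != len(deps):
        raise ValueError("cycle detected")
    return order


# Hypothetical module dependencies.
print(topo_sort({"main": ["parse", "lex"], "parse": ["lex"], "lex": []}))
```

Building the reverse edges takes about as many lines as the sort itself, so "ten lines" is optimistic but not by much.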

I do not believe that this demonstrates anything other than they kept making the AI brute force random shit until it happened to pass all the test cases.

That’s the secret: any generative tool which adapts to feedback can do that. Previously, on Lobsters, I linked to a 2006/2007 paper which I’ve used for generating code; it directly uses a random number generator to make programs and also disassembles programs into gene-like snippets which can be recombined with a genetic algorithm. The LLM is a distraction and people only prefer it for the ELIZA Effect; they want that explanation and Naur-style theorizing.
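To make the point concrete, here is a toy generate-and-test loop with no LLM in it. It is not the system from the paper; the four-instruction set, the `search` function and the target function are all invented for illustration. It mutates a random program and keeps any candidate that passes at least as many test cases as the current best.

```python
import random

# Hypothetical toy instruction set: each op maps an integer to an integer.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}


def run(program, x):
    """Apply each op in the program to x, in order."""
    for op in program:
        x = OPS[op](x)
    return x


def score(program, tests):
    """Feedback signal: the number of test cases the program passes."""
    return sum(run(program, x) == want for x, want in tests)


def mutate(program, rng):
    """Replace one randomly chosen instruction with a random op."""
    program = list(program)
    program[rng.randrange(len(program))] = rng.choice(list(OPS))
    return program


def search(tests, length=3, budget=10_000, seed=0):
    """Keep mutating; accept any candidate that is not worse."""
    rng = random.Random(seed)
    best = [rng.choice(list(OPS)) for _ in range(length)]
    best_score = score(best, tests)
    for _ in range(budget):
        if best_score == len(tests):
            break
        candidate = mutate(best, rng)
        s = score(candidate, tests)
        if s >= best_score:
            best, best_score = candidate, s
    return best, best_score


TESTS = [(x, 2 * x + 2) for x in range(5)]  # target: f(x) = 2x + 2
program, passed = search(TESTS)
print(program, f"{passed}/{len(TESTS)} tests passed")
```

The loop has no idea what the target function is; it only sees how many tests pass. Whether it finds a full solution within the budget depends on the seed and the search landscape, and the only lever for improving the odds is spending more attempts.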

V0ldek@awful.systems · 27 days ago

There’s a standard algorithm for new backends, NOLTIS

I think this makes it sound more cutting-edge and thus less scathing than it should: it’s an algorithm from 2008 and is used by LLVM. Claude not only trained on the paper but on all of LLVM as well.

o7___o7@awful.systems · 28 days ago

This could be its own post. Very nice!

V0ldek@awful.systems · 27 days ago

It’s one topological sort, implemented here. What could it cost? Ten lines?

This one idk, some of it could be more concise but it also has to build the graph first using that weird seemingly custom hashmap as the source. This function, however, is immensely funny

YourNetworkIsHaunted@awful.systems · 28 days ago

I wonder if this is going to hold out long enough for someone to create some obnoxious AI-first language, designed to have as obnoxiously picky a compiler as possible, in order to turn runtime errors that the model can’t cope with into compile failures which it can silently retry until they’re ‘fixed’.

rook@awful.systems · 28 days ago

I wonder why this blog post was brazen enough to talk about these problems. Perhaps by throwing in a little humility, they can make the hype pill that much easier to swallow.

I feel this is an artefact of the near complete collapse of mainstream journalism, combined with modern tech business practices that are about securing investment and cashing out, where every other concern is secondary or even entirely absent. It’s all just selling vibes.

People only ever report the hype, the investors see everyone else following the hype and panic that they might be left out and bury you in cash. When it all turns sour and people ask pointed questions about the exact nature of the magic beans you were promising to grow, you can just point at the blog post that no-one read (or at least, only poor people read, and they’re barely people if you think about it) and point out that you never hid anything.

lagrangeinterpolator@awful.systems · edit-2 27 days ago

I don’t even think many AI developers realize that we’re in a hype bubble. From what I see, they genuinely believe that the Models Will Improve and that These Issues Will Get Fixed. (I see a lot of faculty in my department who still have these beliefs.)

What these people do see, however, are a lot of haters who just cannot accept this wonderful new technology for some reason. AI is so magical that they don’t need to listen to the criticisms; surely they’re trivial by comparison to magic, and whatever they are, These Issues Will Get Fixed. But lately they have realized that with the constant embarrassing AI failures (AI surely doesn’t have horrible ethics as well), there are a lot of haters who will swarm the announcement of any AI project now. The haters also tend to be people who actually know stuff and check things (tech journalists are incentivized to not do that), but it doesn’t matter because they’re just random internet commenters, not big news outlets.

My theory is that now they add a ton of caveats and disclaimers to their announcements in a vain attempt to reduce the backlash. Also if you criticize them, it’s actually your fault that it doesn’t work. It’s Still Early Days. These Issues Will Get Fixed.

V0ldek@awful.systems · 27 days ago

We’re Still Early never dies

Architeuthis@awful.systems · 27 days ago

the Models Will Improve

When this comes up, I tell people that it’s code for “RAM and storage will cost 10x by this time next year”. Highly recommended.