

Even taking their story at face value:
- It seems like they are hyping up LLM agents operating a bunch of scripts?
- It indicates that their safety measures don’t work.
- Anthropic will read your logs, so you don’t get any privacy, confidentiality, or security using their LLM, and even then they only find problems months after the fact (this happened in June according to Anthropic, but they didn’t catch it until September).
If it’s a Chinese state actor … why are they using Claude Code? Why not Chinese chatbots like DeepSeek or Qwen? Those chatbots code just about as well as Claude. Anthropic do not address this really obvious question.
- Exactly. There are also a bunch of open source models hackers could use with a marginal (if any) tradeoff in performance, with the benefit that they could run locally, so their entire effort isn’t dependent on hardware outside their control, in the hands of someone who will shut them down as soon as they check the logs.
You are not going to get a chatbot to reliably automate a long attack chain.
- I don’t actually find it that implausible that someone managed to direct a bunch of scripts with an LLM? It won’t be reliable, but if you can run a much greater volume of attacks, maybe that makes up for the unreliability? (Rough numbers in the sketch below.)
But yeah, the whole thing might be BS, or at least a bad exaggeration from Anthropic; they don’t precisely list what their sources and evidence are versus what is inference (guesses) from that evidence. For instance, suppose a hacker tried to set up hacking LLM bots, and the bots mostly failed, wasted API calls, and hallucinated a bunch of shit. If Anthropic just read the logs from their end and didn’t do the legwork of contacting the people who had allegedly been hacked, they might "mistakenly" (a mistake that just so happens to hype up their product) conclude the logs represent successful hacks.
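To put rough numbers on the volume-vs-reliability point: even if each automated chain almost always fails, launching enough of them makes some successes likely. Both figures here are made up for illustration, not taken from Anthropic’s report.

```python
# Back-of-the-envelope only: both numbers below are assumptions for illustration,
# not anything reported by Anthropic.
p_success = 0.02    # assumed chance a single automated attack chain fully succeeds
n_attempts = 1000   # assumed number of chains an agent setup could launch

expected_successes = n_attempts * p_success          # 20 expected successful chains
p_at_least_one = 1 - (1 - p_success) ** n_attempts   # ~0.999999998, essentially certain

print(expected_successes, p_at_least_one)
```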
Continuation of the lesswrong drama I posted about recently:
https://www.lesswrong.com/posts/HbkNAyAoa4gCnuzwa/wei-dai-s-shortform?commentId=nMaWdu727wh8ukGms
Did you know that post authors can moderate their own comment sections? Someone disagreeing with you too much but getting upvoted? You can ban them from responding to your posts (but not block them entirely???)! And, the cherry on top of this questionable moderation “feature”: guess why it was implemented? Eliezer Yudkowsky was mad about highly upvoted comments responding to his post that he felt didn’t get him or didn’t deserve the upvotes, so instead of asking moderators to block on a case-by-case basis (or, acausal God forbid, considering whether the communication problem was on his end), he asked for a modification to the lesswrong forums so that authors can ban people from their posts (and delete the offending replies!!!). It’s such a bizarre forum moderation choice, but I guess habryka knew who the real leader is and had it implemented.
Eliezer himself is called to weigh in:
Uh, considering his recent twitter post… this sure is something. Also: “it does not feel like the system is set up to make that seem like a sympathetic decision to the audience”. No shit, Sherlock: deleting a highly upvoted reply because it feels like too much effort to respond to is in fact going to make people unsympathetic (at the least).