@SGforce

SGforce@lemmy.ca · 2 months ago

Huge gap there on that second graph. Hiding a massive spike, are we?

SGforce@lemmy.ca · 3 months ago

How’s it going, eh?

SGforce@lemmy.ca · 6 months ago

I’m going to make the crookedest arrows, guaranteed.

SGforce@lemmy.ca · 7 months ago

Exactly. It’s overtrained on the test, ignoring the differences. If you instead used something it recognises but doesn’t recognise as the test pattern (having the same tokens/embeddings) it will perform better. I’m not joking, it’s a common tactic to get around censoring. You’re just going around the issue. What I’m saying is they’ve trained the model so much on benchmarks that it is indeed dumber.

SGforce@lemmy.ca · 7 months ago

SGforce@lemmy.ca · 7 months ago

It’s just overtrained on the puzzle such that it mostly ignores your prompt. Changing a few words out doesn’t change that it recognises the puzzle. Try writing it out in ASCII or uploading an image with it written or some other weird way that it hasn’t been specifically trained on and I bet it actually performs better.

SGforce@lemmy.ca · 8 months ago

They finetuned 1.5-3b models. This is a non-story

SGforce@lemmy.ca · 9 months ago

The local models are distilled versions of Qwen or llama or whatever else, not really deepseek’s model. So you get refusals based on the base model primarily, plus whatever it learned from the distilling. If it’s Qwen or another Chinese model then it’s more likely to refuse but a llama model or something else could pick it up to a lesser extent.

SGforce@lemmy.ca · 9 months ago

When hedge funds decide to flip the switch on something the reaction never looks rational. Meta was green today ffs.

SGforce@lemmy.ca · 9 months ago

Yep, was going to mention a study from a few years ago that threw out most of this data as junk since they found many were counting stretched flaccid as “erect”.