

1 month ago
The local models are distilled versions of Qwen, Llama, or whatever else, not really DeepSeek’s model. So the refusals you get come primarily from the base model, plus whatever it learned from the distillation. If the base is Qwen or another Chinese model, it’s more likely to refuse, but a Llama model or something else could still pick up the behavior to a lesser extent.
They fine-tuned 1.5-3B models. This is a non-story.