I downloaded an uncensored aggressive Qwen 3.5 model and I can see in its reasoning that it is still limiting responses based on safety guardrails (e.g. violence, NSFW).

Anybody have recommendations for truly uncensored models?

EDIT: I turned off reasoning and I think it’s more uncensored if I’m very specific about what the response should include.

  • e0qdk@reddthat.com
    link
    fedilink
    English
    arrow-up
    2
    ·
    9 days ago

    Thanks for the tips. That sounds similar in performance to what I’m seeing, so I probably didn’t screw up too much trying to get it working. If you’re using it in more of a story writing capacity than a chat capacity, that makes sense.

    You may want to set a system prompt

    I tried initially with ollama run on the command line just to see if it was working at all when I got that response. (It amused me, so it stuck with me.) I’ve tried again with my custom tooling – which does set a system prompt (geared more towards assistant style ussage though) – and it didn’t really take anything from the prompt. It’s possible I don’t have something set up right with templates, but I’m probably going to shift over to llama-server eventually anyway…

    Following your suggestions on system prompt style though I was able to get it to give me a more specifically targetted coherent story via llama-cli. If I poke at it a bit, I’ll probably figure out some use for it. It’s pretty creative.

    If you’re curious about my findings from the uncensored qwen3.6 I mentioned, it generates pretty quickly on my machine (~50 tok/s give or take 5 depending on quantization) and I haven’t gotten it to outright refuse anything yet – but I’ve only poked at it a little. Based on the other comment in the thread here about llama penises, I whimsically asked it to “Generate a sexually explicit song about llama penises.” and it did without complaint. (Stock qwen3.6 refused, of course.)

    • tal@lemmy.today
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      9 days ago

      Yeah, Qwen is going to be faster, because it’s MoE — most of the neural network is inactive while it’s running. My experience with the text quality hasn’t been great compared to the Llama 3-based models, though, and generally I’ve seen that comments on /r/SillyTavernAI have stated similar stuff — Qwen is kinda dry and clinical, which is find for “find a question to my answer” but not so great for “write a bunch of text about this”. If it works for you, sounds good, though!