Most AI translation tools rely on cloud services.

Audio leaves your device, gets processed somewhere else, and comes back translated.

We wanted to explore a different approach.

PolyTalk is an open-source translation platform built around the idea that speech recognition, translation, and speech synthesis can be powered by open models and deployed on infrastructure you control.

The project combines open-source components for transcription, translation, and TTS into a privacy-first workflow.
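
As a rough sketch of the shape described above (not PolyTalk's actual code; every name here is hypothetical and the stages are stand-ins rather than real models), the three steps can be treated as swappable functions that all run locally:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration; not PolyTalk's real API.
Transcriber = Callable[[bytes], str]       # audio -> source-language text
Translator = Callable[[str, str], str]     # (text, target language) -> translated text
Synthesizer = Callable[[str, str], bytes]  # (text, target language) -> audio


@dataclass
class SpeechPipeline:
    """Chains transcription, translation, and TTS without leaving the process."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, target_lang: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, target_lang)
        return self.synthesize(translated, target_lang)


# Stand-in stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_synthesize(text: str, target_lang: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechPipeline(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"hello", "es")
```

Because each stage is just a function, any open model could in principle be swapped in behind the same signature while the audio stays on hardware you control.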

Curious how others in the open-source AI community think about privacy and ownership when it comes to AI-powered communication tools.

GitHub: https://github.com/PolyTalkIO/polytalk

  • Sergio@piefed.social
    17 days ago

    All AI was local until recently (the late 2010s, maybe?). It’s important not to let the cloud providers gaslight us.

    "these kinds of workflows are practical for everyday users"

    Kind of. A good system will still have a lot of design to it. If you just take an off-the-shelf LLM and do the minimal tuning for it to do the job, then you’ll get just another crappy system.

    • Pbiz@lemmy.worldOP
      17 days ago

      That’s a fair point. A good user experience usually comes from the engineering around the model, not just the model itself.

      The AI gets most of the attention, but things like latency, workflow design, context handling, and reliability often make the difference between something people try once and something they actually use.