Most AI translation tools rely on cloud services.

Audio leaves your device, gets processed somewhere else, and comes back translated.

We wanted to explore a different approach.

PolyTalk is an open-source translation platform built around the idea that speech recognition, translation, and speech synthesis can be powered by open models and deployed on infrastructure you control.

The project combines open-source components for transcription, translation, and TTS into a privacy-first workflow.
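
As a rough illustration of that three-stage workflow, here is a minimal Python sketch. It is not taken from the PolyTalk codebase: the `LocalPipeline` class, the stage signatures, and the `fake_*` stand-in functions are all hypothetical, chosen only to show how transcription, translation, and TTS could be chained so that audio never leaves the machine running the code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration; not PolyTalk's actual API.
Transcriber = Callable[[bytes], str]      # audio -> source-language text
Translator = Callable[[str, str], str]    # (text, target_lang) -> translated text
Synthesizer = Callable[[str], bytes]      # text -> audio


@dataclass
class LocalPipeline:
    """Chains three locally hosted stages; nothing here makes a network call."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, target_lang: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, target_lang)
        return self.synthesize(translated)


# Stand-in stages so the sketch runs without any model installed.
# A real deployment would wrap open speech, translation, and TTS models here.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = LocalPipeline(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"hello", "es")
```

Keeping each stage behind a plain callable means any one model can be swapped for another without touching the other two stages.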

I'm curious how others in the open-source AI community think about privacy and ownership in AI-powered communication tools.

GitHub: https://github.com/PolyTalkIO/polytalk

  • Sergio@piefed.social
    22 days ago

    lel I worked on a couple speech interface projects back in the 00s before all these corporate spyware platforms emerged. Naturally, it was all on-device (or a local server we controlled). This was more R&D/prototype stuff so it wasn’t as robust as systems nowadays, but the software is still out there:

    • Pbiz@lemmy.world (OP)
      16 days ago

      That’s really interesting. Sometimes it feels like local AI is a new idea, but a lot of the foundations were already there years ago.

      The difference now is that the models have become good enough that these kinds of workflows are practical for everyday users, not just research projects.

      • Sergio@piefed.social
        16 days ago

        All AI was local until recently (the late 2010s, maybe?). It’s important not to let the cloud providers gaslight us.

        “these kinds of workflows are practical for everyday users”

        Kind of. A good system will still have a lot of design to it. If you just take an off-the-shelf LLM and do the minimal tuning for it to do the job, then you’ll get just another crappy system.

        • Pbiz@lemmy.world (OP)
          15 days ago

          That’s a fair point. A good user experience usually comes from the engineering around the model, not just the model itself.

          The AI gets most of the attention, but things like latency, workflow design, context handling, and reliability often make the difference between something people try once and something they actually use.