• ViatorOmnium@piefed.social
    1 month ago

    Especially when two of the named languages (German and French) are around 20th in L1 speakers.

    I’m also interested in knowing how they decide what language a URL is in when lots of languages share words, even more so once you remove diacritics, as is common in URIs. For example, is something like https://example.org/noticia/n-12345.html a Portuguese or a Spanish URL?

    • emb@lemmy.world
      1 month ago

      I wonder that too. How do you separate cross-language homonyms and nonsense words in URLs?

      For any individual page, I guess you'd base it on the page content if the URL language is ambiguous. Like anything with language, it feels like it'd be fuzzy and hard to determine.

      Not that I necessarily doubt someone has collected the data, I'm just not sure how internet statistics are figured out.
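
To make the ambiguity concrete, here is a rough Python sketch of the approach guessed at above: match URL path words against per-language word lists, and only look at page content when the URL alone is ambiguous. Nothing here is known to be how any real statistics provider works. The word lists in `WORDS` are tiny made-up samples, and `candidate_languages` and `guess_language` are hypothetical names invented for this illustration.

```python
import re
import unicodedata
from urllib.parse import unquote, urlparse

# Hypothetical toy word lists (diacritics already stripped). A real system
# would need far larger vocabularies or a statistical model.
WORDS = {
    "pt": {"noticia", "noticias", "sobre", "contato"},
    "es": {"noticia", "noticias", "sobre", "contacto"},
}


def strip_diacritics(text):
    """Drop combining marks, so 'notícia' becomes 'noticia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def url_tokens(url):
    """Split the URL path into lowercase alphabetic tokens longer than 2 chars."""
    path = strip_diacritics(unquote(urlparse(url).path).lower())
    return [t for t in re.split(r"[^a-z]+", path) if len(t) > 2]


def candidate_languages(url):
    """Return every language whose word list contains at least one URL token."""
    tokens = url_tokens(url)
    return {lang for lang, words in WORDS.items() if any(t in words for t in tokens)}


def guess_language(url, page_text=None):
    """Guess from the URL; fall back to page text if the URL is ambiguous."""
    candidates = candidate_languages(url)
    if len(candidates) == 1:
        return next(iter(candidates))
    if page_text is not None:
        page_tokens = re.findall(r"[a-z]+", strip_diacritics(page_text.lower()))
        # Score only the tied candidates, or all languages if none matched.
        scores = {
            lang: sum(t in WORDS[lang] for t in page_tokens)
            for lang in (candidates or WORDS)
        }
        best = max(scores.values())
        winners = [lang for lang, score in scores.items() if score == best]
        if best > 0 and len(winners) == 1:
            return winners[0]
    return None  # still ambiguous


url = "https://example.org/noticia/n-12345.html"
print(candidate_languages(url))              # both "pt" and "es" match
print(guess_language(url))                   # None: the URL alone cannot decide
print(guess_language(url, "Entre em contato"))
```

With these toy lists, the example URL matches both Portuguese and Spanish, because "noticia" is in both once the accent is gone. Only the page text breaks the tie, and only if it happens to contain a word that is not shared. That is roughly why URL-only language statistics seem hard to trust.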