Evidence for the DDoS attack that bigtech LLM scrapers actually are.

  • raoul@lemmy.sdf.org
    17 points · edited · 1 day ago

    The only simple options are:

    • robots.txt
    • rate limiting by ip
    • blocking by user agent
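
A minimal sketch of the last two options in the list above, per-IP rate limiting and User-Agent blocking. This is an illustration under assumptions, not anyone's production setup: the class `SlidingWindowLimiter`, the helper `should_serve`, the limits, and the `examplebot` token are all hypothetical names chosen for the example.

```python
import time
from collections import defaultdict, deque

# Hypothetical User-Agent substrings to refuse (placeholder, not a real list).
BLOCKED_UA_TOKENS = ("examplebot",)


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def is_blocked_user_agent(user_agent):
    """Case-insensitive substring match against the blocked tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


def should_serve(limiter, ip, user_agent, now=None):
    """Combine both checks: refuse blocked agents, then apply the rate limit."""
    if is_blocked_user_agent(user_agent):
        return False
    return limiter.allow(ip, now)
```

Both checks key on values the client controls: a request from a fresh IP starts with an empty window, and a request carrying a browser-like User-Agent string never matches a blocked token.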

    From the article, they try to bypass all of them:

    They also don’t give a single flying fuck about robots.txt …

    If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really). This is literally a DDoS on the entire internet.

    It then becomes a game of whack-a-mole with big tech 😓

    What infuriates me most is that it’s done by the big names, and not some random startup. Edit: Now that I think about it, this doesn’t prove it is done by Google or Amazon: it could be someone else using random popular user agents.

    • jherazob@fedia.io
      5 points · 1 day ago

      I believe there are blocklists for their IPs out there, which should mitigate things a little.
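
A rough sketch of how such an IP blocklist might be checked, assuming it is distributed as a list of CIDR ranges. The entries below are documentation-reserved example ranges, not real scraper addresses, and `load_blocklist` and `is_blocked` are hypothetical helper names.

```python
import ipaddress

# Placeholder ranges reserved for documentation, standing in for a real list.
BLOCKLIST_CIDRS = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def load_blocklist(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def is_blocked(ip, networks):
    """Return True if the address falls inside any blocklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)
```

A linear scan like this is fine for a short list; a large blocklist would more likely be enforced at the firewall or reverse proxy.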