LLM training bots are a plague

mapto · 1 year ago

LLM training bots are a plague

pcouy@lemmy.pierre-couy.fr · 1 year ago

CIDR ranges (a.b.c.d/subnet_mask, where subnet_mask is the prefix length in bits) contain 2^(32-subnet_mask) IP addresses. The 1.5 I’m using controls the filter’s sensitivity and can be tuned to anything between 1 and 2.

Using 1 or smaller would mean that the filter gets triggered earlier for larger ranges (we want to avoid this so that a single IP can’t trick you into banning a /16).

Using 2 or more would mean you tolerate more failures per IP for larger ranges, so you would ban all the smaller subranges before the filter gets a chance to trigger on a larger range.
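
To make the effect of that factor concrete, here is a minimal Python sketch. It assumes the ban threshold for a range scales as `base ** (32 - subnet_mask)`, which matches the behaviour described above but is not necessarily the exact formula in the actual filter. The function names and the `per_ip_threshold` value of 5 are hypothetical, chosen only for illustration.

```python
import ipaddress


def range_size(cidr: str) -> int:
    """Number of IPv4 addresses in a CIDR range, i.e. 2 ** (32 - prefix length)."""
    return ipaddress.ip_network(cidr, strict=False).num_addresses


def ban_threshold(prefix_len: int, per_ip_threshold: float = 5, base: float = 1.5) -> float:
    """Failures needed before banning a whole range of the given prefix length.

    ASSUMPTION: the threshold grows as base ** (32 - prefix_len). This is an
    illustrative formula, not the author's confirmed one.
    - base = 1: the threshold is the same for every range size, so one noisy
      IP would be enough to ban a /16.
    - base = 2: the threshold grows as fast as the range itself, so the
      tolerated failures per IP stay constant.
    - 1 < base < 2: larger ranges need more failures in total, but fewer
      failures per IP.
    """
    return per_ip_threshold * base ** (32 - prefix_len)


if __name__ == "__main__":
    for prefix_len in (32, 30, 24, 16):
        size = 2 ** (32 - prefix_len)
        threshold = ban_threshold(prefix_len)
        print(f"/{prefix_len}: {size} IPs, threshold {threshold:.1f}, "
              f"{threshold / size:.4f} failures per IP")
```

With a base strictly between 1 and 2, the per-IP figure printed in the last column shrinks as the range grows, while the total threshold still increases.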

This runs locally on a single fail2ban (f2b) instance, but it should work pretty much the same with aggregated logs from multiple instances.

froztbyte@awful.systems · 1 year ago

I’m aware of the construction of a CIDR prefix. I meant: what are you using to categorise IPs from requests to look up the mask size? whois? published NIC/RIR data? what’s in BGP/route dumps? something else?

hecko@pawb.social · 1 year ago

late, but i believe they mean they check every possible range: e.g. if only 1.2.3.5 is making noise it’ll get banned as a /32, but if 1.2.3.6 is too, it might justify banning 1.2.3.4/30, the smallest range containing both
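
A rough Python sketch of that "check every possible range" idea, using only the standard `ipaddress` module. The function names and the `min_prefix` cutoff of /16 are assumptions made for illustration, not details taken from the actual filter.

```python
import ipaddress
from collections import Counter


def enclosing_ranges(ip: str, min_prefix: int = 16):
    """Yield every IPv4 range containing `ip`, from /32 down to /min_prefix."""
    for prefix_len in range(32, min_prefix - 1, -1):
        # strict=False masks off the host bits, e.g. 1.2.3.5/30 -> 1.2.3.4/30
        yield ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def count_failures_per_range(failing_ips, min_prefix: int = 16) -> Counter:
    """Count one failure per listed IP against every range that contains it."""
    counts = Counter()
    for ip in failing_ips:
        for net in enclosing_ranges(ip, min_prefix):
            counts[net] += 1
    return counts


if __name__ == "__main__":
    counts = count_failures_per_range(["1.2.3.5", "1.2.3.6"])
    for cidr in ("1.2.3.5/32", "1.2.3.4/31", "1.2.3.6/31", "1.2.3.4/30"):
        print(cidr, counts[ipaddress.ip_network(cidr)])
```

Each per-range count can then be compared against whatever threshold applies to that prefix length. In this example the two addresses fall into different /31 ranges and only meet at 1.2.3.4/30.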

Excerpt from a message I just posted in a #diaspora team internal f...