Aggressive AI scrapers are making it kinda suck to run wikis

lemmydividebyzero@reddthat.com · 7 days ago


_deleted_@aussie.zone · 7 days ago

iocaine doesn’t stop them, but it uses minimal resources and makes me feel better about serving pages to them.

algernon@lemmy.ml · 7 days ago

It can stop them nowadays, by firewalling some of the crawlers off. The reason it doesn’t stop them by default is that it serves them poisoned URLs, which it can later recognize if the crawlers come back riding a headless Chrome. Once they do that and hit a poisoned URL, there’s little reason to let them wander any further through an endless maze: serve one request, then block the IP.
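
For anyone curious what that flow looks like, here is a rough Python sketch of the idea as described above. It is not iocaine's actual code or config: the `/maze/` prefix, the HMAC-signed token scheme, the in-memory `blocked_ips` set (a stand-in for a real firewall), and every function name are assumptions made up for illustration.

```python
import hashlib
import hmac
import secrets

# All names below are hypothetical, chosen only for this sketch.
SECRET = secrets.token_bytes(32)   # per-deployment signing key
PREFIX = "/maze/"                  # path prefix used for trap links
blocked_ips: set[str] = set()      # stand-in for a real firewall set


def _sign(nonce: str) -> str:
    """Short HMAC tag so trap URLs can be recognized without storing them."""
    return hmac.new(SECRET, nonce.encode(), hashlib.sha256).hexdigest()[:16]


def make_poisoned_url() -> str:
    """Build a trap URL that is only ever embedded in pages fed to crawlers."""
    nonce = secrets.token_hex(8)
    return f"{PREFIX}{nonce}-{_sign(nonce)}"


def is_poisoned(path: str) -> bool:
    """Return True if the requested path is one of our signed trap URLs."""
    if not path.startswith(PREFIX):
        return False
    nonce, sep, sig = path[len(PREFIX):].partition("-")
    if not sep:
        return False
    return hmac.compare_digest(sig.encode(), _sign(nonce).encode())


def handle(ip: str, path: str) -> str:
    """Decide what to do with one request: 'drop', 'garbage', or 'serve'."""
    if ip in blocked_ips:
        return "drop"          # already caught earlier, firewall it off
    if is_poisoned(path):
        blocked_ips.add(ip)    # serve this one request, then block the IP
        return "garbage"
    return "serve"             # normal visitor, normal page
```

Signing the links means the server does not have to remember every trap URL it ever handed out; a real deployment would presumably push blocked addresses into the firewall and expire them after a while instead of keeping them in a Python set.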

I’ve been running that on my own infra, and my daily number of requests went down from 50+ million to… 2 million.

undefinedTruth@lemmy.zip · 7 days ago

Never heard of it, but I see Anubis pretty widely adopted, especially among open source projects.