Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
@stefano If you hear of anyone successfully foiling their crawlers (other than Cloudflare), please pass the info along!
@davepolaschek For a demo of the pow-bot-deterrent-rp, see one of the source code files on that repo, like:
Still has some issues on privacy browsers which don't allow WebAssembly. Anubis doesn't have the same issue because it uses SHA256 via WebCrypto for now.
I've been working on an email-based alternative for browsers that don't support Web Workers / WebAssembly, but it's still in the proof-of-concept stage.
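For anyone following along who hasn't seen how a hash-based proof-of-work challenge works in general, here is a minimal Python sketch. It is not the actual scheme used by Anubis or pow-bot-deterrent-rp; the challenge format, the "leading zero bits" difficulty rule, and the function names are all assumptions made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single SHA-256 hash checks the client's answer."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force nonces until one meets the difficulty."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a random one.
    challenge = b"example-challenge"
    nonce = solve(challenge, difficulty=16)
    print("nonce:", nonce, "valid:", verify(challenge, nonce, 16))
```

The asymmetry is the point: the client needs about 2^difficulty hashes on average to find a nonce, while the server checks the answer with one hash.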
@forestjohnson Thanks! I still have source I’d like to publish one of these days, but it’s going to be living on a fossil server. On OpenBSD. So presumably I’ll have to read and understand @joel’s article first. https://www.tumfatig.net/2022/running-docker-host-openbsd-vmd/
Added to the reading list.
Er, sorry, to clarify, what I meant was that Docker is not required; it's just the main config example I have right now. The JSON equivalent is:
https://git.sequentialread.com/forest/pow-bot-deterrent-rp/src/branch/main/config.json.example
altho it may be out of date w/ the latest changes.
@forestjohnson Or maybe the fossil / SQLite folks will implement a clever solution, and I’ll just update to the newest version and done.
This doesn't work because all the AI companies are paying rent to malware authors who trojan-horse TCP proxies into tons of phone apps and desktop software.
So all the LLM scraping requests will come from the exact same residential IP address ASNs that your legit users are coming from.
See:
https://brightdata.com/proxy-types/residential-proxies
https://oxylabs.io/products/residential-proxy-pool
https://www.webshare.io/residential-proxy
https://iproyal.com/residential-proxies/
https://soax.com/proxies/residential
https://proxyempire.io/
huge industry rn
@forestjohnson @joel Ahh. Ok. I’d rather not have to use docker if at all possible.
But I’m also considering alternatives, such as blocking huge whacks of ipv4 space, and saying: “You wanna see this? VPN to somewhere that doesn’t harbor a gazillion AI scrapers.” Which, when I think about it, is an entirely different proof-of-work.
Of course they’ll move to ipv6, and I’ll lose that arms-race in the end.
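A rough sketch of what the "block big chunks of address space" check could look like, using Python's standard ipaddress module. The CIDR ranges below are reserved documentation prefixes used purely as placeholders, not real scraper networks, and the names BLOCKED_NETWORKS and is_blocked are made up for this example.

```python
import ipaddress

# Placeholder ranges (reserved documentation prefixes), NOT real scraper networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked IPv4 or IPv6 range."""
    ip = ipaddress.ip_address(addr)
    return any(
        ip.version == net.version and ip in net
        for net in BLOCKED_NETWORKS
    )


if __name__ == "__main__":
    for candidate in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

The same list can hold IPv6 prefixes, so the check itself survives a move to ipv6. As noted earlier in the thread, though, none of this helps against requests that arrive from the same residential networks as legitimate visitors.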