@stefano If you hear of anyone successfully foiling their crawlers (other than Cloudflare), please pass the info along!

@davepolaschek For a demo of the pow-bot-deterrent-rp, see one of the source code files on that repo, like:

git.sequentialread.com/forest/

It still has some issues on privacy-focused browsers that don't allow WebAssembly. Anubis doesn't have the same issue because it uses SHA256 via WebCrypto for now.
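For anyone curious what a SHA-256 proof of work looks like in general, here is a minimal sketch in Python. It is not the Anubis code or the pow-bot-deterrent code; the function names, the challenge format, and the "leading zero bits" difficulty rule are all assumptions made for illustration.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is how many bits the byte actually occupies
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce. This is the expensive part."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-visitor random challenge
    nonce = solve(challenge, 12)
    print(nonce, verify(challenge, nonce, 12))
```

The asymmetry is the point: the visitor needs about 2^difficulty_bits hashes on average, the server needs one.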

I've been working on an email-based alternative for browsers that don't support Web Workers or WebAssembly, but it's still at the proof-of-concept stage.

@forestjohnson Thanks! I still have source I’d like to publish one of these days, but it’s going to be living on a fossil server. On OpenBSD. So presumably I’ll have to read and understand @joel’s article first. tumfatig.net/2022/running-dock

Added to the reading list.

@davepolaschek @joel

Sorry about the Docker-only config example. It also supports a JSON file for config. You just need to be able to compile the Go code for OpenBSD; that shouldn't be too hard.

I bet Anubis might be easier to use for now, as it's more popular and mine is more of a work-in-progress / hack. I made my own because I wanted this as a "captcha" before LLMs and Anubis existed. I also wanted to use a memory-hard hash function like Scrypt or Argon2, because I figured that would inflict a lot more pain on bots if the bot operators eventually decided to bite the bullet and just solve the PoW challenge.
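To illustrate the memory-hard idea, here is a rough Python sketch that swaps a cheap hash for scrypt in a find-a-nonce loop. It is not how pow-bot-deterrent is implemented; scrypt is used only because Python's standard `hashlib` ships it (when built against OpenSSL), and the parameters, function names, and difficulty rule are assumptions.

```python
import hashlib
import os


def memory_hard_hash(challenge: bytes, nonce: int, n: int = 2**14) -> bytes:
    """One scrypt evaluation; memory use is roughly 128 * n * r bytes.

    With the assumed defaults (n=2**14, r=8) that is about 16 MiB per hash.
    """
    return hashlib.scrypt(str(nonce).encode(), salt=challenge,
                          n=n, r=8, p=1, dklen=32)


def meets_difficulty(digest: bytes, bits: int) -> bool:
    """True if the top `bits` bits of the digest are all zero."""
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve(challenge: bytes, bits: int, n: int = 2**14) -> int:
    """Client side: each attempt costs both CPU time and memory."""
    nonce = 0
    while not meets_difficulty(memory_hard_hash(challenge, nonce, n), bits):
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, bits: int, n: int = 2**14) -> bool:
    """Server side: a single scrypt evaluation."""
    return meets_difficulty(memory_hard_hash(challenge, nonce, n), bits)


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-visitor challenge
    nonce = solve(challenge, 3)
    print(nonce, verify(challenge, nonce, 3))
```

Because every attempt has to touch a large buffer, a bot farm can't cheaply run thousands of solvers in parallel on the same machine, which is the "pain" a plain SHA-256 loop doesn't inflict.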

@davepolaschek @joel

Er, sorry, to clarify, what I meant was that Docker is not required; it's just the main config example I have right now. The JSON equivalent is:

git.sequentialread.com/forest/

Although it may be out of date with the latest changes.

@forestjohnson @joel Ahh. Ok. I’d rather not have to use docker if at all possible.

But I’m also considering alternatives, such as blocking huge whacks of IPv4 space, and saying: “You wanna see this? VPN to somewhere that doesn’t harbor a gazillion AI scrapers.” Which, when I think about it, is an entirely different proof-of-work.

Of course they’ll move to IPv6, and I’ll lose that arms race in the end.
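The range-blocking idea can be sketched in a few lines of Python with the standard `ipaddress` module. The networks below are reserved documentation ranges used as placeholders, not real scraper ranges, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), not a real blocklist.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]


def is_blocked(addr: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # An IPv6 address is never "in" an IPv4 network, so it passes this check.
    return any(ip in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for addr in ("203.0.113.7", "192.0.2.1", "2001:db8::1"):
        print(addr, is_blocked(addr))
```

Note that the IPv6 address sails through an IPv4-only list, which is exactly the arms-race problem.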

@forestjohnson Or maybe the fossil / SQLite folks will implement a clever solution, and I’ll just update to the newest version and done.

@davepolaschek @joel

This doesn't work, because all the AI companies are paying rent to malware authors who trojan-horse TCP proxies into tons of phone apps and desktop software.

So all the LLM scraping requests will come from the exact same residential IP address ASNs that your legit users are coming from.

See:

brightdata.com/proxy-types/res
oxylabs.io/products/residentia
webshare.io/residential-proxy
iproyal.com/residential-proxie
soax.com/proxies/residential
proxyempire.io/

It's a huge industry right now.
