I'm sad to say that we're following the lead of many others and putting proof-of-work proxies into place to protect ourselves against "AI" crawler bots. Yes, I hate this as much as you, but all other options are currently worse (such as locking us into specific vendors).

We'll be rolling it out on lore.kernel.org and git.kernel.org in the next week or so.
@monsieuricon

Right now, I think that these bot deterrents are mostly just functioning similarly to a "security through obscurity" JavaScript blob.

I don't think the difficulty actually matters at all; you might as well set it to one. If scrapers ever do start solving the proof of work in the future, I think SHA-256 is categorically not going to work anymore, since it's so easy to accelerate and so many accelerators for it already exist (Bitcoin).
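
For context, here is a minimal sketch (in Go) of the kind of SHA-256 challenge these deterrents hand out: find a nonce so that hashing the challenge plus the nonce gives enough leading zero bytes. The challenge format, difficulty measure, and names are my own illustration, not any particular tool's wire format; the point is that this is the same loop Bitcoin ASICs run billions of times per second, which is why turning up the difficulty buys so little.

// Illustrative SHA-256 proof-of-work solver, not taken from any real deterrent.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// solve finds a nonce such that sha256(challenge || nonce) starts with
// `difficulty` zero bytes. Dedicated SHA-256 hardware makes this cheap.
func solve(challenge []byte, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.LittleEndian.PutUint64(buf[len(challenge):], nonce)
		sum := sha256.Sum256(buf)
		if leadingZeroBytes(sum[:]) >= difficulty {
			return nonce
		}
	}
}

// leadingZeroBytes counts how many bytes at the start of b are zero.
func leadingZeroBytes(b []byte) int {
	n := 0
	for _, x := range b {
		if x != 0 {
			break
		}
		n++
	}
	return n
}

func main() {
	nonce := solve([]byte("example-challenge"), 2)
	fmt.Println("nonce:", nonce)
}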

I actually created a proof-of-work bot deterrent before the LLM hype even existed. Back then I chose scrypt as the memory-hard hash function because I wanted it to be as easy as possible for normal website visitors to solve, but as painful as possible for scrapers, even after they notice it and react to it.
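
Roughly, server-side verification of an scrypt-based challenge could look like the Go sketch below. The parameters (N=4096, r=8, p=1), the 32-byte key length, and the leading-zero-bits target are illustrative assumptions, not the actual values my deterrent uses; what matters is that every attempt costs the scraper real memory, not just a few cheap hash cycles.

// Illustrative scrypt-based proof-of-work verification; parameters are assumed.
package main

import (
	"fmt"

	"golang.org/x/crypto/scrypt"
)

// verify returns true if scrypt(nonce, challenge) meets the difficulty target.
// Because scrypt is memory-hard, each attempt ties up RAM, which is hard to
// accelerate with the kind of ASICs that exist for SHA-256.
func verify(challenge, nonce []byte, zeroBits int) (bool, error) {
	key, err := scrypt.Key(nonce, challenge, 4096, 8, 1, 32)
	if err != nil {
		return false, err
	}
	return leadingZeroBits(key) >= zeroBits, nil
}

// leadingZeroBits counts the number of leading zero bits in b.
func leadingZeroBits(b []byte) int {
	bits := 0
	for _, x := range b {
		if x == 0 {
			bits += 8
			continue
		}
		for mask := byte(0x80); mask != 0 && x&mask == 0; mask >>= 1 {
			bits++
		}
		break
	}
	return bits
}

func main() {
	ok, err := verify([]byte("challenge-from-cookie"), []byte("client-nonce"), 8)
	fmt.Println(ok, err)
}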

I don't have mine triggering on browser user agents. I just have it trigger all the time by default, except for some tools that I allowlist, like git, npm, go, etc. I also explicitly allow home pages and repository home pages so that search indexers can still find things and display them.
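
Something like this Go middleware captures that policy: challenge everything by default, but skip the challenge for allowlisted tool user agents and a few public landing-page paths. The user-agent prefixes, paths, and the hasValidPoWToken helper are hypothetical placeholders for illustration, not the project's real code.

// Illustrative allowlist-by-default middleware; names and patterns are assumed.
package main

import (
	"net/http"
	"strings"
)

var allowedAgents = []string{"git/", "npm/", "Go-http-client/"}
var allowedPaths = []string{"/", "/explore"} // e.g. site and repo home pages

func deterrent(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ua := r.Header.Get("User-Agent")
		for _, a := range allowedAgents {
			if strings.HasPrefix(ua, a) {
				next.ServeHTTP(w, r) // tools like git/npm/go skip the challenge
				return
			}
		}
		for _, p := range allowedPaths {
			if r.URL.Path == p {
				next.ServeHTTP(w, r) // keep home pages reachable by indexers
				return
			}
		}
		// Everyone else must present a valid proof-of-work token first.
		if !hasValidPoWToken(r) {
			http.Error(w, "proof-of-work challenge required", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// hasValidPoWToken is a placeholder for the real token check.
func hasValidPoWToken(r *http.Request) bool {
	_, err := r.Cookie("pow-token")
	return err == nil
}

func main() {
	http.ListenAndServe(":8080", deterrent(http.FileServer(http.Dir("."))))
}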

You can see a demo of it here as well as the source code:

git.sequentialread.com/forest/
