I'm discovering so many new agents to block. Some of them appear reasonably harmless, as in, they don't crawl much. But they are operated by for-profit companies that collect random data about websites to sell.

Fuck those, they'll get a bee movie. Likely won't do much for them, but... the less real content these predators see, the better.

@algernon will you publish a list of suggested agents to block eventually?

@thufie Yes. In fact, it already is public, though a bit scattered:

I use ai.robots.txt as one of my sources. Everything in its robots.json is blocked.
I also block some that I collected myself; the current list is here.
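
For anyone wanting to try the same approach, here is a rough sketch of how the two sources above could be combined into one user-agent check. It assumes robots.json is a JSON object keyed by agent name, which is how I understand the ai.robots.txt file to be laid out. The agent names, the `EXTRA_AGENTS` list, and both function names are made-up placeholders, not the actual setup described in this thread.

```python
import json
import re

# Assumed shape of robots.json: a JSON object keyed by user-agent name.
# These entries are invented placeholders, not real crawler names.
SAMPLE_ROBOTS_JSON = """
{
  "ExampleBot": {"operator": "Example Corp"},
  "sample-crawler": {"operator": "Sample Inc"}
}
"""

# Hypothetical hand-collected agents, standing in for a personal list.
EXTRA_AGENTS = ["HypotheticalScraper"]


def build_blocklist_pattern(robots_json_text, extra_agents=()):
    """Compile one case-insensitive regex matching any listed agent name."""
    names = list(json.loads(robots_json_text)) + list(extra_agents)
    if not names:
        # An empty alternation would match everything, so use a
        # pattern that can never match instead.
        return re.compile(r"(?!)")
    # Escape each name so characters like "." or "-" match literally.
    alternatives = "|".join(re.escape(name) for name in sorted(names))
    return re.compile(alternatives, re.IGNORECASE)


def is_blocked(user_agent, pattern):
    """Return True if the User-Agent header contains a listed agent name."""
    return pattern.search(user_agent) is not None


pattern = build_blocklist_pattern(SAMPLE_ROBOTS_JSON, EXTRA_AGENTS)
print(is_blocked("Mozilla/5.0 (compatible; ExampleBot/1.0)", pattern))
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0", pattern))
```

What a blocked agent gets served instead (an error, an empty page, a movie script) is a separate decision made in the web server or proxy configuration.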

I plan to make a nicer list, with more comments, that is easier to pick parts from. But that will take a while.