I'm discovering so many new agents to block. Some of them appear reasonably harmless, in that they don't crawl much. But they are operated by for-profit companies that collect random data about websites to sell.
Fuck those, they'll get a bee movie. Likely won't do much for them, but... the less real content these predators see, the better.
@thufie Yes. In fact, it already is public, though a bit scattered:
I use ai.robots.txt as one of my sources. Everything in its robots.json is blocked.
I also block some that I collected myself; the current list is here.
I plan to make a nicer list, with more comments, that's easier to pick parts from. But that will take a while.
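For anyone wanting to do something similar, here is a rough sketch of the "block everything in robots.json, plus my own extras" idea. It is not my actual setup. It assumes robots.json is a JSON object whose top-level keys are the agent names, and it uses a tiny inline sample instead of the real file; `build_blocker`, `SAMPLE` and `ExampleDataBot` are made-up names for illustration.

```python
import json
import re


def build_blocker(robots_json_text, extra_agents=()):
    """Return a function that tells whether a User-Agent string should be blocked."""
    # Assumption: the top-level keys of robots.json are the agent names.
    names = list(json.loads(robots_json_text).keys()) + list(extra_agents)
    if not names:
        # Nothing to block: never match.
        return lambda user_agent: False
    # Case-insensitive substring match on any of the known names.
    pattern = re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)
    return lambda user_agent: bool(pattern.search(user_agent or ""))


# Tiny inline sample in the assumed shape, not the real file.
SAMPLE = '{"GPTBot": {"operator": "OpenAI"}, "CCBot": {"operator": "Common Crawl"}}'

# "ExampleDataBot" stands in for a self-collected entry; it is hypothetical.
is_blocked = build_blocker(SAMPLE, extra_agents=["ExampleDataBot"])
```

Whatever `is_blocked` matches can then be refused, or fed garbage instead of real content. Substring matching is deliberately loose here; an exact-token match would be stricter but easier for a crawler to dodge.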