So one thing I'm noticing with my search engine crawler is that the vast majority of robots.txt rejects come from... platforms run by Twitter and Facebook.
Not personal sites. Not Mastodon instances. Nope, it's primarily Twitter and Facebook who blanket-refuse access to a new search engine crawler.