
What would be the most sensible way to implement robots.txt handling for a search engine project? The intended behaviour is to be reasonably conservative; ideally it should tolerate some weirdness in the robots.txt and default to "do not crawl" in cases of doubt as to the author's intention, but still have good coverage of sites that clearly did not intend to block crawlers (and default to "crawl" if there is no robots.txt or equivalent at all).

(Asking because there isn't a single robots.txt standard, and perhaps people here have preferences on which variant is the best one to support.)
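
A minimal sketch of the conservative policy described above, assuming Python's stdlib urllib.robotparser is acceptable; the bot name "examplebot", the timeout, and the exact status-code handling are illustrative assumptions, not a settled design:

```python
import urllib.error
import urllib.request
import urllib.robotparser

USER_AGENT = "examplebot"  # hypothetical crawler name

def may_crawl(site: str, path: str) -> bool:
    """Return True only if crawling `path` on `site` is clearly permitted."""
    robots_url = f"{site.rstrip('/')}/robots.txt"
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:
        if e.code in (404, 410):
            return True   # clearly no robots.txt: default to "crawl"
        return False      # 401/403/5xx: doubt about intent, do not crawl
    except (urllib.error.URLError, OSError):
        return False      # unreachable: be conservative, do not crawl

    parser = urllib.robotparser.RobotFileParser()
    try:
        parser.parse(body.splitlines())
    except Exception:
        return False      # weird/unparseable robots.txt: do not crawl
    return parser.can_fetch(USER_AGENT, f"{site.rstrip('/')}{path}")
```

The stdlib parser is deliberately lenient about malformed input, so in this sketch the conservative defaults live mostly in the explicit handling of fetch errors and unparseable content rather than in the parser itself.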
