
What would be the most sensible way to implement robots.txt handling for a search engine project? The intended behaviour is to be reasonably conservative; ideally it should tolerate some weirdness in the robots.txt and default to "do not crawl" in cases of doubt as to the author's intention, but still have good coverage of sites that clearly did not intend to block crawlers (and default to "crawl" if there is no robots.txt or equivalent at all).

(Asking because there isn't a single robots.txt standard, and perhaps people here have preferences on which variant is the best one to support.)
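
A minimal sketch of the conservative policy described above, assuming Python's stdlib urllib.robotparser is acceptable; the bot name "examplebot", the timeout, and the exact status-code handling are illustrative assumptions, not a settled design:

```python
import urllib.error
import urllib.request
import urllib.robotparser

USER_AGENT = "examplebot"  # hypothetical crawler name

def may_crawl(site: str, path: str) -> bool:
    """Return True only if crawling `path` on `site` is clearly permitted."""
    robots_url = f"{site.rstrip('/')}/robots.txt"
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:
        if e.code in (404, 410):
            return True   # clearly no robots.txt: default to "crawl"
        return False      # 401/403/5xx: doubt about intent, do not crawl
    except (urllib.error.URLError, OSError):
        return False      # unreachable: be conservative, do not crawl

    parser = urllib.robotparser.RobotFileParser()
    try:
        parser.parse(body.splitlines())
    except Exception:
        return False      # weird/unparseable robots.txt: do not crawl
    return parser.can_fetch(USER_AGENT, f"{site.rstrip('/')}{path}")
```

The stdlib parser is deliberately lenient about malformed input, so in this sketch the conservative defaults live mostly in the explicit handling of fetch errors and unparseable content rather than in the parser itself.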
