Oh, for fuck's sake.

I routed the Internet Archive's bots (by user agent and by name) to iocaine, because they don't respect /robots.txt. Looks like they're still able to archive at least some of my things.

Time to dig into the logs again, 'cos this won't do.

@algernon Are you sure they are genuinely the IA's bots, and not some kind of mimicry? AFAIK those do respect robots.txt (stuff frequently gets excluded/missed from the Wayback Machine because of it)

@joepie91 I am sure, yes. They come from IA's IP range, and they do not respect robots.txt; they stopped doing so in 2017.

Case in point, when I tried to take a capture just now:

;> _time:2h request.host:chronicles.mad-scientist.club classification.user_agent:"Internet Archive" | keep request.uri;
{ "request.uri": "/tales/a-season-on-iocaine/" }

In other words: the only URL IA requested is the one I gave it. It did not even attempt to fetch robots.txt.
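
The same check can be reproduced outside the log query tool with a few lines of Python. This is a minimal sketch, assuming the logs are exported as JSON lines with flat keys named like the fields in the query above (`request.host`, `request.uri`, `classification.user_agent`). The export format, the function names, and the second sample record are assumptions for illustration, not taken from the actual setup.

```python
import json


def requested_uris(log_lines, host, agent):
    """Return the URIs requested from `host` by clients classified as `agent`.

    Assumes one JSON object per line, with flat dotted keys.
    """
    uris = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if (record.get("request.host") == host
                and record.get("classification.user_agent") == agent):
            uris.append(record.get("request.uri"))
    return uris


def fetched_robots_txt(uris):
    """True if any of the URIs is /robots.txt (ignoring a query string)."""
    return any(
        uri is not None and uri.split("?", 1)[0] == "/robots.txt"
        for uri in uris
    )


# Hypothetical sample: the first record mirrors the query result above,
# the second is an invented request from a different client.
sample = [
    '{"request.host": "chronicles.mad-scientist.club", '
    '"classification.user_agent": "Internet Archive", '
    '"request.uri": "/tales/a-season-on-iocaine/"}',
    '{"request.host": "chronicles.mad-scientist.club", '
    '"classification.user_agent": "Other", '
    '"request.uri": "/robots.txt"}',
]

uris = requested_uris(sample, "chronicles.mad-scientist.club", "Internet Archive")
print(uris)
print(fetched_robots_txt(uris))
```

If `fetched_robots_txt` returns `False` for a window that contains a capture, the crawler requested the page without asking for robots.txt first, at least under that user agent classification.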