**Luddicus Mus** @algernon@come-from.mad-scientist.club · Mar 26, 2025, 18:25

Oh, for fuck's sake.

I routed the Internet Archive's bots (by user agent and by name) to iocaine, because they don't respect /robots.txt. Looks like they're still able to archive at least some of my things.

Time to dig into the logs again, 'cos this won't do.

**Sven Slootweg (soft-deprecated)** @joepie91@pixie.town · 2025-03-26T18:27:35Z

@algernon Are you sure they are genuinely the IA's bots, and not some kind of mimicry? AFAIK those do respect robots.txt (stuff frequently gets excluded/missed from the Wayback Machine because of it)

**Luddicus Mus** @algernon@come-from.mad-scientist.club · Mar 26, 2025, 18:40

@joepie91 I am sure, yes. They come from IA's IP range, and they do not respect robots.txt; they stopped doing so in 2017.

Case in point: when I tried to take a capture now:

    ;> _time:2h request.host:chronicles.mad-scientist.club classification.user_agent:"Internet Archive" | keep request.uri;
    { "request.uri": "/tales/a-season-on-iocaine/" }

In other words: the only URL IA requested is the one I gave it. It did not even attempt to fetch robots.txt.
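The two checks described in this post (requests come from a known IP range, and no request for robots.txt shows up) could be sketched roughly like this in Python. Everything beyond the `request.uri` field is an assumption for illustration: the `client_ip` field name is hypothetical, the IP addresses are made up, and the network is a documentation range, not the Internet Archive's real one.

```python
import ipaddress

# Hypothetical log records: "request.uri" mirrors the query output above,
# while "client_ip" is an assumed field name with a made-up address.
records = [
    {"request.uri": "/tales/a-season-on-iocaine/", "client_ip": "203.0.113.7"},
]

# Placeholder network (a reserved documentation range), NOT the real IA range.
CRAWLER_NETWORK = ipaddress.ip_network("203.0.113.0/24")


def fetched_robots_txt(records):
    """Return True if any record is a request for /robots.txt."""
    return any(
        # Ignore any query string when comparing the path.
        r["request.uri"].split("?", 1)[0] == "/robots.txt"
        for r in records
    )


def all_from_network(records, network):
    """Return True if every record's client IP lies inside the network."""
    return all(
        ipaddress.ip_address(r["client_ip"]) in network for r in records
    )


print(fetched_robots_txt(records))                 # False
print(all_from_network(records, CRAWLER_NETWORK))  # True
```

With the single record above, the first check is false and the second is true, which matches the situation described: traffic from the expected range, and no attempt to read robots.txt.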
