@internetarchive is there a way to verify whether a crawler with an ArchiveTeam user-agent is actually operating on your behalf?
I am currently using the method described in this GitHub discussion (https://github.com/internetarchive/heritrix3/discussions/507) to detect and ban scrapers that spoof Googlebot and Bingbot UA strings, but it doesn't seem to work for some bots that have crawled my site(s) today.
I would like to allow the Internet Archive to preserve copies of my pages, but without a way to validate a crawler's authenticity, allowing that user-agent through leaves a hole that AI scrapers can abuse.
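For context, the Googlebot/Bingbot method referenced above is forward-confirmed reverse DNS: reverse-resolve the client IP, check that the PTR hostname falls under a domain the crawler operator publishes, then confirm that hostname resolves back to the same IP. A minimal sketch in Python (function names and the suffix tuple are illustrative, not taken from the linked discussion):

```python
import socket

def hostname_allowed(host, suffixes):
    """True if the PTR hostname ends in one of the crawler's known domains."""
    return host.endswith(tuple(suffixes))

def verify_crawler_ip(ip, suffixes):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse (PTR) lookup
    except OSError:
        return False                                 # no PTR record: fail closed
    if not hostname_allowed(host, suffixes):
        return False                                 # PTR outside the published domains
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward lookup of that hostname
    except OSError:
        return False
    return ip in addrs                               # forward-confirm: must round-trip
```

For example, `verify_crawler_ip(client_ip, (".googlebot.com", ".google.com"))` works only because Google publishes which domains its crawlers reverse-resolve to. ArchiveTeam's crawls run on volunteer machines with arbitrary IPs, so there is no equivalent published domain to check against, which is exactly why the question above arises.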
Thufie
BLM
~
languages: en:✔️ he:~ es:~ ru:~
Reluctant moderator on social.pixie.town
Most online member of the system
#yesbot #nobot #noarchive I'm in my 20s, a Computer Science researcher (not in "AI" 🙄). Also a YouTuber now apparently, making YouTube Poops.
There is no law and no judge. Peace in the world.
I'm just a disoriented white girl trying her best.
Relationship Anarchist
programming languages?
C++, C, MIPS, x86, Java, Python, and a few others :P