Blocklist scraping by fash 

This has been an ongoing issue. I'd love it if people could dig up the earlier threads about it for more context, because I don't have the spoons right now.

There's a tool, originally written by "mint" and hosted on the kiwifarms git, that continuously scrapes publicized instance blocklists so people can search for who has them blocked (resulting in emails like "uwu we did nothing wrong, how dare you block our instance").

Through correlation, it turns out the main IP used by fba.ryona.agency is `54.37.233.246`. Blocking that at the firewall level prevents them from getting any new data.

Other instances exist too, though, hosted on
`23.24.204.110`, `45.86.70.49`, `88.65.6.124`, `187.190.192.31`

The drow.be / bka.li / teleyal.blog / mooneyed.de user "kromonos" has their own version, which feeds an API that gives your instance a high score for blocking their shit. It scrapes from `185.244.192.119`, with user agents presenting as random instances.

These and other scraper-ish IPs are also listed in git.pixie.town/f0x/nixos/src/b
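
For anyone who can't touch the firewall, a rough HTTP-layer equivalent in nginx might look like the sketch below. This is an assumption on my part, not what the linked config does: it only works if the scrapers connect to nginx directly from these addresses (no proxy or CDN in front rewriting the client IP), and unlike a firewall drop it still answers them with a 403.

```nginx
# Sketch only: goes in the http {} or server {} block of the instance.
# Addresses are the ones listed in the post above; the list will go stale.
deny 54.37.233.246;   # main IP correlated with fba.ryona.agency
deny 23.24.204.110;
deny 45.86.70.49;
deny 88.65.6.124;
deny 187.190.192.31;
deny 185.244.192.119; # "kromonos" scraper
# Everyone else is allowed by default, so no explicit "allow all;" is needed.
```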


re: Mitigating blocklist scraping by fash 

Quite interesting workaround; the kiwifarms scraper is configured to not follow HTTP redirects, so by adding one you can make them give up, while legit users can still view the page without issues.

git.pixie.town/f0x/nixos/src/b
This adapts my nginx setup to redirect `/about/more` to `/about/much-more`.
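
A minimal sketch of what such a redirect could look like in plain nginx, not necessarily identical to the linked config. Assumptions: the instance sits behind an upstream named `backend` (hypothetical here, use whatever your existing `proxy_pass` points at), and `/about/much-more` is just a placeholder path that you'd swap for something of your own.

```nginx
# Sketch only: goes in the server {} block of the instance.

# Anyone asking for the well-known path gets bounced to the new one.
# A client that doesn't follow redirects stops here.
location = /about/more {
    return 302 /about/much-more;
}

# The new path is proxied to the app, which still only knows /about/more.
# Because proxy_pass carries a URI, nginx replaces the matched location
# with it before passing the request on.
location = /about/much-more {
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://backend/about/more;  # "backend" is an assumed upstream name
}
```

Browsers follow the 302 transparently, so regular visitors shouldn't notice anything beyond the changed URL. Copy over any other proxy headers your existing config sets for the app.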

Of course a scraper could go to much-more directly now, but if we all pick something unique, that's impossible to hardcode for. And if they *do* start following redirects, we could introduce honeypot instances that redirect all around the place, disrupting the scrape (which all happens in sequence across domains btw)
