Blocklist scraping by fash
So this has been an ongoing issue. Would love it if people found the earlier threads about it for more context, 'cause I don't have the spoons right now.
There is a tool, originally written by "mint" and hosted on the kiwifarms git, that continuously scrapes publicized instance blocklists so people can search for who has them blocked (resulting in emails like "uwu we did nothing wrong how dare you block our instance").
Through correlation, turns out the main IP being used by fba.ryona.agency is `54.37.233.246`. Blocking that at the firewall level prevents them from getting any new data.
Other instances of the tool exist too, hosted on:
`23.24.204.110`, `45.86.70.49`, `88.65.6.124`, `187.190.192.31`
The drow.be / bka.li / teleyal.blog / mooneyed.de user "kromonos" has their own version, which feeds an API that gives your instance a high score for blocking their shit. It scrapes from `185.244.192.119`, with user agents presenting as random instances.
These and other scraper-ish IPs are also listed in https://git.pixie.town/f0x/nixos/src/branch/main/nodes/aura/configuration.nix#L103
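For admins who want to turn the addresses above into firewall rules, here is a minimal Python sketch. It is not what the linked NixOS config does; it only prints standard `iptables`/`ip6tables` drop commands for the IPs quoted in this post. The names `SCRAPER_IPS` and `drop_rules` are made up for illustration, and the list may well be out of date.

```python
import ipaddress

# IPs quoted in the post above; the linked configuration.nix may list others.
SCRAPER_IPS = [
    "54.37.233.246",
    "23.24.204.110",
    "45.86.70.49",
    "88.65.6.124",
    "187.190.192.31",
    "185.244.192.119",
]


def drop_rules(ips):
    """Return one firewall drop command per unique, valid IP address."""
    seen = set()
    rules = []
    for raw in ips:
        # ip_address() raises ValueError on typos, so bad input fails loudly.
        addr = ipaddress.ip_address(raw.strip())
        if addr in seen:
            continue
        seen.add(addr)
        tool = "iptables" if addr.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -s {addr} -j DROP")
    return rules


if __name__ == "__main__":
    # Print the commands for review instead of running them directly.
    print("\n".join(drop_rules(SCRAPER_IPS)))
```

Printing rather than executing keeps this safe to run anywhere; review the output before applying it to a real firewall.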
re: Mitigating blocklist scraping by fash
Quite interesting workaround; the kiwifarms scraper is configured to not follow HTTP redirects, so by adding one you can make them give up, while legit users can still view the page without issues.
https://git.pixie.town/f0x/nixos/src/branch/main/nodes/aura/services/nginx.nix#L202-L215
This adapts my nginx setup to redirect `/about/more` to `/about/much-more`.
Of course a scraper could go to much-more directly now, but if we all pick something unique, that's impossible to hardcode for. And if they *do* start following redirects, we could introduce honeypot instances that redirect all around the place, disrupting the scrape (which all happens in sequence across domains btw)
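To illustrate why the redirect trips up a client that does not follow redirects, here is a rough Python stand-in for the idea. It is not the nginx config linked above; `route`, `Handler`, and the response bodies are hypothetical, and only the two paths come from this post.

```python
from http.server import BaseHTTPRequestHandler

PUBLIC_PATH = "/about/more"      # the well-known path a scraper requests
REAL_PATH = "/about/much-more"   # pick something unique for your own instance


def route(path):
    """Return (status, headers, body) for a request path."""
    if path == PUBLIC_PATH:
        # A client that ignores redirects stops here with an empty body.
        return 302, {"Location": REAL_PATH}, b""
    if path == REAL_PATH:
        return 200, {"Content-Type": "text/plain"}, b"about page goes here\n"
    return 404, {"Content-Type": "text/plain"}, b"not found\n"


class Handler(BaseHTTPRequestHandler):
    """Thin wrapper so route() can be served with http.server for local testing."""

    def do_GET(self):
        status, headers, body = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally (assumed port):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

A browser follows the 302 transparently, while a scraper that treats the first response as final only ever sees the empty redirect.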
toot/blocklist scraping info request
Can other server admins grep their logs for `159.196.229.70`? They seem to be mass-scraping public timelines, toots and blocklists.
From an Australian residential IP?
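If a plain grep is too noisy, a small Python sketch like this can summarize which paths an IP has been requesting. It assumes an access log in the usual common/combined layout (client IP as the first field, request line as the first quoted string); `hits_for_ip` is a made-up helper name and your log format may differ.

```python
from collections import Counter


def hits_for_ip(log_lines, ip):
    """Count requests per path for lines whose first field is the given IP."""
    paths = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields or fields[0] != ip:
            continue
        try:
            # First quoted string looks like: GET /some/path HTTP/1.1
            request = line.split('"')[1]
            path = request.split()[1]
        except IndexError:
            path = "?"  # malformed or unusual request line
        paths[path] += 1
    return paths
```

Usage would be something like `hits_for_ip(open("/var/log/nginx/access.log"), "159.196.229.70").most_common(10)`, where the log path is an assumption about your setup.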
Blocklist scraping by fash
@f0x oh no is it really time to use an HTTP tarpit again
Blocklist scraping by fash
@f0x (currently my endpoint just returns the scraper's own IP to unauthenticated requests, but the real list to all authenticated users)
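A tiny Python sketch of that behaviour, as I read it from the post: authenticated users get the real list, everyone else gets a list containing only their own address. The actual implementation is not shown in this thread, so `visible_blocklist` and its parameters are purely hypothetical.

```python
def visible_blocklist(real_blocklist, client_ip, authenticated):
    """Return what a given requester should see at the blocklist endpoint."""
    if authenticated:
        # Logged-in users see the real list (copied so callers cannot mutate it).
        return list(real_blocklist)
    # Anonymous requesters, scrapers included, only see themselves "blocked".
    return [client_ip]
```

The decoy still answers with a well-formed list, so a scraper has little reason to suspect it got different data than a logged-in user.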
re: Blocklist scraping by fash
@kescher tarpit would be of limited use I think, since all their requests have a 5 second timeout
re: Blocklist scraping by fash
@pastelpunkbandit lmao please unblock us santa uwu we were just shitposting
Blocklist scraping by fash
@f0x
This is super good to know. We post our blocklist on a separate wiki with export formats so ppl can re-import them. Might be another mode of indirection that makes it hard to scrape but easy to use as intended.
Blocklist scraping by fash
@f0x incredibly minor point of correction, quoting an old locked post of mine from when people were attributing the tool to kf: "It was made by EnjuAihara at youjo dot love (a pedo instance 🤢 ) and forked by mint at ryona dot agency (a fascist sack of shit)."
re: :boosts_ok_gay: toot/blocklist scraping info request
@f0x I see both here - not a lot of requests per day, but I'm a single-user instance and I haven't checked if they just scrape any of my posts.
re: :boosts_ok_gay: toot/blocklist scraping info request
@f0x Wasn't there someone out there actively mapping scrapers? Unfortunately I don't remember who that was and finding things in Mastodon ... oh well.
re: :boosts_ok_gay: toot/blocklist scraping info request
@f0x It's been a couple of days since that reply, but I just now remembered that ScraperSnitch was what I was thinking of: https://www.bentasker.co.uk/posts/blog/security/autodetecting-and-outing-mastodon-scrapers-with-scrapersnitchbot.html
(Note that @scrapersnitch posts as followers-only, so there's nothing to be seen on the public profile.)
re: :boosts_ok_gay: toot/blocklist scraping info request
@f0x I'm getting a few hits from that IP every day of the last week or so, user agent "Ruby, mastodon 0.1.1" and always for URL /api/v1/statuses/110575362477129505 (which is, ironically enough, a post about blocking.) Not enough to suggest reporting abuse from my side.
Blocklist scraping by fash
@f0x@social.pixie.town disregard the question, looks like the nginx worked. I also located a new one so figured I'd share:
209.141.56.3 - - "GET /.well-known/nodeinfo HTTP/2.0" 200 213 "-" "FediList agent (https://fedilist.com/)" "-"
re: Blocklist scraping by fash
@leni oh yeah, FediList (used to?) scrape over tor, so that's caught by a user-agent block instead https://git.pixie.town/f0x/nixos/src/branch/main/nodes/aura/services/nginx.nix#L32
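For scrapers whose IP keeps changing, matching on the user agent is the fallback. A minimal Python sketch of that check follows; it is not the linked nginx rule. It assumes the log layout of the line quoted earlier in this thread (user agent as the third quoted string), and `BLOCKED_AGENT_SUBSTRINGS`, `user_agent_of`, and `is_blocked_agent` are made-up names.

```python
# Case-insensitive substrings to match; "fedilist" comes from the log line
# quoted in this thread, anything else would be your own addition.
BLOCKED_AGENT_SUBSTRINGS = ["fedilist"]


def user_agent_of(log_line):
    """Extract the user agent, assumed to be the third quoted string."""
    parts = log_line.split('"')
    # Quoted strings sit at odd indexes: 1 = request, 3 = referer, 5 = agent.
    return parts[5] if len(parts) > 5 else ""


def is_blocked_agent(user_agent):
    """Return True if the user agent contains any blocked substring."""
    ua = (user_agent or "").lower()
    return any(needle in ua for needle in BLOCKED_AGENT_SUBSTRINGS)
```

Keep in mind that a user agent is trivially spoofable, so this only catches scrapers that identify themselves honestly.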
re: Blocklist scraping by fash
@f0x@social.pixie.town looks like they still are, I just caught it again with a different IP. Thanks again for sharing all this!
re: Blocklist scraping by fash
`70.106.192.146` too, though it's unclear what software it's running