
Blocklist scraping by fash 

So this has been an ongoing issue. I'd love it if people could dig up the earlier threads about it for more context, because I don't have the spoons right now.

Originally written by "mint" and hosted on the kiwifarms git, there's a tool that continuously scrapes publicized instance blocklists so people can search for who has them blocked (resulting in emails like "uwu we did nothing wrong how dare you block our instance").

Through correlation, turns out the main IP being used by fba.ryona.agency is `54.37.233.246`. Blocking that at the firewall level prevents them from getting any new data.

Other instances of the scraper exist too though, hosted on
`23.24.204.110`, `45.86.70.49`, `88.65.6.124`, `187.190.192.31`

The drow.be / bka.li / teleyal.blog / mooneyed.de "kromonos" user has their own version, which feeds an API that gives your instance a highscore for blocking their shit. It scrapes from `185.244.192.119`, with user agents presenting as random instances.

These and other scraper-ish IPs are also listed in git.pixie.town/f0x/nixos/src/b
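
For anyone who wants to turn the addresses in this post into firewall rules, here is a minimal sketch. It only prints `iptables` / `ip6tables` DROP commands and does not run them. The choice of `iptables` and the `INPUT` chain is an assumption (the post only says "at the firewall level"), and the function name `drop_rules` is made up for illustration.

```python
import ipaddress

# Addresses copied from the post above; not independently verified.
SCRAPER_ADDRESSES = [
    "54.37.233.246",
    "23.24.204.110",
    "45.86.70.49",
    "88.65.6.124",
    "187.190.192.31",
    "185.244.192.119",
]


def drop_rules(addresses):
    """Return one DROP command string per address (hypothetical helper)."""
    rules = []
    for raw in addresses:
        # ip_address() raises ValueError on typos, so bad input fails loudly.
        addr = ipaddress.ip_address(raw.strip())
        # Assumption: plain iptables for IPv4, ip6tables for IPv6.
        tool = "iptables" if addr.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -s {addr} -j DROP")
    return rules


if __name__ == "__main__":
    # Only prints the commands; review them before running anything as root.
    for rule in drop_rules(SCRAPER_ADDRESSES):
        print(rule)
```

If your setup uses nftables or a cloud firewall instead, the same list applies but the rule syntax will differ.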

re: Blocklist scraping by fash 

`70.106.192.146` too, though it's unclear what software it's running

re: Mitigating blocklist scraping by fash 

Quite interesting workaround; the kiwifarms scraper is configured to not follow HTTP redirects, so by adding one you can make them give up, while legit users can still view the page without issues.

git.pixie.town/f0x/nixos/src/b
Adapts my nginx setup to redirect /about/more to /about/much-more
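
To see why the redirect stops a client that doesn't follow redirects, here is a small self-contained sketch. It is not the nginx config linked above: it's a toy Python server on localhost that redirects `/about/more` to `/about/much-more`, plus a client that (like the scraper is described as doing) never follows the `Location` header. The 302 status code and the names `AboutHandler`, `fetch_without_redirects` and `run_demo` are assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AboutHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the instance: old path redirects, new path serves."""

    def do_GET(self):
        if self.path == "/about/more":
            # Assumption: a temporary redirect; the real setup may use another code.
            self.send_response(302)
            self.send_header("Location", "/about/much-more")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/about/much-more":
            body = b"about page, including the blocklist\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_without_redirects(port, path):
    """Fetch a path like a scraper that never follows redirects."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the toy server on a free port and fetch both paths."""
    server = HTTPServer(("127.0.0.1", 0), AboutHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        old = fetch_without_redirects(port, "/about/more")
        new = fetch_without_redirects(port, "/about/much-more")
        return old, new
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    old, new = run_demo()
    print("old path:", old)
    print("new path:", new)
```

The non-following client only ever gets an empty 302 for the old path, while a browser would follow the `Location` header and land on the real page.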

Of course a scraper could go to much-more directly now, but if we all pick something unique, that's impossible to hardcode for. And if they *do* start following redirects, we could introduce honeypot instances that redirect all around the place, disrupting the scrape (which all happens in sequence across domains btw)

:boosts_ok_gay:​ toot/blocklist scraping info request 

Can other server admins grep their logs for `159.196.229.70`? They seem to be doing mass scraping of public timelines, toots and blocklists, from an Australian residential IP?

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

also `2a01:4f8:162:6027::2`, with user-agents "Ruby, mastodon 0.1.1" or "mastodon_stream v0.1"
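
If plain `grep` is awkward (for example because the IPv6 address might be logged in a different spelling), a rough Python sketch like the one below can do the same check. It assumes the client address is the first whitespace-separated field of each access log line, which holds for the common nginx/Apache formats but may not match your setup. The function name `hits` is made up for illustration.

```python
import ipaddress

# The two addresses mentioned in this thread.
WATCHED = {
    ipaddress.ip_address("159.196.229.70"),
    ipaddress.ip_address("2a01:4f8:162:6027::2"),
}


def hits(log_lines, watched=WATCHED):
    """Yield log lines whose first field is one of the watched addresses."""
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # blank line
        try:
            # Parsing normalizes spelling, so expanded IPv6 forms still match.
            client = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if client in watched:
            yield line
```

Usage would be something like `for line in hits(open(path)): print(line, end="")`, where `path` is wherever your access log lives.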

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

If so, you can send an abuse report to abuse@aussiebroadband.com.au, regarding IPs `159.196.229.70` and `2a01:4f8:162:6027::2`. One of my servers shows scraping in its access logs going back to at least December 2022.

Blocklist scraping by fash 

@f0x oh no is it really time to use an HTTP tarpit again

Blocklist scraping by fash 

@f0x (currently my endpoint will just return the scraper's own IP, but the real list for all authenticated users)
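
The behaviour described in this post can be sketched roughly as below. This is not the poster's actual endpoint code; the function name `visible_blocklist` and its parameters are hypothetical, and how authentication is decided is left out entirely.

```python
def visible_blocklist(authenticated, client_ip, real_blocklist):
    """Return what a requester gets to see (hypothetical helper).

    Authenticated users see the real list; everyone else gets a
    single-entry list containing only their own address.
    """
    if authenticated:
        return list(real_blocklist)
    return [client_ip]


if __name__ == "__main__":
    # Placeholder domains, not taken from any real blocklist.
    real = ["bad.example", "worse.example"]
    print(visible_blocklist(True, "198.51.100.7", real))
    print(visible_blocklist(False, "198.51.100.7", real))
```

The idea is that an anonymous scraper still gets a well-formed response, just not one that says anything about who the instance actually blocks.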

re: Blocklist scraping by fash 

@kescher tarpit would be of limited use I think, since all their requests have a 5 second timeout

Blocklist scraping by fash 

@kescher @f0x I feel like limiting actual list info to logged-in users and publishing lists for public consumption in aggregate instead of traceably would probably be better in the end. Giving out that info directly, specifically, to the public, doesn't seem super important.

Blocklist scraping by fash 

@f0x NOOO they're sending spam to poor old Santa?

that's just evil

re: Blocklist scraping by fash 

@pastelpunkbandit lmao please unblock us santa uwu we were just shitposting

Blocklist scraping by fash 

@f0x
this is super good to know. we post our Blocklist on a separate wiki with export formats so ppl can re-import them. might be another mode of indirection that makes it hard to scrape but easy to use as intended

Blocklist scraping by fash 

@f0x incredibly minor point of correction, quoting an old locked post of mine from when people were attributing the tool to kf: "It was made by EnjuAihara at youjo dot love (a pedo instance 🤢 ) and forked by mint at ryona dot agency (a fascist sack of shit)."

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

@f0x I see both here - not a lot of requests per day, but I'm a single-user instance and I haven't checked if they just scrape any of my posts.

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

@f0x Wasn't there someone out there actively mapping scrapers? Unfortunately I don't remember who that was and finding things in Mastodon ... oh well.

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

@f0x It's been a couple of days since that reply, but I just now remembered that ScraperSnitch was what I was thinking of: bentasker.co.uk/posts/blog/sec

(Note that @scrapersnitch posts as followers-only, so there's nothing to be seen on the public profile.)

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

@f0x Looks like AbuseIPDB has quite a few reports from 3 months ago. https://www.abuseipdb.com/check/1...

As for my own instance (Akkoma 3.9.3-28), I don't see any connections from there but I will be keeping an eye out for connections from "AS4764 WIDEBAND-AS-AP Aussie Broadband" :googlebear:

re: :boosts_ok_gay:​ toot/blocklist scraping info request 

@f0x I'm getting a few hits from that IP every day of the last week or so, user agent "Ruby, mastodon 0.1.1" and always for URL /api/v1/statuses/110575362477129505 (which is, ironically enough, a post about blocking.) Not enough to suggest reporting abuse from my side.

Blocklist scraping by fash 

@f0x@social.pixie.town disregard the question, looks like the nginx worked. I also located a new one so figured I'd share:

209.141.56.3 - - "GET /.well-known/nodeinfo HTTP/2.0" 200 213 "-" "FediList agent (https://fedilist.com/)" "-"

re: Blocklist scraping by fash 

@leni oh yeah, FediList (used to?) scrape over tor, so that's caught by a user-agent block instead git.pixie.town/f0x/nixos/src/b
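
A user-agent block boils down to a substring check, roughly like this sketch. It is not the nginx rule from the link above. The marker strings are the user agents quoted in this thread, and the name `is_blocked_user_agent` is made up for illustration. Keep in mind that user agents are trivially spoofed, so this only catches scrapers that identify themselves.

```python
# User-agent fragments quoted in this thread, lowercased for matching.
BLOCKED_UA_MARKERS = (
    "fedilist agent",
    "ruby, mastodon 0.1.1",
    "mastodon_stream v0.1",
)


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent contains any known scraper marker."""
    ua = (user_agent or "").lower()  # tolerate a missing header
    return any(marker in ua for marker in BLOCKED_UA_MARKERS)


if __name__ == "__main__":
    print(is_blocked_user_agent("FediList agent (https://fedilist.com/)"))
    print(is_blocked_user_agent("Mozilla/5.0"))
```

Because the check is on the header rather than the address, it keeps working when the scraper rotates IPs or comes in over Tor.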

re: Blocklist scraping by fash 

@f0x@social.pixie.town looks like they still are, I just caught it again with a different IP :ablobrollingeyes: thanks again for sharing all this!
