**fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · 2023-06-21T18:47:26Z

Blocklist scraping by fash

So this has been an ongoing issue, would love it if people found the earlier threads about it for more context cause I don't have the spoons right now

A tool originally written by "mint" and hosted on the kiwifarms git continuously scrapes publicized instance blocklists so that people can search for who has them blocked (resulting in emails like "uwu we did nothing wrong how dare you block our instance").

Through correlation, turns out the main IP being used by fba.ryona.agency is `54.37.233.246`. Blocking that at the firewall level prevents them from getting any new data.

Other instances of the scraper exist too though, hosted on
`23.24.204.110`, `45.86.70.49`, `88.65.6.124`, `187.190.192.31`

The "kromonos" user (drow.be / bka.li / teleyal.blog / mooneyed.de) has their own version, which feeds an API that gives your instance a highscore for blocking their shit. It scrapes from `185.244.192.119`, with user agents presenting as random instances.

These and other scraper-ish IPs are also listed in https://git.pixie.town/f0x/nixos/src/branch/main/nodes/aura/configuration.nix#L103

#FediBlock #MastoAdmin
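The post above blocks these addresses at the firewall. For anyone who wants the same check in application code or in a log audit, here is a minimal Python sketch. The addresses are the ones named in the post; the names `SCRAPER_IPS`, `BLOCKED`, and `is_blocked` are made up for illustration and are not part of any existing tool.

```python
import ipaddress

# Addresses copied from the post above; maintain your own list in practice.
SCRAPER_IPS = [
    "54.37.233.246",
    "23.24.204.110",
    "45.86.70.49",
    "88.65.6.124",
    "187.190.192.31",
    "185.244.192.119",
]

# Parse once so comparisons happen on address objects, not raw strings.
BLOCKED = frozenset(ipaddress.ip_address(ip) for ip in SCRAPER_IPS)


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip is one of the listed scraper addresses."""
    try:
        return ipaddress.ip_address(client_ip) in BLOCKED
    except ValueError:
        # Not a parseable IP string; this sketch treats it as not blocked.
        return False
```

This only mirrors the idea; a firewall rule stops the traffic before it ever reaches the web server, which an application-level check cannot do.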

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 21, 2023, 18:49

re: Blocklist scraping by fash

`70.106.192.146` too, though it's unclear what software it's running

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 21, 2023, 19:34

re: Mitigating blocklist scraping by fash

Quite interesting workaround; the kiwifarms scraper is configured to not follow HTTP redirects, so by adding one you can make them give up, while legit users can still view the page without issues.

https://git.pixie.town/f0x/nixos/src/branch/main/nodes/aura/services/nginx.nix#L202-L215
Adapts my nginx setup to redirect /about/more to /about/much-more

Of course a scraper could go to much-more directly now, but if we all pick something unique, that's impossible to hardcode for. And if they *do* start following redirects, we could introduce honeypot instances that redirect all around the place, disrupting the scrape (which all happens in sequence across domains btw)
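The redirect workaround can be illustrated outside nginx too. The sketch below is not the linked nginx configuration; it is a small Python stand-in that assumes a 302 status (the post does not say which redirect code is used) and uses the two paths from the post. `REDIRECTS`, `route`, and `Handler` are illustrative names.

```python
from http.server import BaseHTTPRequestHandler

# Paths from the post; each instance would pick its own unique target.
REDIRECTS = {"/about/more": "/about/much-more"}


def route(path: str):
    """Return (status, headers, body) for a request path."""
    if path in REDIRECTS:
        # A client that does not follow redirects stops here with an empty body.
        return 302, {"Location": REDIRECTS[path]}, b""
    if path in REDIRECTS.values():
        # Placeholder body standing in for the real about page.
        return 200, {"Content-Type": "text/plain"}, b"about page\n"
    return 404, {"Content-Type": "text/plain"}, b"not found\n"


class Handler(BaseHTTPRequestHandler):
    """Minimal handler that serves whatever route() decides."""

    def do_GET(self):
        status, headers, body = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `http.server.HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()` would serve it; the port is arbitrary. A browser follows the `Location` header transparently, while a client configured not to follow redirects only ever sees the empty 302.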

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 22, 2023, 22:22

toot/blocklist scraping info request

Can other server admins grep their logs for `159.196.229.70`? They seem to be mass scraping public timelines, toots and blocklists, apparently from an Australian residential IP?

#MastoAdmin

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 22, 2023, 22:24

re: toot/blocklist scraping info request

also `2a01:4f8:162:6027::2`, with user-agents "Ruby, mastodon 0.1.1" or "mastodon_stream v0.1"

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 22, 2023, 22:36

re: toot/blocklist scraping info request

If so, you can send an abuse report to abuse@aussiebroadband.com.au, regarding IPs `159.196.229.70` and `2a01:4f8:162:6027::2`. One of my servers shows scraping access logs going back to at least December 2022
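For admins who prefer a script over `grep`, here is a hedged Python sketch of the same log check. It assumes the client address is the first whitespace-separated field of each access log line, which is true for common nginx formats but may not match yours. The addresses and user agents are the ones named in this thread; `WATCH_IPS`, `WATCH_AGENTS`, and `matching_lines` are illustrative names.

```python
import ipaddress

# Addresses and user agents mentioned in this thread.
WATCH_IPS = {
    ipaddress.ip_address("159.196.229.70"),
    ipaddress.ip_address("2a01:4f8:162:6027::2"),
}
WATCH_AGENTS = ("Ruby, mastodon 0.1.1", "mastodon_stream v0.1")


def matching_lines(lines):
    """Yield log lines whose client IP or user agent is on the watch lists."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            # Comparing parsed addresses also matches IPv6 written in long form.
            ip_hit = ipaddress.ip_address(fields[0]) in WATCH_IPS
        except ValueError:
            ip_hit = False
        if ip_hit or any(agent in line for agent in WATCH_AGENTS):
            yield line
```

Passing an open log file (for example one opened from wherever your web server writes its access log) to `matching_lines` and printing each result gives roughly what the `grep` would, with the user-agent match added.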

**keschi Θ** @kescher@catcatnya.com · Jun 21, 2023, 18:52

Blocklist scraping by fash

@f0x oh no is it really time to use a HTTP tarpit again

**keschi Θ** @kescher@catcatnya.com · Jun 21, 2023, 18:57

Blocklist scraping by fash

@f0x (currently my endpoint will just return the scraper's own IP, but the real list for all authenticated users)
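The behaviour described in this reply can be sketched in a few lines of Python. This is only an illustration of the idea, not keschi's actual endpoint; the function name and parameters are hypothetical.

```python
def blocklist_response(authenticated: bool, client_ip: str,
                       real_blocklist: list[str]) -> list[str]:
    """Return the real blocklist to logged-in users, a decoy to everyone else."""
    if authenticated:
        # Copy so callers cannot mutate the stored list.
        return list(real_blocklist)
    # Unauthenticated clients (such as scrapers) only see their own address.
    return [client_ip]
```

The decoy keeps the endpoint responding normally, so a scraper gets a well-formed answer that carries no information about who is actually blocked.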

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 21, 2023, 18:59

re: Blocklist scraping by fash

@kescher tarpit would be of limited use I think, since all their requests have a 5 second timeout

**wb x64** @wilbr@glitch.social · Jun 22, 2023, 07:36

Blocklist scraping by fash

@kescher @f0x I feel like limiting actual list info to logged-in users and publishing lists for public consumption in aggregate instead of traceably would probably be better in the end. Giving out that info directly, specifically, to the public, doesn't seem super important.

**fruitbat** @pastelpunkbandit@kittycat.homes · Jun 21, 2023, 19:26

Blocklist scraping by fash

@f0x NOOO they're sending spam to poor old Santa?

that's just evil

[attached image: 01D03DYAH8DPQR7PEE9JVWFAAC.jpg]

**‍fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jun 21, 2023, 19:36

re: Blocklist scraping by fash

@pastelpunkbandit lmao please unblock us santa uwu we were just shitposting

**jonny (good kind)** @jonny@neuromatch.social · Jun 21, 2023, 20:25

Blocklist scraping by fash

@f0x
this is super good to know. we post our Blocklist on a separate wiki with export formats so ppl can re-import them. might be another mode of indirection that makes it hard to scrape but easy to use as intended

**Cadence melodicFreaks** @sixthhokage95@jubi.life · Jun 21, 2023, 21:16

Blocklist scraping by fash

@f0x incredibly minor point of correction, quoting an old locked post of mine from when people were attributing the tool to kf: "It was made by EnjuAihara at youjo dot love (a pedo instance 🤢 ) and forked by mint at ryona dot agency (a fascist sack of shit)."

**Alexander Bochmann** @galaxis@mastodon.infra.de · Jun 22, 2023, 23:35

re: :boosts_ok_gay: toot/blocklist scraping info request

@f0x I see both here - not a lot of requests per day, but I'm a single-user instance and I haven't checked if they just scrape any of my posts.

**Alexander Bochmann** @galaxis@mastodon.infra.de · Jun 22, 2023, 23:36

re: :boosts_ok_gay: toot/blocklist scraping info request

@f0x Wasn't there someone out there actively mapping scrapers? Unfortunately I don't remember who that was and finding things in Mastodon ... oh well.

**Alexander Bochmann** @galaxis@mastodon.infra.de · Jul 02, 2023, 15:15

re: :boosts_ok_gay: toot/blocklist scraping info request

@f0x It's been a couple of days since that reply, but I just now remembered that ScraperSnitch was what I was thinking of: https://www.bentasker.co.uk/posts/blog/security/autodetecting-and-outing-mastodon-scrapers-with-scrapersnitchbot.html

(Note that @scrapersnitch posts as followers-only, so there's nothing to be seen on the public profile.)

**CosmicK9 // Main** @cosmick9@soc.cosmick9.net · Jun 23, 2023, 01:21

re: toot/blocklist scraping info request

@f0x Looks like AbuseIPDB has quite a few reports from 3 months ago. https://www.abuseipdb.com/check/1...

As for my own instance (Akkoma 3.9.3-28), I don't see any connections from there but I will be keeping an eye out for connections from "AS4764 WIDEBAND-AS-AP Aussie Broadband"

**Anne C.A. Baanen** @Vierkantor@mastodon.vierkantor.com · Jun 23, 2023, 11:26

re: :boosts_ok_gay: toot/blocklist scraping info request

@f0x I'm getting a few hits from that IP every day of the last week or so, user agent "Ruby, mastodon 0.1.1" and always for URL /api/v1/statuses/110575362477129505 (which is, ironically enough, a post about blocking.) Not enough to suggest reporting abuse from my side.

**♡ len** @leni@windycity.style · Jun 24, 2023, 14:19

Blocklist scraping by fash

@f0x@social.pixie.town disregard the question, looks like the nginx worked. I also located a new one so figured I'd share:

209.141.56.3 - - "GET /.well-known/nodeinfo HTTP/2.0" 200 213 "-" "FediList agent (https://fedilist.com/)" "-"
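To spot agents like this one in bulk, a small Python sketch can tally user agents from access log lines. It assumes the user agent is the third double-quoted field (after the request and the referer), as in the line above; other log formats would need a different index. `user_agent` and `count_agents` are illustrative names.

```python
import re
from collections import Counter

# Matches each double-quoted field in a log line.
QUOTED = re.compile(r'"([^"]*)"')


def user_agent(line: str):
    """Return the third quoted field of a log line, or None if absent."""
    quoted = QUOTED.findall(line)
    return quoted[2] if len(quoted) >= 3 else None


def count_agents(lines):
    """Count requests per user agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        agent = user_agent(line)
        if agent is not None:
            counts[agent] += 1
    return counts
```

Calling `count_agents(...).most_common()` on a log file lists the busiest agents first, which makes unfamiliar crawlers easier to notice.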