another scraper in my logs, "fedimapper" by https://hachyderm.io/@tedivm/109569119442998113
not sure why it needs to scrape my domain_blocks with that stated purpose
another scraper by an (ex-?)Twitter employee, running without a recognizable user-agent (uses the typhoeus library with defaults)
creating some federation-wide full-text search... https://macaw.social/@angilly/109597402157254670
@f0x uggghhhhhh
@f0x hello! Didn’t expect this to make waves while I was exploring. I intended to add a proper user agent and will do so this week.
Am I wrong to be using these public APIs? Many instances have authentication turned on so I (perhaps incorrectly!) assumed if authentication wasn’t turned on I was allowed to hit the endpoints.
@angilly on my instances these requests get denied, yes. but just because data is public does not mean people consent to it being scraped, processed, and published elsewhere. If arbitrary full-text search was desirable on the fediverse, it would've been there.
It's not a technical challenge that needs solving, it's a social decision to do without it.
https://docs.joinmastodon.org/user/network/#search
> It deliberately does not allow searching for arbitrary strings in the entire database, in order to reduce the risk of abuse by people searching for controversial terms to find people to dogpile.
https://blog.joinmastodon.org/2018/07/cage-the-mastodon/
> Mastodon deliberately does not support arbitrary search
@f0x thank you. I did not read the docs or blog closely enough. I’ll turn off the API worker when I’m back at my desk this evening.
Fwiw I intended on trying to have this conversation publicly before ever launching something available to the public. You beat me to it and I deeply appreciate you engaging with me on it.
@angilly i appreciate you taking these concerns in stride. In the future though, I'd recommend having the public discussion *before* starting to collect any data
@f0x yes will do. I don’t even know where these discussions take place yet! Do they tend to happen organically on mastodon? I’ve read the CONTRIBUTING.md and github doesn’t seem the right place for a question like what I want to ask. That is, loosely:
“What do folks think about a layer on top of Mastodon: a search engine with 24hr retention so that folks can find like-minded folks discussing similar topics? To support this, I’d like to add an admin setting for admins to opt-in to it.”
fediverse.network scrapes a whole range of endpoints, but the site it refers to is just a parked domain