re: Searchtodon meta, scraping related :boosts_ok_gay:​ 

His stated goal is to run this as an 'experiment' to 'have this conversation', but in my opinion that could've happened (and was already happening) without publishing a tool, or at the very least with the tool requiring people to explicitly **opt in** to having their toots indexed like this

Searchtodon meta, scraping related :boosts_ok_gay:​ 

I mentioned these concerns in the announcement thread but wanted to reiterate them here separately. social.pixie.town/@f0x/1096775

It does a lot of things right, and advertises itself as built with privacy and consent in mind.

However, while a user's search results are limited to content they could've otherwise seen pass by in their home timeline, all these toots are stored and indexed on the central Searchtodon server, indefinitely.

This means he technically has access to the combined timelines of all the users, and, unlike public content scrapers, **also to followers-only and even DM posts** sent by **any user a Searchtodon user is following**.

There's only an opt-*out* mechanism based on setting your profile to be non-search-engine-indexable, or including a few specific hashtags.
Without opting out though **all your toots** will be stored if *any* of your followers use this tool.

While this for now remains just a technical possibility, with him stating he has no intent of misusing it, there is no way to guarantee this now or in the future, or when this data changes hands (sold off or hacked).

A service like this could have merit, but should absolutely be hosted by yourself or your own instance, since your instance already has control over all this data, meaning there's no extra party to trust.

@janl@chaos.social Mastodon is larger than Eugen though, and there's actually a sizeable part of the community that also doesn't agree with other (privacy-related) choices Eugen makes, like the Mastodon 4.x hugely exposed client API.

It's good to have a conversation about this with the community, but that goes much better when there isn't a published tool that is (perceived as) an active threat.

Opt-out is not enough, this needs an active opt-in, like a recognizable hashtag in bio.

@tastytea@very.tastytea.de the noindex setting is exposed in the account object as `discoverable` (it's false for both of us), and so is the bio that might contain the opt-out hashtag stuff
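A minimal sketch of what a default-deny opt-in check on the bio could look like. It assumes the account object is a plain dict whose `note` field holds the bio as HTML; the `#searchable` tag and the function names are hypothetical, picked just for illustration, and not anything Searchtodon or Mastodon actually defines.

```python
import html
import re

# Hypothetical opt-in tag; not an established convention anywhere.
OPT_IN_TAG = "#searchable"


def bio_hashtags(note_html: str) -> set[str]:
    """Extract lowercase hashtags from an HTML bio."""
    # Turn paragraph ends and line breaks into spaces so words don't merge.
    text = re.sub(r"</p>|<br\s*/?>", " ", note_html)
    # Drop remaining tags without adding spaces, so '#<span>tag</span>' stays '#tag'.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    return {tag.lower() for tag in re.findall(r"#\w+", text)}


def may_index(account: dict, opt_in_tag: str = OPT_IN_TAG) -> bool:
    """Default deny: only index accounts whose bio carries the opt-in tag."""
    return opt_in_tag.lower() in bio_hashtags(account.get("note") or "")
```

The point of the design is that a missing or empty bio means "no": nothing gets stored unless the author has actively asked for it.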

@janl@chaos.social Right, I get that you currently have no interest to do so, and I'm inclined to believe you, but this is what it gives you access to, now or in the indefinite future.

Thing is though, Mastodon already has Elasticsearch support, and it's an explicit choice to scope the search functionality to just your own toots, hashtags and posts you've actually interacted with, not just the ones you could've seen passing by.

It's like this because of social reasons, not technical issues that need solving, and that's what all the scraper and scraper-adjacent projects seem to get wrong.

@tastytea@very.tastytea.de robots.txt doesn't make too much sense because afaik it's using the mastodon client api from the logged in user, to keep a searchable copy of everything that passes by in their home timeline, not contacting/scraping any remote instances directly?

@janl@chaos.social IMO this should either be run by the instance the user is already on (because you already trust them with exactly that data), or be incredibly upfront that you are now *indefinitely* trusting a third party with all this data and permissions

@janl@chaos.social I think you have mostly the right idea, but because this isn't self-hostable, it does effectively give your single server a very large amount of data, combined from all the users' home timelines.

There's no way to verify or guarantee that isn't being used for other purposes (now or in the future). Worse than public-scrapers even, this also gives you access to followers-only and dm content (and the account, through the wider permissions Elk requests)
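As a rough illustration of the followers-only/DM point: a sketch of a filter that would drop anything non-public before it ever reaches a central index. It assumes statuses arrive as dicts with a `visibility` field taking the values `public`, `unlisted`, `private` (followers-only) or `direct`; the function name is made up, and this is not how Searchtodon is known to work.

```python
# Visibilities that may be stored centrally; everything else is dropped.
STORABLE_VISIBILITIES = {"public"}


def storable(statuses: list[dict]) -> list[dict]:
    """Keep only statuses whose visibility is explicitly allowed.

    A status with a missing or unknown visibility is dropped as well,
    so the filter fails closed.
    """
    return [s for s in statuses if s.get("visibility") in STORABLE_VISIBILITIES]
```

Even with a filter like this, you'd still be trusting the operator to actually run it, which is why hosting it on your own instance matters.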

@tastytea@very.tastytea.de it's not really a crawler, and they seem to have mostly the right idea, but due to it being a single centralized service, with them storing all the content indefinitely, it does effectively give *them* access to a much wider view of the fediverse, even if it's properly scoped for users to all the content they could've seen pass by in their timeline

too much posting, nobody should be allowed this much posting power

also more hyping myself up like: there's a reason i'm refactoring this code, but the quality ain't all that bad. i can still understand it after a few months :")

re: Instance block recommendation 

also lmao at the anarchy symbol as the favicon, you really don't get it, do you

Instance block recommendation 

impeccable.social, single-user Pleroma/Soapbox instance, feed is full of boosting other shitty instances like noagendasocial.com and gleasonator.com

found them through a reply recommending Soapbox (eww) and Wildebeest (cloudflare eww) software

i'm posting too much so probably had too many stimulants

Pixietown

Small server part of the pixie.town infrastructure. Registration is closed.