Searchtodon meta, scraping related
I mentioned these concerns in the announcement thread but wanted to reiterate them here separately. https://social.pixie.town/@f0x/109677581893916570
It does a lot of things right, and advertises itself as built with privacy and consent in mind.
However, while a user's search results are limited to content they could've otherwise seen pass by in their home timeline, all these toots are stored and indexed on the central Searchtodon server, indefinitely.
This means its developer technically has access to the combined timelines of all its users. Unlike public-content scrapers, that includes **followers-only and even DM posts** sent by **any user a Searchtodon user is following**.
There's only an opt-*out* mechanism, based on setting your profile to be non-search-engine-indexable or on including a few specific hashtags.
Without opting out, though, **all your toots** will be stored if *any* of your followers uses this tool.
For now this remains just a technical possibility, and he states he has no intention of misusing it, but there is no way to guarantee that now, in the future, or once this data changes hands (sold off or hacked).
A service like this could have merit, but it should absolutely be hosted by yourself or by your own instance. Your instance already has control over all this data, so there's no extra party to trust.
re: Searchtodon meta, scraping related
From https://chaos.social/@janl/109677152590563058 and https://chaos.social/@janl/109677164080124847, and the kinda evasive responses to my and others' concerns, it seems he's only intent on listening if it's something a [larger part] of "the community" would rather not have, so it's worth chiming in on the original thread.
update: Searchtodon meta, scraping related
Since then multiple others have mentioned these concerns to him, but they're dismissed just the same.
Yet again a recently joined Twitter techbro is writing a scraper. This time it's couched in language about "consent" and "privacy", but it's still effectively building a centralized search index across users on his single server. Opt-out is also not actually consent, both legally (GDPR) and morally.
He keeps dismissing it as just a non-ideal stopgap solution, but that doesn't matter. It's about what's happening right now: random users logging in, thinking there's *anything* private about this service, and feeding their entire following to the machine.