Here we go again explaining to supposedly technologically literate people that what they *publish* on the Internet can and will be scraped... Bluesky's explanation ("we can't enforce this") is on point btw.

RE: https://infosec.exchange/@josephcox/113551853623942786

@buherator People broadly understand this just fine. The problem is not of a technical nature, it is of a social and ethical nature, and the culture of a place absolutely *does* affect whether someone can get away with this or not.

@joepie91 Based on arguments I had over here, people definitely believe that technical measures at the publishing platform (such as limiting search) can affect this. Also, what is the point of being outraged about the single person who is open about his scraping while I guarantee you a dozen other orgs do the same rn and just don't talk about it?

@buherator Technical measures *can* affect it. The exact measures needed and their exact impact are going to vary from case to case, but yes, putting up barriers does in fact make it less likely to happen, even if it cannot fully prevent it.

As to "what is the point of being outraged": because that is how you set social norms in a community, and make clear to potential scrapers that they will be doing so at the cost of their inclusion in the community. This is how it works in all social environments and it seems to be mostly just IT nerds who think this "doesn't work", despite mountains of evidence to the contrary.

Nobody half-competent believes that there's some magical incantation to totally stop any and all scraping. But it's equally absurd to go "well, it's public, nothing you can do, it literally doesn't matter". Harm reduction is a thing, and crucially important to many vulnerable and marginalized folks.

@joepie91 Do you really think people who want to e.g. earn money with this give a flying fart if they are excluded from a community (which they weren't part of in the first place)?

@buherator They might not themselves. But the dataset itself becomes toxic, and if it's known as "that dataset from the people who didn't want it", that will make an awful lot of people think twice before using it.

@buherator The dynamic is the same as with many other forms of abuse; the group of malicious people is very small, and the only reason they can do so much damage is because they can bank on the tolerance of a much broader set of people who wouldn't do the malicious thing themselves, but also aren't going to look too closely at the background of what someone else did.

Making a stink out of the collection process sabotages the use of the dataset for that group of people, which is going to be most of them.

@joepie91
- You're still assuming you can know about the scraping in the first place
- Money doesn't stink

@buherator I'm assuming no such thing. That's the whole point of social norms; they apply and have an effect *without* needing to personally know about every single case.

@joepie91 OK, please let me know when the scraping stops because of our collective will!