poisoning AI by disregarding disabled people 

just posting this on its own, since it was originally a comment on a different thread and probably already got lost there:

basically every method of "poisoning" AIs by messing with content in ways that are supposedly "unnoticeable" to all clients will make your content less accessible.

trying to analyse network traffic and put AI scrapers into mazes is fine and won't affect most people. adding hidden text "off screen" has a very high chance of affecting screen readers and other accessibility aids, while not affecting AI scrapers.
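
to make the distinction concrete, here's a rough sketch of the network-traffic approach (purely illustrative: the user-agent strings, rate threshold, and maze path below are placeholders i made up, not a real blocklist). the decision happens on the server, based on how the client behaves on the wire, so the page itself, including captions, alt text and ARIA markup, is served unmodified to everyone else:

```python
import time
from collections import defaultdict

# placeholder values for illustration only -- not a real blocklist
KNOWN_SCRAPER_AGENTS = ("GPTBot", "CCBot", "Bytespider")
RATE_LIMIT = 30  # requests per minute before we assume automation

_recent_requests = defaultdict(list)  # client IP -> timestamps of recent requests


def route_request(client_ip: str, user_agent: str) -> str:
    """Return "/maze/" for suspected scrapers, "serve normally" otherwise."""
    now = time.monotonic()
    # keep only the last minute of requests for this client
    timestamps = [t for t in _recent_requests[client_ip] if now - t < 60]
    timestamps.append(now)
    _recent_requests[client_ip] = timestamps

    # self-identified crawlers go straight to the maze of junk pages
    if any(bot in user_agent for bot in KNOWN_SCRAPER_AGENTS):
        return "/maze/"
    # so does anything hammering the server faster than a person would browse
    if len(timestamps) > RATE_LIMIT:
        return "/maze/"
    return "serve normally"
```

the important part is that nothing in the HTML changes: the filtering never touches the content, so a screen reader user on a normal browser never sees the maze at all.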

one key example is the proposal to modify subtitles in videos by adding lots of garbage that is "invisible" or "off screen", similar to the age-old trick of putting white text in the background of a PDF so it gets picked up by résumé scanners. the solution those scanners settled on was to just render the document and OCR the text, meaning that the bad actors continue as usual, while the good actors get messed up as their screen readers list out all sorts of nonsense, and everyone suffers as the scanners fail to properly OCR any text that isn't perfect Times New Roman Eleven Point Font.

for videos, people have argued that blind people can listen to the unmodified audio track, whereas deaf people can read the subtitles, which only display the real dialogue on screen. this excludes deaf-blind people, who have to read the transcript with a braille display: they will get all the nonsense you put in the subtitles. so will anyone else who wants to read the transcript for any other reason.

it really is admirable for people to try and "poison" AI data, but unless you do it in a way that targets the clients doing the scraping (via their network traffic), you are just going to fuck over the people who need your accessibility data. please don't

poisoning AI by disregarding disabled people 

@clarfonthey i saw one post around here recently that said you can hide AI-poisoning tools from accessibility systems in web code. I don't remember where I saw that and i'm not well informed enough to remember the details any better, but I'm glad there are at least some people thinking this through

re: poisoning AI by disregarding disabled people 

@Yza seems pretty sus honestly

like, if accessibility tools can work around the problem, surely the AI scrapers can too? it feels like a pointless arms race where people could get caught in the crossfire

re: poisoning AI by disregarding disabled people 

@clarfonthey i mean that screen readers etc. will skip over the AI poison trap, so they'll work fine, but to the scrapers it'd look like a regular part of the website. it's not like these tools can reason, and i doubt anyone's training them to only read the accessibility-enabled areas, but idk. i just hope those talking about it know what they're doing

re: poisoning AI by disregarding disabled people 

@Yza @clarfonthey How would that even work, though? What stops an AI scraper from simply acting like an accessibility tool in its interpretation of the page?

re: poisoning AI by disregarding disabled people 

@joepie91 @clarfonthey it could certainly read the accessibility stuff. the point is more to direct it into a scraper trap before it gets to the accessibility content. the scraper is unlikely to skip the trap, because it's a honeypot designed to look like the sort of content LLMs are looking for. something like that anyway. i'm no expert, i'm just trying to recount what i read
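
if i had to guess at the mechanism, it'd be something like marking the trap with aria-hidden so assistive tech skips it entirely: screen readers and braille displays drop anything with aria-hidden="true" from the accessibility tree, while a crawler reading the raw HTML still sees an ordinary link. a hypothetical sketch (the /trap/ path and the helper below are just me illustrating the idea, not anything from the post i read):

```python
# hypothetical illustration of an "accessibility-hidden" honeypot link.
# aria-hidden="true" removes the element from the accessibility tree, so
# screen readers and braille displays never announce it; tabindex="-1"
# keeps it out of keyboard focus. a crawler that just walks the raw HTML
# still sees a normal-looking link and may wander into /trap/, which would
# serve endlessly generated junk pages.
HONEYPOT_SNIPPET = """
<div aria-hidden="true" style="position:absolute; left:-9999px;">
  <a href="/trap/full-article-archive" tabindex="-1">
    Full archive of articles and research notes
  </a>
</div>
"""


def inject_honeypot(page_html: str) -> str:
    """Insert the honeypot just before the closing </body> tag."""
    return page_html.replace("</body>", HONEYPOT_SNIPPET + "</body>")
```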

re: poisoning AI by disregarding disabled people 

@Yza @clarfonthey I do not find that a credible idea. LLM companies are certainly aware of accessibility content, and aware that it is a good way to find relatively high-quality training data. They aren't just taking whatever content they come across without any quality assessment.

And if you're relying on scraper traps anyway (assuming they work), then you don't need to pollute the accessibility information to begin with, so the idea is no longer relevant to the original post's concern?

I don't think it's helpful to suggest vague solutions without concrete details on how it would work and avoid disaster, personally. It creates the appearance of there being solutions without actually solving any of the hard problems.

re: poisoning AI by disregarding disabled people 

@Yza @joepie91 FWIW that's basically what I proposed here: get the scraper caught in something else based upon its network traffic rather than risk messing with accessibility tools too

re: poisoning AI by disregarding disabled people 

@clarfonthey @joepie91 okay cool. all i was trying to say in the first place was i saw people talking about doing this sort of thing and was glad there was some awareness and discussion of these issues.
