LLM stuff, "public good"
Arguing that LLMs are a positive development because they can be used for the public good is all well and good, but that argument would be a lot more credible if the people whose labour went into the training data were actually credited and compensated appropriately for it.
Like, this discussion would look very different in a society where there's e.g. universal basic income, capitalism has been abolished entirely, or some other "everyone is taken care of" solution exists. But that's not the society we live in, and that should raise some Questions about any "public good" argument: exactly whose "public" are we talking about here?
LLM stuff, "public good"
@joepie91 It’s very hard to get an LLM to recognize all the “wrong” things it might potentially do, because that requires teaching it general heuristics, which AIs can’t internalize (and probably won’t be able to for a while). Humans are good at general heuristics, so a lot of keeping “bad” behavior out of, say, ChatGPT would require a certain amount of upstream work by humans with privacy training curating its dataset.
/2
LLM stuff, "public good"
@joepie91 (That’s just one example of course.)
But this would require more time, effort, and money than OpenAI et al. want to put in, especially when they want to grow their models fast on MOAR DATA, and it helps if they’re not picky about where they get it.
Being picky about things like not accidentally including sensitive data, which would require review by trained humans, is expensive! (See the sketch after this post for why automated filtering alone doesn’t cut it.)
/3
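To make that concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of cheap automated pre-filter a training pipeline might run. The patterns, function names, and sample documents are all hypothetical, not anyone's actual pipeline; the point is that regex-style filtering only catches obvious, well-formatted identifiers, while sensitive information written as ordinary prose passes straight through, and catching that is exactly the expensive human-review work.

```python
import re

# Purely illustrative patterns; real PII/PHI detection is far harder than this.
NAIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def looks_sensitive(text: str) -> bool:
    """Return True if any naive pattern matches.

    This only catches obvious, well-formatted identifiers. Health details
    written in prose, a name next to a diagnosis, credentials in an unusual
    format, etc. are not caught; those are the cases that need trained
    human review.
    """
    return any(pattern.search(text) for pattern in NAIVE_PATTERNS.values())

def filter_corpus(documents: list[str]) -> list[str]:
    """Keep only documents that don't trip the naive patterns."""
    return [doc for doc in documents if not looks_sensitive(doc)]

if __name__ == "__main__":
    corpus = [
        "My SSN is 123-45-6789, please update my records.",          # caught by the regex
        "Patient reports the new medication has reduced seizures.",  # PHI in prose: not caught
    ]
    # Only the first document is filtered out; the prose PHI stays in the corpus.
    print(filter_corpus(corpus))
```

Even in this toy form, the economics the thread describes show up: the automated pass is cheap, and everything it misses becomes human-review work that the vendors are incentivized to skip.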
LLM stuff, "public good"
@joepie91 So even without getting into capitalism, socialism, UBI, etc. as such, you can make an argument that LLMs aren’t automatically a public good: there is a lot of potential for harm, and a lot of incentive pointing toward harm, because of the reckless ingestion and exposure of sensitive data. And that’s just *one* aspect of harm.
/end
LLM stuff, "public good"
@joepie91 From a cybersecurity and records management perspective, there are a lot of indications that LLMs like ChatGPT cause a lot of harm, because the business model of the people who run them disincentivizes the due diligence and due care in curation that would prevent it: for example, keeping protected health or financial information out of the training dataset, where the right (or wrong) prompt might disgorge it.
/1