I feel that local-only coding LLMs are no more exploitative than spending a week reading a pile of docs and tutorials to get passable at a new language/library. The ML model just automates that work away. It doesn't magically make you a good programmer.
What I find exploitative is when it's configured to reproduce large sections of its input corpus while stripping attribution, and sold as a service.
If it's a local-only solution, trained to produce auto-completions or suggest likely method usages, then it seems like that's well within the spirit of the OSS software it was trained upon.
@twoolie @pi_crew@chaos.social There is no such thing as "local-only LLMs".
LLMs (it's in the name!) are by their nature trained on an input corpus so large that you cannot realistically acquire it through ethical means, let alone train a model on it entirely locally. The technology simply does not work with any less input data than that.
(That doesn't mean that local-only *autocompletion* cannot be a thing, but that wouldn't be an LLM.)
@twoolie Many of these "open-source datasets" are scraped non-consensually. The ones that aren't result in useless LLMs because there isn't enough training data; people have tried.
So yes, it *does* immediately make them unethical, when *in practice* it is impossible to ethically acquire sufficient training material to make the technology work.
Tabby is trained on non-consensually collected data.
@twoolie The BigCode dataset was collected without consent.
@joepie91
Yes, people have tried, and succeeded! That's what SantaCoder (the default Tabby model) is! Maybe you should Read The Fucking Paper before regurgitating talking points like a Hacker News comment thread.
https://arxiv.org/abs/2301.03988
SantaCoder is proof positive that an ethically acquired dataset can be used for training and produce good results, no Kenyan exploitation required.
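And "local-only" isn't hand-waving either. Here's a minimal sketch of running that same checkpoint entirely on your own machine (this assumes the Hugging Face transformers library and the public bigcode/santacoder weights; it's my illustration, not something lifted from the paper):

```python
# Minimal local-only completion sketch using the public SantaCoder checkpoint.
# The weights are downloaded once; after that, inference runs on local hardware
# with nothing sent to a hosted service.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigcode/santacoder"  # ~1.1B-parameter code model trained on The Stack

# trust_remote_code is required because the checkpoint ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=48,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once the weights are cached, nothing leaves your machine; Tabby just wraps the same idea in an editor-friendly completion server.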
Technology *is* neutral, dipshit. A nuke can be used to blow up a city, or put out a gas well fire. https://interestingengineering.com/science/soviet-engineers-detonated-a-nuke-miles-underground-to-put-out-a-gas-well-fire
LLMs can be created and used responsibly; we know how to do it. Let's not tar an entire field of study with the same brush, ok?