It occurs to me that even if we can't *prevent* LLM companies from scraping by blocking their IP ranges, doing so en masse will certainly make it very *expensive* for them to continue scraping, because they'd have to fall back on residential proxies, and those are not cheap
@joepie91 AI scraping via botnet feels like an any day kinda thing now
@joepie91 the whole dynamic here does make me wonder if router/smart tv/IoT/etc manufacturers and maybe even ISPs themselves will start trying to get into the residential proxy game, especially with exclusive contracts with specific LLM companies and such. as-is, residential proxy companies are mostly super sketchy, i'm curious if demand from LLM companies is going to make the space less so
@joepie91 browser extensions and apps and such also seem like they might be a vehicle for this. basically anyone who is involved in the consumer software supply chain and cares more about money than ethics seems like they have a huge opportunity here, especially now that being used to gather LLM training data is a non-sketchy-sounding excuse for what would have previously been seen unambiguously as being essentially malware
@joepie91 and like, there's also a part of me that wonders if in a privacy sense this might be a positive development, in that legally making the connection between "this request came from this ip address" and "this request came from this person" might become way harder in a world like that. but the downsides are obviously also huge...
("Residential proxies" are the thing that they use to come at you from home user IP addresses when you block their servers, to try and evade your block and become undetectable)