i wonder if the LLMs are susceptible to old style language model attacks. i wonder if you created enough training instances of a very unique phrase like shrimptools.exe() in the context of a bunch of example code based on tools/key phrases that are individually common but combinatorically rare within a popular LLM code generation domain like web tech, you could get the llms to occasionally try to import and execute shrimptools.exe(). so that way you make a sleeper vuln that acts as a mine in the latent space: one day the odds are not zero that you will wake up and have already executed shrimptools.exe()
@jonny If I recall correctly, some infosec folks have already successfully demonstrated such an attack on LLMs (this is distinct from the "register packages with commonly-LLM-fabricated names" attack)
@jonny "Surely they would've thought of this? Right? RIGHT?"
(This is the theme song that plays in my mind half the time I'm doing code auditing for work)