I feel that the best explanation for neural text degeneration is that language is a communication medium. When I talk, I'm transmitting bytes to you; the protocol we're using to transmit those bytes is English. Normal human language therefore contains entropy that you just can't get rid of (unless you're omniscient).
When you try to generate text with an LLM by maximizing likelihood (greedy or beam search), you're picking the most probable token at every step, which drives the per-token entropy toward zero and therefore produces a sentence that carries essentially no information. Hence the repetition and nonsensical output.
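To make the entropy point concrete, here's a toy sketch (the distribution is made up, not from any real model): greedy decoding effectively replaces the model's next-token distribution with a one-hot argmax, so each emitted token carries zero bits, whereas sampling transmits the distribution's full entropy.

```python
# Toy illustration: per-step entropy under sampling vs. greedy (argmax) decoding.
# The next-token distribution below is hypothetical, not from a real LM.
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical next-token distribution the model predicts at some step.
p_model = np.array([0.45, 0.25, 0.15, 0.10, 0.05])

# Sampling transmits information: each draw carries H(p) bits on average.
print(f"entropy if we sample:      {entropy_bits(p_model):.2f} bits")

# Greedy decoding always emits the argmax, i.e. it draws from a degenerate
# one-hot distribution -- zero bits per token, no information transmitted.
p_greedy = np.zeros_like(p_model)
p_greedy[np.argmax(p_model)] = 1.0
print(f"entropy if we take argmax: {entropy_bits(p_greedy):.2f} bits")
```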