can we accurately decompose a sentence into its semantic information and its grammatical information? "six comes before seven" and "six precedes seven" have the same semantic content but make different grammatical choices. so some of the bits in their surprisal under a language model must be spent on the grammatical choice rather than the actual semantics. can we quantify that?
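a toy sketch of what i mean, where the per-token probabilities are completely made up rather than coming from a real model: total surprisal in bits is just -sum log2 p(token), and if you grant that the two sentences mean the same thing, the gap between their totals is one crude estimate of the cost of the grammatical choice.

```python
import math

# hypothetical per-token probabilities an LM might assign (made-up numbers,
# one probability per token of each sentence)
probs = {
    "six comes before seven": [0.01, 0.05, 0.30, 0.60],
    "six precedes seven":     [0.01, 0.02, 0.55],
}

def surprisal_bits(token_probs):
    """total information content of a sequence in bits: -sum log2 p(token)"""
    return -sum(math.log2(p) for p in token_probs)

for sentence, p in probs.items():
    print(f"{sentence!r}: {surprisal_bits(p):.2f} bits")

# if the two sentences carry identical semantics, the difference in total
# bits is (very roughly) the bits spent on the grammatical choice
a = surprisal_bits(probs["six comes before seven"])
b = surprisal_bits(probs["six precedes seven"])
print(f"difference: {abs(a - b):.2f} bits")
```

obviously the hard part is deciding that two sentences really are semantically identical in the first place; this only works once you've granted that.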
when LLM-based "assistants" create responses they sample from the LLM's distribution semi-randomly, sharpening the distribution so that it favors more likely options. because it's sampling randomly, it's not only making arbitrary decisions about grammatical structure (which I doubt many care about so much) but *also* arbitrary decisions about the semantic content. it seems to me like it would be useful to quantify how much of the semantic content of a response is up to chance? but nobody seems to care? what the fuck?
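to be concrete about the "semi-randomly" part, here's a minimal sketch of temperature sampling (the logits and continuations are invented for illustration): with T < 1 the distribution gets sharpened, but two near-equiprobable continuations that differ in *meaning* still get picked by what is essentially a coin flip.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, rng=random):
    """sharpen a next-token distribution (T < 1) and sample an index from it"""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    cumulative = 0.0
    r = rng.random()
    for i, e in enumerate(exps):
        cumulative += e / total
        if r < cumulative:
            return i
    return len(exps) - 1

# two continuations with nearly equal (made-up) logits that differ in
# semantics, not grammar: the sampler decides between them by chance
options = ["yes, that's safe", "no, don't do that"]
logits = [1.00, 0.98]
counts = [0, 0]
for _ in range(10_000):
    counts[sample_with_temperature(logits)] += 1
print(dict(zip(options, counts)))
```

the arbitrary-grammar case and the arbitrary-semantics case look identical from the sampler's point of view, which is kind of the whole problem.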
@suricrasia i will miss great eternal loops like "you eat the maclanky"