Prompt dedup. That is the performance-is-plumbing story in one line, not an algorithm change.
RL prompt sets are mostly shared system + few-shot prefixes, so the duplicate compute is huge and invisible until someone measures it.
Is the dedup exact-match on the full prompt, or prefix-level, so that two prompts that diverge late still share the forward passes over their common prefix?
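
To make the difference concrete, here is a minimal sketch that counts how many prompt tokens would need a forward pass under each scheme. It is an illustration under stated assumptions, not how any particular trainer does it: the function names (`naive_tokens`, `exact_dedup_tokens`, `prefix_shared_tokens`) and the toy token lists are hypothetical, and the trie node count only stands in for the work a real prefix cache would save.

```python
from typing import Hashable, Sequence

Prompt = Sequence[Hashable]  # a prompt as a sequence of token ids (or strings)


def naive_tokens(prompts: Sequence[Prompt]) -> int:
    """Tokens processed with no dedup: every prompt is run in full."""
    return sum(len(p) for p in prompts)


def exact_dedup_tokens(prompts: Sequence[Prompt]) -> int:
    """Tokens processed if only byte-identical prompts are merged."""
    unique = {tuple(p) for p in prompts}
    return sum(len(p) for p in unique)


def prefix_shared_tokens(prompts: Sequence[Prompt]) -> int:
    """Tokens processed if every shared prefix is computed once.

    Each trie node is one (prefix, token) position, so the node count
    equals the number of distinct positions that need a forward pass.
    """
    root: dict = {}
    nodes = 0
    for p in prompts:
        node = root
        for tok in p:
            if tok not in node:
                node[tok] = {}
                nodes += 1
            node = node[tok]
    return nodes


# Hypothetical toy data: shared system + few-shot prefix, divergence at the end.
prompts = [
    ["sys", "shot1", "shot2", "q", "a"],
    ["sys", "shot1", "shot2", "q", "b"],
    ["sys", "shot1", "shot2", "q", "a"],  # exact duplicate of the first
]

print(naive_tokens(prompts), exact_dedup_tokens(prompts), prefix_shared_tokens(prompts))
```

On this toy set, exact-match dedup only removes the third prompt, while prefix-level sharing also collapses the four tokens the first two prompts have in common. The gap between the last two numbers is the part that exact-match dedup leaves on the table.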