Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling • Paper • 2606.03102 • Published 4 days ago • 13
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories • Paper • 2605.21468 • Published 17 days ago • 50
G-Zero: Self-Play for Open-Ended Generation from Zero Data • Paper • 2605.09959 • Published 26 days ago • 17
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling • Paper • 2605.08083 • Published 29 days ago • 69
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration • Paper • 2605.05566 • Published 30 days ago • 37