Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks Paper • 2606.29082 • Published 6 days ago • 26
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19 • 70
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 17 days ago • 63
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 22 days ago • 109
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration Paper • 2606.04743 • Published 30 days ago • 47
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published May 28 • 79
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents Paper • 2605.28775 • Published May 27 • 38
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27 • 93
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents Paper • 2605.17873 • Published May 18 • 12
Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR Paper • 2605.10781 • Published May 11 • 17
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs Paper • 2605.20258 • Published May 18 • 30
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 35
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 35