Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts • Paper • 2606.05922 • Published 19 days ago • 67
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages • Paper • 2605.27901 • Published 27 days ago • 13
OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents • Paper • 2605.28158 • Published 27 days ago • 6
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws • Paper • 2605.21803 • Published May 20 • 5
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL • Paper • 2605.18703 • Published May 18 • 50
From Context to Skills: Can Language Models Learn from Context Skillfully? • Paper • 2604.27660 • Published May 3 • 171
DiagramBank: A Large-scale Dataset of Diagram Design Exemplars with Paper Metadata for Retrieval-Augmented Generation • Paper • 2604.20857 • Published Feb 28 • 3