MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published about 1 month ago • 15
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth Paper • 2605.25052 • Published May 24 • 14
DCAgent3/dev_set_v2_rl__24GPU_base_excl_timeouts__exp_rpt_pymethods2test_large__GLM_4_7_c2148a8d Viewer • Updated May 27 • 296 • 13 • 1
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering Paper • 2605.17526 • Published May 17 • 7
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published May 7 • 237
Forge-UGC: FX optimization and register-graph engine for universal graph compiler Paper • 2604.16498 • Published Apr 14 • 5
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 329