Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 14 days ago • 54
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 14 days ago • 228
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search Paper • 2605.20244 • Published 28 days ago • 4
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 24 days ago • 80
bilabila/b-b7_olr_ts10_gru_hib_costdyn_util_w3_sym7_202601_lossq_ms400k_h12 68k • Updated 23 days ago • 383 • 1
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 25 days ago • 169
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published about 1 month ago • 6
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published May 14 • 77
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published May 11 • 3
Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts Paper • 2602.03473 • Published May 8 • 11