EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 8 days ago • 79
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Paper • 2510.06303 • Published Oct 7, 2025 • 15
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Paper • 2504.00487 • Published Apr 1, 2025 • 18