JiayuCHEN
KN33SOXXX
AI & ML interests
None yet
Recent Activity
updated a dataset about 14 hours ago
knowledge-in-visual-synthesis/v1
updated a collection 2 days ago
benchmark
updated a collection 2 days ago
benchmark
Organizations
benchmark
- WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
  Paper • 2605.25874 • Published • 102
- Agents' Last Exam
  Paper • 2606.05405 • Published • 342
- WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
  Paper • 2606.09426 • Published • 99
- ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
  Paper • 2606.07591 • Published • 88
harness
- Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
  Paper • 2605.30611 • Published • 193
- HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems
  Paper • 2606.01779 • Published • 6
- Text-to-Image Models Need Less from Text Encoders Than You Think
  Paper • 2606.03715 • Published • 10
- SIA: Self Improving AI with Harness & Weight Updates
  Paper • 2605.27276 • Published • 14
models 0
None public yet