openai/whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 7.96M • 3.1k
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 23 days ago • 426
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 25 days ago • 102
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? Paper • 2605.16679 • Published May 15 • 54
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 29 days ago • 178
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design Paper • 2605.15871 • Published May 15 • 17
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 273
PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination Paper • 2605.03571 • Published May 5 • 7
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 170
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published Apr 15 • 62
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 122