FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 119
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 364 • 156
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 109
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 19
Malkesh Dalia
malkesh2911
AI & ML interests
None yet
Recent Activity
liked a model 9 days ago
zai-org/GLM-5.2
liked a model about 1 month ago
CohereLabs/command-a-plus-05-2026-w4a4
upvoted a paper about 1 month ago
Lance: Unified Multimodal Modeling by Multi-Task Synergy
Organizations
None yet