Jim White PRO
jimwhite
AI & ML interests
None yet
Organizations
RL
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
  Paper • 2512.17008 • Published • 11
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 234
- ryokamoi/Qwen-2.5-7B-FoVer-PRM-old
  Text Generation • 8B • Updated • 16 • 1
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
  Paper • 2601.18778 • Published • 43
PUP
Verified Agents
Coding Benchmarks
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
  Paper • 2511.18538 • Published • 306
- SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
  Paper • 2511.05459 • Published • 5
- SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
  Paper • 2512.18470 • Published • 12
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
  Paper • 2601.09688 • Published • 128
Semantic Web
- josancamon/kg-gen-MINE-evaluation-dataset
  Viewer • Updated • 101 • 252 • 5
- zilliz/semantic-highlight-bilingual-v1
  Token Classification • 0.6B • Updated • 11.1k • 97
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
  Paper • 2601.09688 • Published • 128
LLM