ldwang

ftgreat

AI & ML interests

LLM, MLLM, Infra

Recent Activity

upvoted a collection 14 minutes ago

Nemotron-Post-Training-v3

upvoted an article about 17 hours ago

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

liked a Space 8 days ago

AdithyaSK/rl-environments-guide

upvoted a collection 14 minutes ago

Nemotron-Post-Training-v3

Collection

Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 51 items • Updated about 16 hours ago • 148

upvoted an article about 17 hours ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 160

liked a Space 8 days ago

The ultimate guide to RL environments: building and scaling them in the LLM era

📝

178

Building and scaling RL environments for LLM training

liked a dataset 18 days ago

open-thoughts/AgentTrove

Viewer • Updated 29 days ago • 1.7M • 11.7k • 182

liked a model 18 days ago

Zyphra/ZAYA1-VL-8B

Image-Text-to-Text • 10B • Updated 18 days ago • 1.98k • 39

liked a model 21 days ago

TaipingQu/BAAI-Cardiac-Agent

Updated Apr 7 • 40 • 6

liked a model 24 days ago

Zyphra/ZAYA1-8B

9B • Updated 24 days ago • 163k • 568

liked a model 25 days ago

tencent/Hy-MT1.5-1.8B-2bit

Translation • 2B • Updated Apr 29 • 615 • 35

liked a model 27 days ago

Qwen/Qwen3.6-27B

Image-Text-to-Text • 28B • Updated Apr 24 • 5.44M • 1.62k

updated a model 28 days ago

BAAI/OpenSeek-Mid-v1

Text Generation • 11B • Updated 23 days ago • 25 • 11

liked 3 models about 1 month ago

upvoted a collection about 2 months ago

Qwen3.6

Collection

4 items • Updated Apr 22 • 394

liked a model about 2 months ago

Qwen/Qwen3.5-9B

Image-Text-to-Text • 10B • Updated Mar 2 • 8.92M • 1.52k

upvoted a paper about 2 months ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 109

liked a model about 2 months ago

MiniMaxAI/MiniMax-M2.7

Text Generation • 229B • Updated Apr 20 • 2.36M • 1.19k

upvoted a paper about 2 months ago

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

Paper • 2603.25562 • Published Mar 26 • 19

liked a dataset about 2 months ago

nvidia/Nemotron-SFT-OpenCode-v1

Preview • Updated Mar 23 • 2.95k • 49

liked a model 2 months ago

arcee-ai/Trinity-Large-TrueBase

Text Generation • 399B • Updated 7 days ago • 201 • 67
