Chanuk Lee

tally0818

https://tally0818.github.io

AI & ML interests

LLM post-training

Organizations

None yet

upvoted 2 papers 1 day ago

DOPD: Dual On-policy Distillation

Paper • 2606.30626 • Published 4 days ago • 88

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

Paper • 2606.29082 • Published 6 days ago • 26

upvoted a paper 2 days ago

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published Mar 19 • 70

upvoted a paper 14 days ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 17 days ago • 63

upvoted 2 papers 20 days ago

On the Geometry of On-Policy Distillation

Paper • 2606.07082 • Published 28 days ago • 75

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 22 days ago • 109

upvoted a paper 27 days ago

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Paper • 2606.04743 • Published 30 days ago • 47

upvoted a paper 29 days ago

Trust Region On-Policy Distillation

Paper • 2606.01249 • Published May 31 • 46

upvoted 9 papers about 1 month ago

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Paper • 2605.29250 • Published May 28 • 79

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Paper • 2605.28775 • Published May 27 • 38

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published May 27 • 93

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Paper • 2605.17873 • Published May 18 • 12

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

Paper • 2605.10781 • Published May 11 • 17

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

The Unlearnability Phenomenon in RLVR for Language Models

Paper • 2605.16787 • Published May 16 • 6

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Paper • 2605.21468 • Published May 20 • 51

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Paper • 2605.20258 • Published May 18 • 30

authored a paper about 2 months ago

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

Paper • 2605.15726 • Published May 15 • 35

upvoted 2 papers about 2 months ago

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

Paper • 2605.15726 • Published May 15 • 35

PREPING: Building Agent Memory without Tasks

Paper • 2605.13880 • Published May 11 • 28