I placed 2nd in the LeHome Challenge (ICRA 2026) and 1st of 62 teams in the first simulation round. Now I'm open-sourcing the full solution: code, tech report, and final weights.
The task: teach a cheap two-armed robot (SO-ARM101) to fold 4 garment types (long/short tops and pants). Garment category is hidden at eval. Round 1 in sim (auto-scored), round 2 on a real robot (jury-scored).
I trained a VLA policy with an RL loop on top. The key ideas:
The policy is its own value function. From the same forward pass that picks the next action chunk, cheap heads predict success probability, task completion %, garment type, and future keypoint distances + a Q-residual. Those become the advantage signal for RL, with no separate critic.
A fully asynchronous RL loop coordinated only through the HF Hub: 1 trainer (H200) ships a fresh checkpoint ~every 40 min while N rollout workers (and a human doing teleop / DAgger corrections) collect data in parallel. Nobody waits, which uses the off-policy nature of the loop to the fullest.
Binary success is too sparse, so I densify it into per-frame advantage via GAE, computed from objective keypoint checkpoints, the success-probability value baseline, and completion %.
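To make the GAE step concrete, here is a minimal sketch of textbook generalized advantage estimation over one episode. It is an illustration under assumptions, not the code from the repo: the function name `gae_advantages`, the `gamma` and `lam` defaults, and the example numbers are all hypothetical. It assumes the dense per-frame reward has already been built (for example from keypoint checkpoints and the final success flag) and that `values` comes from the success-probability head.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # assumed discount, not taken from the post
    lam: float = 0.95,    # assumed GAE lambda, not taken from the post
) -> List[float]:
    """Per-frame advantages for a single episode.

    `values` must hold len(rewards) + 1 entries: one baseline per frame
    plus a bootstrap value for the state after the last frame
    (0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each frame reuses the accumulated future TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical episode: a checkpoint bonus at frame 1, success at frame 3.
example_rewards = [0.0, 0.5, 0.0, 1.0]
example_values = [0.2, 0.3, 0.6, 0.8, 0.0]  # last entry: terminal bootstrap
example_advantages = gae_advantages(example_rewards, example_values)
```

With `lam=0` each advantage collapses to the one-step TD error, and with `gamma=lam=1` it becomes the remaining return minus the baseline, so `lam` trades bias against variance between those two extremes.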
The RL combines AWR + RECAP. I also tune the inference knobs (execution length, playback speed, inpainting overlap, CFG scale, best-of-N) with a per-parameter Thompson-sampling bandit folded into rollout collection.
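A per-parameter Thompson-sampling bandit could look roughly like the sketch below. It is a hedged illustration, not the repo's implementation: it assumes each knob has a small discrete set of candidate values, that the reward is the binary episode success, and that each knob keeps an independent Beta posterior per value. The class `KnobBandit`, the helpers, and the knob values shown are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


class KnobBandit:
    """Beta-Bernoulli Thompson sampler over the candidate values of one knob."""

    def __init__(self, values: Sequence[float], rng: Optional[random.Random] = None):
        self.values = list(values)
        self.alpha = [1.0] * len(self.values)  # 1 + observed successes
        self.beta = [1.0] * len(self.values)   # 1 + observed failures
        self.rng = rng or random.Random()

    def sample(self) -> int:
        """Draw one success rate per value and return the index of the best draw."""
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, index: int, success: bool) -> None:
        """Fold one episode outcome into the posterior of the chosen value."""
        if success:
            self.alpha[index] += 1.0
        else:
            self.beta[index] += 1.0


def sample_config(bandits: Dict[str, KnobBandit]) -> Dict[str, int]:
    """Pick one value index per knob, each knob sampled independently."""
    return {name: bandit.sample() for name, bandit in bandits.items()}


def update_all(bandits: Dict[str, KnobBandit], choice: Dict[str, int], success: bool) -> None:
    """Credit every knob with the outcome of the episode it was used in."""
    for name, index in choice.items():
        bandits[name].update(index, success)


# Hypothetical knobs and candidate values, for illustration only.
bandits = {
    "cfg_scale": KnobBandit([1.0, 1.5, 2.0]),
    "best_of_n": KnobBandit([1, 2, 4]),
}
choice = sample_config(bandits)          # run before a rollout
update_all(bandits, choice, success=True)  # run after the rollout is scored
```

Because sampling happens once per rollout, exploration of the knobs comes for free from episodes that are being collected anyway, which is what "folded into rollout collection" suggests.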
Round 2: I had only ~1 week and no access to the eval robot, so the pipeline was sim → my robot → their robot, leaning on heavy augmentation to make the policy more robust.
Blog: https://ilialarchenko.com/projects/lehome2026
Tech report: Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline) (2606.27163)
Code: https://github.com/IliaLarchenko/lehome_solution
Sim policy: IliaLarchenko/lehome_sim
Real policy: IliaLarchenko/lehome_real
Challenge: https://lehome-challenge.com/