Ilia Larchenko PRO

IliaLarchenko

AI & ML interests

I am a Data Science Director with a diverse technical and business background. I live in Bangkok and work at Agoda, where I lead multiple DS and ML teams. I was an active Kaggler, ranked in the top 20 of the global competitions ranking, and I hold the Competitions Master and Notebooks Master titles.

Recent Activity

posted an update about 3 hours ago


updated a model about 3 hours ago

IliaLarchenko/lehome_sim

updated a model about 3 hours ago

IliaLarchenko/lehome_real

Posts 3

Post

I placed 🥈 2nd in the LeHome Challenge (ICRA 2026), and 🥇 1st of 62 teams in the first simulation round. Now I'm open-sourcing the full solution — code, tech report, and final weights.

The task: teach a cheap two-armed robot (SO-ARM101) to fold 4 garment types — long/short tops and pants. Garment category is hidden at eval. Round 1 in sim (auto-scored), round 2 on a real robot (jury-scored).

I trained a VLA policy with an RL loop on top. The key ideas:

🧠 The policy is its own value function. From the same forward pass that picks the next action chunk, cheap heads predict success probability, task completion %, garment type, and future keypoint distances + a Q-residual. Those become the advantage signal for RL — no separate critic.

🔁 A fully asynchronous RL loop coordinated only through the HF Hub: 1 trainer (H200) ships a fresh checkpoint ~every 40 min while N rollout workers (and a human doing teleop / DAgger corrections) collect data in parallel. Nobody waits — it uses the off-policy nature of the loop to the fullest.
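A minimal sketch of that coordination pattern, using an in-memory stand-in instead of the real HF Hub. `FakeHub`, `simulate`, and the 5-minute rollout length are hypothetical; only the ~40-minute checkpoint cadence comes from the post. The point is that every episode is tagged with the checkpoint version that produced it, and the trainer never blocks on the workers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeHub:
    """In-memory stand-in for a shared model + dataset repo (not the real Hub API)."""
    checkpoint_version: int = 0
    episodes: List[dict] = field(default_factory=list)


def simulate(total_minutes: int, n_workers: int,
             train_period: int = 40, rollout_minutes: int = 5) -> FakeHub:
    hub = FakeHub()
    for minute in range(1, total_minutes + 1):
        # Workers: each finishes a rollout every `rollout_minutes` using the
        # newest checkpoint visible at that moment, then uploads the episode.
        if minute % rollout_minutes == 0:
            for worker in range(n_workers):
                hub.episodes.append({
                    "worker": worker,
                    "policy_version": hub.checkpoint_version,
                    "minute": minute,
                })
        # Trainer: trains on everything uploaded so far (including data from
        # older checkpoints) and publishes a new version on its own schedule.
        if minute % train_period == 0:
            hub.checkpoint_version += 1
    return hub


hub = simulate(total_minutes=120, n_workers=3)
versions_seen = sorted({ep["policy_version"] for ep in hub.episodes})
```

Because the buffer mixes episodes from several policy versions, the learning rule has to tolerate off-policy data, which is what the post relies on.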

📈 Binary success is too sparse, so I densify it into per-frame advantage via GAE — from objective keypoint checkpoints, the success-probability value baseline, and completion %.
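For reference, a minimal sketch of the standard GAE recursion on one finished episode. The reward and value numbers are made up: the per-frame rewards stand in for keypoint-checkpoint bonuses plus the final success signal, and the values stand in for the predicted success probability. How the actual solution builds and mixes these signals is described in the tech report, not here.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Per-frame GAE advantages for one finished episode.

    rewards[t] is the shaped reward at frame t and values[t] is the baseline
    at frame t. The value after the last frame is taken as 0 because the
    episode has ended.
    """
    assert len(rewards) == len(values)
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running              # discounted sum of residuals
        advantages[t] = running
    return advantages


# Toy episode: a checkpoint bonus at frame 1 and success at the last frame.
rewards = [0.0, 0.2, 0.0, 1.0]
values = [0.3, 0.4, 0.6, 0.9]
adv = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
```

With `gamma = lam = 1` each advantage reduces to "remaining reward minus baseline", which makes the toy numbers easy to check by hand.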

🎛️ The RL combines AWR + RECAP. I also tune the inference knobs — execution length, playback speed, inpainting overlap, CFG scale, best-of-N — with a per-parameter Thompson-sampling bandit folded into rollout collection.
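A minimal sketch of a per-parameter Thompson-sampling bandit, assuming a Beta-Bernoulli model on binary episode success. The class names, the knob grid, and the toy environment are hypothetical; the post only states that each inference parameter gets its own bandit during rollout collection.

```python
import random
from typing import Dict, List


class BetaArm:
    """Beta posterior over the success rate of one candidate value."""

    def __init__(self) -> None:
        self.successes = 1.0  # Beta(1, 1) uniform prior
        self.failures = 1.0

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.successes, self.failures)

    def update(self, success: bool) -> None:
        if success:
            self.successes += 1.0
        else:
            self.failures += 1.0


class KnobBandit:
    """One independent Thompson-sampling bandit per inference knob."""

    def __init__(self, grid: Dict[str, List[float]], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.arms = {k: {v: BetaArm() for v in vals} for k, vals in grid.items()}

    def choose(self) -> Dict[str, float]:
        # For each knob, draw one sample per candidate and keep the argmax.
        return {
            knob: max(arms, key=lambda v: arms[v].sample(self.rng))
            for knob, arms in self.arms.items()
        }

    def update(self, config: Dict[str, float], success: bool) -> None:
        for knob, value in config.items():
            self.arms[knob][value].update(success)


# Hypothetical grid and toy environment: only cfg_scale affects success here.
grid = {"execution_length": [16.0, 26.0], "cfg_scale": [1.0, 2.0]}
bandit = KnobBandit(grid, seed=0)
env = random.Random(1)
for _ in range(400):
    cfg = bandit.choose()
    p_success = 0.9 if cfg["cfg_scale"] == 2.0 else 0.1
    bandit.update(cfg, env.random() < p_success)
```

Every rollout both collects training data and updates the posteriors, so tuning does not need a separate evaluation budget.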

🔧 Round 2: I had only ~1 week and no access to the eval robot, so the pipeline was sim → my robot → their robot, leaning on heavy augmentation to make the policy more robust.

📝 Blog: https://ilialarchenko.com/projects/lehome2026
📄 Tech report: Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline) (2606.27163)
🔧 Code: https://github.com/IliaLarchenko/lehome_solution
🤗 Sim policy: IliaLarchenko/lehome_sim
🤗 Real policy: IliaLarchenko/lehome_real
🌐 Challenge: https://lehome-challenge.com/

Post

🏆 BEHAVIOR Challenge 1st Place – Solution Summary

My team recently won 1st place in the BEHAVIOR Challenge at NeurIPS.
The competition focused on training a single policy to complete 50 long-horizon household tasks in simulation.

We built an end-to-end policy based on Pi0.5 with a bunch of custom modifications. Everything is open-sourced, and it should be useful for anyone exploring VLAs or adapting them to specific tasks.

Key Architecture Changes:
- Replaced language model with 50 trainable task embeddings (no text at all)
- Correlated noise for Flow Matching: ϵ ∼ N(0, 0.5I + 0.5Σ) using dataset action covariance
- Learnable mixed-layer attention: each action expert layer attends to a trainable mix of all VLM layers
- System 2 stage tracking: model predicts task stage, we smooth it with voting and feed it back as context
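As an illustration of the correlated-noise bullet, here is a minimal pure-Python sketch of drawing ϵ ∼ N(0, 0.5I + 0.5Σ) through a Cholesky factor. The 3×3 covariance is made up; in practice Σ would be estimated from the dataset actions, and the helper names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def blend_covariance(sigma: Matrix, beta: float = 0.5) -> Matrix:
    """Return (1 - beta) * I + beta * sigma."""
    n = len(sigma)
    return [[(1.0 - beta) * (i == j) + beta * sigma[i][j] for j in range(n)]
            for i in range(n)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_correlated_noise(low: Matrix, rng: random.Random) -> List[float]:
    """Draw eps = L z with z ~ N(0, I), so that Cov(eps) = L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


# Toy 3-dimensional "action covariance" (made up for illustration).
sigma = [[1.0, 0.8, 0.6],
         [0.8, 1.0, 0.8],
         [0.6, 0.8, 1.0]]
target = blend_covariance(sigma)
low = cholesky(target)
eps = sample_correlated_noise(low, random.Random(0))
```

The identity term keeps the blended matrix well conditioned even if the empirical Σ is close to singular.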

Training:
- Multi-sample Flow Matching: 15 FM samples per VLM pass to reduce gradient variance
- Delta action space + per-timestamp normalization
- FAST auxiliary loss and stage prediction loss
- Trained on 224×224 RGB + proprioception only
- We use 4 fine-tuned checkpoints, all derived from a multi-task model trained on all 50 tasks
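A minimal sketch of the "delta action space + per-timestamp normalization" bullet, assuming deltas are taken relative to the current proprioceptive state and that statistics are computed separately for every position in the action chunk. Both assumptions and all function names are illustrative, not taken from the released code.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Chunk = List[List[float]]  # [horizon][action_dim]


def to_delta(chunk: Chunk, state: List[float]) -> Chunk:
    """Express every action in the chunk relative to the current state."""
    return [[a - s for a, s in zip(step, state)] for step in chunk]


def per_timestamp_stats(chunks: List[Chunk]) -> Tuple[Chunk, Chunk]:
    """Mean and std for each (timestamp, dimension) pair across the dataset."""
    horizon, dim = len(chunks[0]), len(chunks[0][0])
    means = [[mean([c[t][d] for c in chunks]) for d in range(dim)]
             for t in range(horizon)]
    stds = [[pstdev([c[t][d] for c in chunks]) for d in range(dim)]
            for t in range(horizon)]
    return means, stds


def normalize(chunk: Chunk, means: Chunk, stds: Chunk, eps: float = 1e-6) -> Chunk:
    """Standardize each entry with the statistics of its own timestamp."""
    return [[(x - m) / (s + eps) for x, m, s in zip(step, ms, ss)]
            for step, ms, ss in zip(chunk, means, stds)]


# Two toy 1-D chunks that start from the same state and drift at different rates.
chunks = [
    to_delta([[1.0], [2.0], [3.0]], state=[1.0]),
    to_delta([[1.0], [1.5], [2.0]], state=[1.0]),
]
means, stds = per_timestamp_stats(chunks)
normed = normalize(chunks[0], means, stds)
```

In this toy data the spread of the deltas grows with the timestamp, so a single global scale would leave early steps tiny and late steps large; per-timestamp statistics put every position on a comparable scale.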

Inference Optimizations:
- Soft inpainting: predict 30 actions, execute 26, use 4 as an input for the next chunk
- Correlation-aware guidance of inpainting to keep action chunks smooth
- 1.3× speedup via cubic spline compression
- General correction rule: reopen gripper after failed grasps
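The 30/26/4 chunking in the soft-inpainting bullet can be sketched as a simple receding-horizon loop. `toy_predictor` is a hypothetical stand-in for the policy: it just blends the start of a ramp toward the leftover actions, whereas the real soft inpainting happens inside the flow-matching sampler.

```python
from typing import Callable, List, Optional

HORIZON = 30                 # actions predicted per chunk (from the post)
EXECUTE = 26                 # actions executed per chunk (from the post)
OVERLAP = HORIZON - EXECUTE  # 4 leftover actions condition the next chunk

Predictor = Callable[[float, Optional[List[float]]], List[float]]


def toy_predictor(obs: float, prefix: Optional[List[float]]) -> List[float]:
    """Stand-in policy: a ramp starting at obs, softly pulled toward the prefix."""
    chunk = [obs + 0.1 * i for i in range(HORIZON)]
    if prefix is not None:
        for i, target in enumerate(prefix):
            w = 1.0 - i / len(prefix)  # strongest at the seam, then fading out
            chunk[i] = w * target + (1.0 - w) * chunk[i]
    return chunk


def run(predict: Predictor, steps: int, obs: float = 0.0) -> List[float]:
    executed: List[float] = []
    prefix: Optional[List[float]] = None
    while len(executed) < steps:
        chunk = predict(obs, prefix)
        executed.extend(chunk[:EXECUTE])  # act on the first 26 actions
        prefix = chunk[EXECUTE:]          # keep the last 4 as a soft constraint
        obs = executed[-1]                # toy observation: last executed action
    return executed[:steps]


trajectory = run(toy_predictor, steps=60)
```

Because the first actions of each new chunk are tied to the tail of the previous prediction, consecutive chunks join without a jump at the seam.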

🔗 Code and Models:
- Code: https://github.com/IliaLarchenko/behavior-1k-solution
- Weights: IliaLarchenko/behavior_submission
- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (2512.06951)

Interview With AI

📚

Mock tech interview with AI.

Albumentations Demo

🏢

Optimize image augmentations with Albumentations

datasets 2

IliaLarchenko/behavior_224_rgb

Preview • Updated Dec 9, 2025 • 30.9k • 2

IliaLarchenko/vla_demo

Viewer • Updated Sep 1, 2025 • 165 • 282 • 9

Collections 2

IliaLarchenko/lehome_real

IliaLarchenko/lehome_sim

Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)

Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

IliaLarchenko/behavior_submission

IliaLarchenko/behavior_224_rgb

IliaLarchenko/behavior_50t_checkpoint

Papers 1

spaces 2

Interview With AI

Albumentations Demo

models 9

IliaLarchenko/lehome_sim

IliaLarchenko/lehome_real

IliaLarchenko/behavior_submission

IliaLarchenko/behavior_50t_checkpoint

IliaLarchenko/dot_transfer_cube

IliaLarchenko/dot_bimanual_insert

IliaLarchenko/dot_pusht_images

IliaLarchenko/dot_pusht_keypoints_best

IliaLarchenko/dot_pusht_keypoints
