In-N-On
Scaling Egocentric manipulation with in-the-wild and on-task data
Xiongyi Cai*, Ri-Zhao Qiu*†, Geng Chen, Lai Wei, Isabella Liu, Tianshu Huang, Xuxin Cheng, Xiaolong Wang
(*: equal contribution, †: project lead)
University of California, San Diego

In-N-On is a training recipe that splits egocentric human data into in-the-wild and on-task data, enabling zero-shot language following, few-shot learning, and improved robustness through targeted on-task data.

Abstract.
Egocentric videos are a valuable and scalable data source for learning manipulation policies. However, because of significant data heterogeneity, most existing approaches use human data only for simple pre-training, which does not unlock its full potential. This paper provides a recipe for collecting and using egocentric data by dividing human data into two categories: in-the-wild and on-task. We first curate a dataset, PHSD, which contains over 1,000 hours of diverse in-the-wild egocentric data and over 20 hours of on-task data directly aligned with the target manipulation tasks. This enables learning a large egocentric language-conditioned flow matching policy, Human0. We further adopt domain adaptation techniques to narrow the gap between humans and humanoids. Empirically, we show that Human0 achieves several novel properties, including following language instructions using only human data, few-shot learning, and improved robustness from on-task data.
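The abstract names a flow matching policy but does not spell out the objective. As a minimal sketch, assuming the common linear-interpolation (rectified-flow style) formulation, the training pair and sampler could look like the following. The function names are hypothetical, and the conditioning on egocentric observations and language that Human0 uses is omitted for brevity.

```python
from typing import Callable, List

Vector = List[float]


def flow_matching_pair(noise: Vector, action: Vector, t: float):
    """Build one training pair on the straight path from noise (t=0) to action (t=1).

    Assumption: linear interpolation path; the source does not state which path is used.
    """
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_v = [a - n for n, a in zip(noise, action)]  # constant velocity along the path
    return x_t, target_v


def flow_matching_loss(pred_v: Vector, target_v: Vector) -> float:
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


def euler_sample(velocity_fn: Callable[[Vector, float], Vector],
                 noise: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real policy, `velocity_fn` would be a neural network that also receives image and language features; here it is left as an arbitrary callable so the sketch stays self-contained.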
Egocentric Data.
Egocentric Data w/ keypoints.
Retargeting from Egocentric Data.
Real-World Demo.

1. Zero-shot Language Following.
2. Improved On-task Performance.
3. 1-shot Learning.
4. Object Generalization.
Approach.
Figure 1: Human0 leverages large-scale egocentric human-humanoid data from the PHSD dataset for pre-training and post-training, enabling strong instruction following on unseen tasks, few-shot execution, and improved on-task performance.

Figure 2: Our two-stage training pipeline pre-trains on large-scale in-the-wild human and robot data and post-trains on task-aligned demonstrations, using a domain-adversarial discriminator to learn embodiment-invariant representations for effective human-to-robot transfer.
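The caption mentions a domain-adversarial discriminator without giving details. One standard way to realize this idea is a gradient reversal layer: the discriminator is trained to tell human features from robot features, while the encoder receives the negated gradient and so learns features the discriminator cannot separate. The scalar sketch below illustrates only that mechanism. The one-weight encoder and discriminator, the label convention (0 = human, 1 = robot), and the `lam` weight are all assumptions for illustration, not the authors' implementation.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def discriminator_loss(x: float, d: int, w: float, v: float) -> float:
    """Binary cross-entropy of the domain prediction for one sample."""
    p = sigmoid(v * (w * x))
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def domain_adversarial_grads(x: float, d: int, w: float, v: float, lam: float = 1.0):
    """Return (loss, grad_w, grad_v) with gradient reversal applied to the encoder.

    x: scalar input, d: domain label (assumed 0 = human, 1 = robot),
    w: encoder weight, v: discriminator weight, lam: reversal strength.
    """
    z = w * x                    # encoder feature
    p = sigmoid(v * z)           # discriminator's estimate of P(robot | z)
    loss = discriminator_loss(x, d, w, v)
    grad_v = (p - d) * z         # discriminator follows the true gradient
    grad_z = (p - d) * v         # gradient arriving at the feature
    grad_w = -lam * grad_z * x   # reversal: sign flipped before reaching the encoder
    return loss, grad_w, grad_v
```

Taking a descent step on `grad_v` makes the discriminator better at separating domains, while a descent step on the reversed `grad_w` increases the discriminator loss, which is what pushes the encoder toward embodiment-invariant features.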

BibTeX
@article{InNOn2025,
  title   = {In-N-On: Scaling Egocentric manipulation with in-the-wild and on-task data},
  author  = {Cai, Xiongyi and Qiu, Ri-Zhao and Chen, Geng and Wei, Lai and Liu, Isabella and Huang, Tianshu and Cheng, Xuxin and Wang, Xiaolong},
  journal = {},
  year    = {2025}
}