HRM2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans
Abstract
We present HRM2Avatar, a framework for creating high-fidelity avatars from monocular phone scans that can be rendered and animated in real time on mobile devices. Monocular capture with commodity smartphones offers a low-cost, widely accessible alternative to studio-grade multi-camera rigs, putting avatar digitization within reach of non-expert users. Reconstructing high-fidelity avatars from single-view video is challenging, however, because monocular footage provides far less visual and geometric information than multi-camera setups. To address these limitations, at the data level our method leverages two types of smartphone captures: static pose sequences for detailed texture reconstruction, and dynamic motion sequences for learning pose-dependent deformations and lighting changes. At the representation level, we employ a lightweight yet expressive representation to reconstruct high-fidelity digital humans from sparse monocular data. First, we extract explicit garment meshes from the monocular data to model clothing deformations more effectively. Second, we attach illumination-aware Gaussians to the mesh surface, enabling high-fidelity rendering and capturing pose-dependent lighting changes. This representation efficiently learns high-resolution, dynamic detail from our tailored monocular data, enabling the creation of detailed avatars. At the rendering level, AR/VR, social gaming, and on-device creation demand sub-frame responsiveness when rendering and animating high-fidelity avatars. Our fully GPU-driven rendering pipeline delivers 120 FPS on mobile devices and 90 FPS on standalone VR devices at 2K resolution, over 2.7× faster than representative mobile-engine baselines. Experiments show that HRM2Avatar achieves superior visual realism and real-time interactivity at high resolution, outperforming state-of-the-art monocular methods.
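The representation binds Gaussians to an explicit mesh so that splats follow the surface as the garment deforms. A minimal sketch of one common way to implement such a binding, via fixed barycentric anchors per triangle (illustrative only; the function names and sampling scheme here are assumptions, not the authors' implementation):

```python
import numpy as np

def sample_anchors(faces, n_per_face=4, seed=0):
    """Sample fixed barycentric anchors on each triangle; Gaussians stay
    bound to these anchors as the mesh deforms (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Uniform sampling over a triangle via the square-root trick.
    r1 = np.sqrt(rng.random((len(faces), n_per_face)))
    r2 = rng.random((len(faces), n_per_face))
    # Barycentric weights (F, N, 3); each row sums to 1.
    return np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=-1)

def gaussian_centers(vertices, faces, bary):
    """Re-evaluate Gaussian centers from the current (possibly deformed)
    vertex positions, so splats track the surface."""
    tris = vertices[faces]                         # (F, 3, 3) corner positions
    return np.einsum('fnc,fcd->fnd', bary, tris)   # (F, N, 3) centers

# Toy mesh: one triangle, then a rigid translation to test re-binding.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
bary = sample_anchors(faces)
centers0 = gaussian_centers(verts, faces, bary)
centers1 = gaussian_centers(verts + 1.0, faces, bary)  # deformed mesh
# Because barycentric weights sum to 1, anchors move rigidly with the surface.
assert np.allclose(centers1 - centers0, 1.0)
```

The key property is that anchors are sampled once and only the vertex positions are re-evaluated per frame, which keeps the per-frame cost linear in the number of Gaussians.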
Video Presentation
BibTeX
@misc{shi2025hrm2avatarhighfidelityrealtimemobile,
  title={HRM^2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans},
  author={Chao Shi and Shenghao Jia and Jinhui Liu and Yong Zhang and Liangchao Zhu and Zhonglei Yang and Jinze Ma and Chaoyue Niu and Chengfei Lv},
  year={2025},
  eprint={2510.13587},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  doi={10.1145/3757377.3763894},
  url={https://arxiv.org/abs/2510.13587},
}