Yuxiang Zhao's Homepage

Yuxiang Zhao

Computer Vision Researcher @ Alibaba Group
Reviewer for ICCV, ECCV, CVPR, AAAI, NeurIPS, ICML, ICLR
Research Area: Human-Centric Perception, Interaction and Generation
Email: zeusyux@gmail.com

Biography

I'm Yuxiang Zhao (赵煜翔), currently a Computer Vision Researcher at Alibaba AMAP CV Lab, where I work on human-centric interactive virtual reality technologies. I graduated from Sun Yat-sen University in 2024. Previously, I worked at Baidu on controllable generation for anime-style content. My research focuses on building scalable models for virtual humans that faithfully represent appearance, motion, and behavior in a unified manner while achieving high realism, human-like intelligence, and adaptive interaction. My ultimate vision is to build a virtual world where people can reunite with the loved ones they deeply miss.

News

03/2026 Invited to serve as a reviewer for NeurIPS 2026.
02/2026 Invited to serve as a reviewer for ICML 2026.
01/2026 Invited talk on human mesh recovery at AI Time, hosted by QingQi.
11/2025 One paper on expressive human pose and shape estimation was accepted to AAAI 2026. See you in Singapore!
08/2025 Invited to serve as a reviewer for AAAI 2026.
07/2025 One paper on embodied artificial intelligence was accepted to ACMMM 2025. See you in Dublin!

Selected Research

	Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation. First Author. Technical Report. Prior-free, instruction-driven outdoor-to-indoor embodied navigation with a vision-centric framework.
	ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation. Alibaba Group. A unified VLA foundation model for multi-task embodied navigation.
	A Simple Baseline for Expressive Human Animation. First Author. The 19th European Conference on Computer Vision. A simple yet efficient and accurate method for character animation and talking head generation.
	CoEvoer: Collaborative Evolution Transformer for Upper-Body Expressive Human Pose and Shape Estimation. First Author. The 40th Annual AAAI Conference on Artificial Intelligence. One-stage mesh recovery with explicit modeling of human body part interactions.
	Fine-Grained and Controllable Conditional Video Generation. Baidu Inc. Mitigates blurring in video generation under first- and last-frame constraints.
	Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. First Author. The 33rd ACM International Conference on Multimedia. Cross-time intention-interactive trajectory prediction for human-robot interaction.
	Personalized Image Prompt Generation and Correction. Meituan Inc. Generates overall image descriptions and conceptual keywords.