Yuxiang Zhao

Computer Vision Researcher @ Alibaba Group
Reviewer for ICCV, ECCV, CVPR, AAAI, NeurIPS, ICML, ICLR
Research Area: Human-Centric Perception, Interaction and Generation
Email: zeusyux@gmail.com

Biography

I'm Yuxiang Zhao (赵煜翔), currently a Computer Vision Researcher at Alibaba AMAP CV Lab, where I work on human-centric interactive virtual reality technologies. I graduated from Sun Yat-sen University in 2024. Previously, I worked at Baidu on controllable generation technologies for anime-style content. My research focuses on building scalable models for virtual humans that faithfully represent appearance, motion, and behavior in a unified manner while achieving high realism, human-like intelligence, and adaptive interaction. My ultimate vision is to build a virtual world where people can reunite with the loved ones they deeply miss.

News

  • 03/2026 Invited to serve as a reviewer for NeurIPS 2026.
  • 02/2026 Invited to serve as a reviewer for ICML 2026.
  • 01/2026 Gave an invited talk on human mesh recovery at AI Time, hosted by QingQi.
  • 11/2025 One paper on expressive human pose and shape estimation was accepted to AAAI 2026. See you in Singapore!
  • 08/2025 Invited to serve as a reviewer for AAAI 2026.
  • 07/2025 One paper on embodied artificial intelligence was accepted to ACMMM 2025. See you in Dublin!

Selected Research

BridgeNav
Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation
First Author. Technical Report.
Prior-free, instruction-driven outdoor-to-indoor embodied navigation with a vision-centric framework.
ABot-N0
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
Alibaba Group.
A unified VLA foundation model for multi-task embodied navigation.
CoEvoer
A Simple Baseline for Expressive Human Animation
First Author. The 19th European Conference on Computer Vision.
A simple yet efficient and accurate method for character animation and talking head generation.
CoEvoer
CoEvoer: Collaborative Evolution Transformer for Upper-Body Expressive Human Pose and Shape Estimation
First Author. The 40th Annual AAAI Conference on Artificial Intelligence.
One-stage mesh recovery with explicit modeling of human body part interactions.
Cross Time Domain
Fine-Grained and Controllable Conditional Video Generation
Baidu Inc.
Mitigating blurring in video generation under first-last frame constraints.
Trajectory Prediction
Cross Time Domain Intention Interaction for Conditional Trajectory Prediction
First Author. The 33rd ACM International Conference on Multimedia.
Cross-time intention-interactive trajectory prediction for human-robot interaction.
Image Caption Generation
Personalized Image Prompt Generation and Correction
Meituan Inc.
Generates overall image descriptions and conceptual keywords.