Xudong Wu

吴煦东

Ph.D. Student in Reinforcement Learning
The University of Hong Kong

wu.xudong [at] connect.hku.hk · xudongwu02 [at] gmail.com

About Me

I am a Ph.D. student (starting Fall 2025) at The University of Hong Kong, where I focus on the theoretical foundations and algorithmic development of reinforcement learning (RL) for Embodied AI and Large Language Models (LLMs).

I completed my undergraduate studies in Mathematics and Statistics at The University of Edinburgh, graduating with First-Class Honours. My academic background encompasses statistical learning theory, optimization, and applied probability, with a strong emphasis on machine learning methodologies.

Before that, I studied Information and Computing Science at Dalian University of Technology, where I built a solid foundation in mathematical modeling, computational methods, and probability theory.

News

Sep 2025 Starting Ph.D. at The University of Hong Kong, advised by Prof. Jiayu Chen.
Jul 2025 Graduated from the University of Edinburgh with First-Class Honours in Mathematics and Statistics.
Apr 2025 Completed honours dissertation on Simulation-Based Inference; code released on GitHub.
Mar 2025 Completed research on Dynamic Self-Rewarding for Medical LLMs.

Selected Research

Highlights from my research experience.

LLM Alignment

Dynamic Self-Rewarding for Medical Large Language Models

University of Edinburgh · Research Collaborator · Jan–Mar 2025

Developed a dynamic self-rewarding framework for aligning medical LLMs without human-annotated supervision. Integrated a two-tier judge system with GPT-4o and ran multi-round Direct Preference Optimization (DPO) for adaptive reward modeling on domain-specific medical datasets.

LLM Alignment · DPO · Medical AI

Bayesian Inference

A Comparative Study of Simulation-Based Inference Algorithms

University of Edinburgh · Honours Dissertation · Aug 2024 – Apr 2025 · Supervisor: Dr. Amanda Lenzi

Benchmarked three simulation-based inference (SBI) algorithms, namely BayesFlow, SNL, and Affine Flow Matching (AFM), on synthetic and real-world inference tasks. Demonstrated that AFM outperformed the other two methods in capturing spatial structure in high-dimensional Poisson–CAR disease mapping models.

Education

Sep 2025 – Present

The University of Hong Kong

Ph.D. Student

Research: Reinforcement Learning, LLMs, Embodied AI

Advisor: Prof. Jiayu Chen · Co-advisors: Prof. Vaneet Aggarwal (Purdue), Prof. Wenjie Huang (HKU)

Sep 2023 – Jul 2025

University of Edinburgh

BSc (Hons) in Mathematics and Statistics

First-Class Honours (equivalent to a 4.0/4.0 GPA)

Sep 2021 – Jun 2023

Dalian University of Technology

BSc in Information and Computing Science

Average Score: 89.9/100