Xudong Wu

吴煦东

Ph.D. Student in Reinforcement Learning and LLMs, University of Hong Kong

Previously:
BSc (Hons) in Mathematics & Statistics, University of Edinburgh
BSc in Information and Computing Science, Dalian University of Technology


Email: X.Wu-97 [at] sms.ed.ac.uk (academic) | xudongwu02 [at] gmail.com (personal)




About Me

           

I am a Ph.D. student (starting Fall 2025) at The University of Hong Kong, where I focus on the theoretical foundations and algorithmic development of reinforcement learning (RL) and multi-agent systems for Large Language Models (LLMs).

Research Interests

My research focuses on the algorithmic and structural foundations of intelligent decision-making systems.

These interests are grounded in prior experience with simulation-based inference and statistical modeling, and are aimed at advancing the capabilities of autonomous multi-agent systems powered by large language models.

I completed my undergraduate studies in Mathematics and Statistics at The University of Edinburgh. My academic background encompassed statistical learning theory, optimization, and applied probability, with a strong emphasis on machine learning methodologies. I graduated with a first-class degree.

Before that, I studied Information and Computing Science at Dalian University of Technology, where I built a solid foundation in mathematical modeling, computational methods, and probability theory. My coursework and research performance consistently reflected strong academic engagement and technical proficiency.

Education

The University of Hong Kong, Hong Kong SAR (Starting Sep. 2025)

  • Ph.D. Student
  • Research areas: Reinforcement Learning (RL), Multi-Agent Systems, Large Language Models (LLMs)
  • Advisor: Prof. Jiayu Chen
  • Co-advisor: Prof. Vaneet Aggarwal, Purdue University

University of Edinburgh, Edinburgh, UK (Sep. 2023 – Jul. 2025)

  • BSc (Hons) in Mathematics and Statistics
  • First-Class Honours (Equivalent to 4.0/4.0 GPA)

Dalian University of Technology, Dalian, China (Sep. 2021 – Jun. 2023)

  • BSc in Information and Computing Science
  • Average score: 89.2/100

Research Experience

University of Edinburgh, Edinburgh, UK (Jan. 2025 – Mar. 2025)

  • Position: Research Collaborator
  • Project: Dynamic Self-Rewarding for Medical Large Language Models (Med-LLMs)
  • Research Focus:
    • Developed a dynamic self-rewarding framework for aligning medical LLMs without human-annotated supervision
    • Integrated a two-tier judge system in which GPT-4o dynamically refines evaluation prompts to mitigate reward misspecification and scoring bias
    • Executed multi-round Direct Preference Optimization (DPO) to align model behavior through self-generated preference pairs and adaptive reward modeling
    • Fine-tuned and evaluated LLMs (Mistral-7B) on domain-specific datasets (HealthCareMagic, PubMedQA, MedMCQA), targeting empathy, factuality, and coherence
    • Conducted task-specific benchmarking and error analysis to uncover performance bottlenecks due to hallucination and distributional drift across iteration stages

University of Edinburgh, Edinburgh, UK (Aug. 2024 – Apr. 2025)

  • Position: Honours Dissertation / Final Year Project
  • Supervisor: Dr. Amanda Lenzi
  • Thesis Title: A Comparative Study of Simulation-Based Inference Algorithms
  • Research Focus:
    • Benchmarked three recent Simulation-Based Inference (SBI) algorithms on both synthetic and structured real-world inference tasks: BayesFlow, Sequential Neural Likelihood (SNL), and Affine Flow Matching (AFM)
    • Demonstrated that AFM outperforms amortized and sequential methods in capturing spatial structure in high-dimensional Poisson–CAR disease mapping models
    • Designed a robust evaluation framework using recovery line metrics and ECDF-based posterior calibration to analyze estimation accuracy and uncertainty quantification
    • Identified a key trade-off in joint parameter inference: increasing dimensionality improves model expressiveness but amplifies uncertainty due to parameter interaction
    • Implemented a full end-to-end SBI workflow and published open-source code: github.com/xudongwu-0/SBI-comparison

University of California, Irvine, USA (Jun. 2024 – Sep. 2024)

  • Position: Summer Research Assistant
  • Supervisor: Prof. Chen Li (IEEE Fellow)
  • Project: Optimizing Texera, a machine learning-based data analysis workflow platform
  • Responsibilities:
    • Integrated AI-driven automation for workflow optimization, enabling seamless machine learning pipeline execution
    • Developed an automated report generation system that converts data analysis workflows into structured insights
    • Enhanced Texera’s data cleaning and visualization capabilities to improve model interpretability

University of Edinburgh, Edinburgh, UK (Feb. 2024 – May 2024)

  • Position: Research Assistant in Uncertainty Economics
  • Research Focus:
    • Analyzed the economic impact of uncertainty during the COVID-19 pandemic using stochastic models
    • Implemented Monte Carlo simulations and Bayesian inference techniques for probabilistic estimation
    • Developed computational tools for visualizing uncertainty in economic forecasting

University of Edinburgh, Edinburgh, UK (Sep. 2023 – Dec. 2023)

  • Position: Project Leader in Differential Equations
  • Research Focus:
    • Developed a system of ordinary differential equations (ODEs) to model bacterial infections and antibiotic treatments
    • Used Fourier series analysis and Laplace transforms to predict bacterial resistance patterns
    • Optimized drug treatment schedules using numerical simulations

Dalian University of Technology, Dalian, China (Dec. 2022)

  • Position: Research Project in Mathematical Modeling
  • Research Focus:
    • Developed a population dynamics model incorporating fertility rate adjustments
    • Improved predictive accuracy by incorporating age-stratified birth rate variations
    • Performed parameter sensitivity analysis to optimize demographic forecasting

Academic Background

University of Edinburgh, Edinburgh, UK (Sep. 2023 – Jun. 2025)

  • BSc in Mathematics and Statistics, First-Class Honours
  • Mathematical & Optimization Courses:
    • Numerical Ordinary Differential Equations – Stability & convergence analysis, applied to RL optimization
    • Honours Differential Equations – Applied in stochastic dynamic programming & Bellman equations
    • Honours Complex Variables – Conformal mapping & contour integration, useful for stochastic control
    • Honours Analysis – Functional analysis and variational methods, applicable to policy optimization
  • Statistics & Probabilistic Modelling:
    • Financial Mathematics – Stochastic processes, Ito calculus, and Black-Scholes model
    • Stochastic Modelling – Markov chains, Poisson processes, Brownian motion, key concepts for RL
    • Statistical Computing – Monte Carlo methods, MCMC, and Bayesian inference
    • Statistical Methodology – Hypothesis testing, regression, and statistical decision theory
  • Programming & Computational Methods:
    • Applied Statistics – Hands-on experience with real-world data, Bayesian updating in ML
    • Python, C++, MATLAB – Numerical computing, algorithm development for RL applications

Dalian University of Technology, Dalian, China (Sep. 2021 – Jun. 2023)

  • BSc in Information and Computing Science, Average score: 89.2/100
  • Mathematical & Theoretical Foundations:
    • Mathematical Analysis (1-3) – Real & complex analysis, measure theory fundamentals
    • Ordinary Differential Equations – Theoretical and numerical solutions to dynamic systems
    • Abstract Algebra – Group theory, rings, and fields, applicable in cryptography & optimization
    • Complex Function Theory – Applications in function approximation & conformal mapping
  • Probability & Statistical Modelling:
    • Probability & Mathematical Statistics – Statistical inference, Bayesian learning, decision theory
    • Real Analysis – Measure theory and Lebesgue integration, foundation for stochastic processes
  • Computational & Algorithmic Skills:
    • C++ Programming – Algorithm design, object-oriented programming
    • Python Programming – Data structures, numerical computation, scientific computing
    • Mathematical Modelling – Applied techniques in RL-based decision-making and optimization

Honors and Awards

  • First-Class Scholarship - Dalian University of Technology, 2022-2023 (Top 5%)
  • Dual Degree Student Scholarship - University of Edinburgh, 2022-2023
  • International Study Scholarship - Dalian University of Technology, 2022-2023

Last update: 22 May 2025