Average Reward Reinforcement Learning
Project Description
Average-reward reinforcement learning studies sequential decision-making problems where the objective is to maximize the long-run average reward per time step. While discounted and finite-horizon RL have been extensively studied, both the theoretical understanding and practical development of average-reward RL remain relatively limited. Nevertheless, average-reward formulations are particularly important in practice for modeling continuing tasks without a natural endpoint, such as queueing systems, inventory control, and online platforms. In this project, students will learn the necessary background in average-reward RL, study its statistical properties from a theoretical perspective, and develop scalable algorithms for average-reward settings through numerical experiments.
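To make the objective concrete, the following is a minimal sketch of relative value iteration, a classical dynamic-programming method for the average-reward criterion. The two-state MDP, the reward/transition arrays, and the function name are illustrative assumptions for this sketch, not part of the project itself.

```python
import numpy as np

# Hypothetical two-state MDP (illustration only): rewards r[s, a],
# transitions P[s, a, s']. In state 1, repeatedly "staying" pays 2 per
# step, so the optimal long-run average reward is 2.
r = np.array([[1.0, 0.0],   # state 0: 'stay' pays 1, 'switch' pays 0
              [2.0, 0.0]])  # state 1: 'stay' pays 2, 'switch' pays 0
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # stay in state 0
P[0, 1, 1] = 1.0  # switch 0 -> 1
P[1, 0, 1] = 1.0  # stay in state 1
P[1, 1, 0] = 1.0  # switch 1 -> 0

def relative_value_iteration(r, P, ref=0, iters=500):
    """Estimate the optimal gain g* (average reward per step) and a
    relative value (bias) vector h via relative value iteration."""
    h = np.zeros(r.shape[0])
    for _ in range(iters):
        # Bellman operator: (Th)(s) = max_a [ r(s,a) + sum_s' P(s'|s,a) h(s') ]
        Th = np.max(r + np.einsum('sat,t->sa', P, h), axis=1)
        g = Th[ref]   # gain estimate read off at the reference state
        h = Th - g    # subtract the gain so h stays bounded
    return g, h

g, h = relative_value_iteration(r, P)
print(g)  # converges to the optimal average reward, 2.0
```

Unlike discounted value iteration, the update subtracts the gain estimate at a reference state each sweep, which keeps the iterates bounded even though undiscounted values would otherwise grow linearly in time.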
Supervisor
SI, Nian
Quota
1
Course type
UROP1000
Applicant's Roles
Read relevant literature and become familiar with the theoretical background of the topic.
Investigate the statistical complexity of the proposed methods.
Implement and evaluate algorithms through numerical experiments.
Applicant's Learning Objectives
Understand the fundamentals of average-reward reinforcement learning and its distinction from discounted and finite-horizon settings.
Develop a solid grasp of the theoretical foundations, including convergence and statistical properties.
Gain hands-on experience in implementing and evaluating scalable RL algorithms.
Complexity of the project
Challenging