Average Reward Reinforcement Learning
Project Description
Average-reward reinforcement learning studies sequential decision-making problems where the objective is to maximize the long-run average reward per time step. While discounted and finite-horizon RL have been extensively studied, both the theoretical understanding and practical development of average-reward RL remain relatively limited. Nevertheless, average-reward formulations are particularly important in practice for modeling continuing tasks without a natural endpoint, such as queueing systems, inventory control, and online platforms. In this project, students will learn the necessary background in average-reward RL, study its statistical properties from a theoretical perspective, and develop scalable algorithms for average-reward settings through numerical experiments.
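To make the objective concrete, the following is a minimal sketch of relative value iteration, a classical dynamic-programming method for the average-reward criterion. The two-state MDP, the reward/transition arrays, and the function name are illustrative assumptions for this sketch, not part of the project itself.

```python
import numpy as np

# Hypothetical two-state MDP (illustration only): rewards r[s, a],
# transitions P[s, a, s']. In state 1, repeatedly "staying" pays 2 per
# step, so the optimal long-run average reward is 2.
r = np.array([[1.0, 0.0],   # state 0: 'stay' pays 1, 'switch' pays 0
              [2.0, 0.0]])  # state 1: 'stay' pays 2, 'switch' pays 0
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # stay in state 0
P[0, 1, 1] = 1.0  # switch 0 -> 1
P[1, 0, 1] = 1.0  # stay in state 1
P[1, 1, 0] = 1.0  # switch 1 -> 0

def relative_value_iteration(r, P, ref=0, iters=500):
    """Estimate the optimal gain g* (average reward per step) and a
    relative value (bias) vector h via relative value iteration."""
    h = np.zeros(r.shape[0])
    for _ in range(iters):
        # Bellman operator: (Th)(s) = max_a [ r(s,a) + sum_s' P(s'|s,a) h(s') ]
        Th = np.max(r + np.einsum('sat,t->sa', P, h), axis=1)
        g = Th[ref]   # gain estimate read off at the reference state
        h = Th - g    # subtract the gain so h stays bounded
    return g, h

g, h = relative_value_iteration(r, P)
print(g)  # converges to the optimal average reward, 2.0
```

Unlike discounted value iteration, the update subtracts the gain estimate at a reference state each sweep, which keeps the iterates bounded even though undiscounted values would otherwise grow linearly in time.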
Supervisor
SI, Nian
Quota
1
Course type
UROP1000
Applicant's Roles
Read relevant literature and become familiar with the theoretical background of the topic.
Investigate the statistical complexity of the proposed methods.
Implement and evaluate algorithms through numerical experiments.
Applicant's Learning Objectives
Understand the fundamentals of average-reward reinforcement learning and its distinction from discounted and finite-horizon settings.
Develop a solid grasp of the theoretical foundations, including convergence and statistical properties.
Gain hands-on experience in implementing and evaluating scalable RL algorithms.
Complexity of the project
Challenging