Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/17499
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Vijesh, Antony | -
dc.contributor.author | Shreyas SR | -
dc.date.accessioned | 2025-12-22T08:18:24Z | -
dc.date.available | 2025-12-22T08:18:24Z | -
dc.date.issued | 2025-12-03 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17499 | -
dc.description.abstract | This thesis focuses on the development of efficient, convergent algorithms for solving problems in dynamic programming, reinforcement learning, and multi-agent learning. The work begins with novel first-order iterative schemes derived from a computationally expensive second-order method that approximates the Bellman equation using smooth functions. These new schemes retain the global convergence property while being more computationally efficient and easier to implement. Next, the thesis proposes a Weighted Smooth Q-Learning (WSQL) algorithm to address overestimation and underestimation biases in Q-learning and double Q-learning, respectively. By incorporating a weighted blend of mellowmax and log-sum-exp operators, WSQL achieves stability and theoretical convergence guarantees. The third part of the thesis introduces off-policy two-step Q-learning algorithms, both standard and smooth variants, that improve convergence and robustness without relying on importance sampling. Finally, the thesis extends these techniques to the multi-agent setting, proposing a multi-step minimax Q-learning algorithm for solving two-player zero-sum Markov games. Theoretical analysis ensures boundedness and almost sure convergence of the algorithms under suitable assumptions. Across all contributions, the proposed methods are validated through comprehensive numerical experiments on benchmark problems, demonstrating their effectiveness, robustness, and practical utility. Keywords: Reinforcement Learning, Q-learning, Bellman Equation, Value Iteration, Two-Player Zero-Sum Games, Stochastic Approximation, Smooth Operators. | en_US
dc.language.iso | en | en_US
dc.publisher | Department of Mathematics, IIT Indore | en_US
dc.relation.ispartofseries | TH781; | -
dc.subject | Mathematics | en_US
dc.title | Sequential decision making under uncertainty: efficient Q-learning frameworks | en_US
dc.type | Thesis_Ph.D | en_US
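The abstract describes WSQL as blending the mellowmax and log-sum-exp operators, both smooth approximations of the max used in the Bellman update. As a minimal illustrative sketch (not the thesis's actual algorithm; the weight `w` and temperature `omega` are assumed parameters), the blended operator could look like:

```python
import math

def mellowmax(values, omega=5.0):
    # mellowmax: (1/omega) * log( mean( exp(omega * x_i) ) ),
    # a smooth approximation of max that never exceeds it
    c = omega * max(values)  # shift exponents for numerical stability
    s = sum(math.exp(omega * v - c) for v in values) / len(values)
    return (c + math.log(s)) / omega

def scaled_logsumexp(values, omega=5.0):
    # (1/omega) * log( sum( exp(omega * x_i) ) ),
    # a smooth upper bound on max
    c = omega * max(values)
    s = sum(math.exp(omega * v - c) for v in values)
    return (c + math.log(s)) / omega

def weighted_smooth_max(values, w=0.5, omega=5.0):
    # Hypothetical weighted blend of the two smooth operators,
    # in the spirit of the WSQL target described in the abstract.
    return w * mellowmax(values, omega) + (1.0 - w) * scaled_logsumexp(values, omega)
```

Since mellowmax sits below the hard max and log-sum-exp above it, a weighted combination can interpolate between the two, which is one plausible way such a blend could trade off the over- and underestimation biases the abstract mentions.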
Appears in Collections: Department of Mathematics_ETD

Files in This Item:
File | Description | Size | Format
TH_781_Shreyas_S_R_1901241006.pdf | - | 4.85 MB | Adobe PDF (View/Open)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
