Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/12894
Title: A Note on Generalized Second-Order Value Iteration in Markov Decision Processes
Authors: Vijesh, Antony
Sumithra Rudresha, Shreyas
Keywords: Markov decision processes; Q-learning; Reinforcement learning; Value iteration
Issue Date: 2023
Publisher: Springer
Citation: Antony Vijesh, V., Sumithra Rudresha, S., & Abdulla, M. S. (2023). A Note on Generalized Second-Order Value Iteration in Markov Decision Processes. Journal of Optimization Theory and Applications. https://doi.org/10.1007/s10957-023-02309-x
Abstract: Value iteration is a first-order algorithm for approximating the solution of the Bellman equation arising from a Markov Decision Process (MDP). In recent literature, a second-order iterative method was proposed that replaces the max operator in the Bellman equation with a smooth approximation and solves the resulting smoothed Bellman equation. Numerical simulation showed that this second-order method is computationally expensive for state and action spaces of reasonable size. It is also difficult to implement numerically because it requires evaluating the exponential function at large arguments. In this manuscript, a few first-order iterative schemes are derived from the second-order method to overcome these practical problems. All the proposed iterative schemes possess the global convergence property. In many cases, the proposed schemes take less time than the second-order method to converge to the solution of the Bellman equation. These algorithms are efficient and easy to implement. A theoretical comparison between the algorithms is provided, and numerical simulation supports the theoretical results. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
URI: https://doi.org/10.1007/s10957-023-02309-x
https://dspace.iiti.ac.in/handle/123456789/12894
ISSN: 0022-3239
Type of Material: Journal Article
Appears in Collections: Department of Mathematics

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.