Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/12894
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Vijesh, Antony | en_US |
dc.contributor.author | Sumithra Rudresha, Shreyas | en_US |
dc.date.accessioned | 2023-12-22T09:18:52Z | - |
dc.date.available | 2023-12-22T09:18:52Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Antony Vijesh, V., Sumithra Rudresha, S., & Abdulla, M. S. (2023). A Note on Generalized Second-Order Value Iteration in Markov Decision Processes. Journal of Optimization Theory and Applications. Scopus. https://doi.org/10.1007/s10957-023-02309-x | en_US |
dc.identifier.issn | 0022-3239 | - |
dc.identifier.other | EID(2-s2.0-85175966342) | - |
dc.identifier.uri | https://doi.org/10.1007/s10957-023-02309-x | - |
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/12894 | - |
dc.description.abstract | Value iteration is one of the first-order algorithms to approximate the solution of the Bellman equation arising from the Markov Decision Process (MDP). In recent literature, by approximating the max operator in the Bellman equation with a smooth function, an interesting second-order iterative method was proposed to solve the resulting smoothed Bellman equation. During numerical simulation, it was observed that this second-order method is computationally expensive for reasonably sized state and action spaces. The second-order method also poses numerical difficulties because it requires evaluating an exponential function at large arguments. In this manuscript, a few first-order iterative schemes are derived from the second-order method to overcome these practical problems. All the proposed iterative schemes possess the global convergence property. In many cases, they converge to the solution of the Bellman equation in less time than the second-order method. These algorithms are efficient and easy to implement. An interesting theoretical comparison between the algorithms is provided, and numerical simulation supports the theoretical results. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.source | Journal of Optimization Theory and Applications | en_US |
dc.subject | Markov decision processes | en_US |
dc.subject | Q-learning | en_US |
dc.subject | Reinforcement learning | en_US |
dc.subject | Value iteration | en_US |
dc.title | A Note on Generalized Second-Order Value Iteration in Markov Decision Processes | en_US |
dc.type | Journal Article | en_US |
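The abstract above refers to smoothing the max operator in the Bellman equation and to overflow from the exponential function involved. A minimal sketch of this idea is plain first-order value iteration with the hard max replaced by a log-sum-exp smooth max (a standard smoothing choice; the paper's specific generalized second-order and derived first-order schemes are not reproduced here, and the function name, `beta` parameter, and array layout below are illustrative assumptions):

```python
import numpy as np

def smooth_value_iteration(P, R, gamma=0.9, beta=200.0, tol=1e-8, max_iter=10000):
    """First-order value iteration on a smoothed Bellman equation.

    P : (A, S, S) transition tensor, P[a, s, s'] = Pr(s' | s, a)
    R : (A, S) reward matrix
    The max over actions is replaced by (1/beta) * log sum exp(beta * Q),
    which converges to the hard max as beta grows.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        # Shift by the per-state max before exponentiating to avoid
        # overflow for large beta -- the numerical issue the abstract notes.
        m = Q.max(axis=0)
        V_new = m + np.log(np.exp(beta * (Q - m)).sum(axis=0)) / beta
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Since log-sum-exp upper-bounds the max by at most log(A)/beta, the fixed point of this smoothed iteration lies within log(A)/(beta * (1 - gamma)) of the true value function, so large `beta` trades numerical range for accuracy.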
Appears in Collections: | Department of Mathematics |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.