Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/12894
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Vijesh, Antony | en_US |
dc.contributor.author | Sumithra Rudresha, Shreyas | en_US |
dc.date.accessioned | 2023-12-22T09:18:52Z | - |
dc.date.available | 2023-12-22T09:18:52Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Antony Vijesh, V., Sumithra Rudresha, S., & Abdulla, M. S. (2023). A Note on Generalized Second-Order Value Iteration in Markov Decision Processes. Journal of Optimization Theory and Applications. Scopus. https://doi.org/10.1007/s10957-023-02309-x | en_US |
dc.identifier.issn | 0022-3239 | - |
dc.identifier.other | EID(2-s2.0-85175966342) | - |
dc.identifier.uri | https://doi.org/10.1007/s10957-023-02309-x | - |
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/12894 | - |
dc.description.abstract | Value iteration is one of the first-order algorithms to approximate the solution of the Bellman equation arising from the Markov Decision Process (MDP). In recent literature, by approximating the max operator in the Bellman equation with a smooth function, an interesting second-order iterative method was proposed to solve the resulting smoothed Bellman equation. During numerical simulation, it was observed that this second-order method is computationally expensive for reasonably sized state and action spaces. The second-order method also poses numerical difficulties because it requires evaluating an exponential function at large arguments. In this manuscript, a few first-order iterative schemes are derived from the second-order method to overcome these practical problems. All the proposed iterative schemes possess the global convergence property. In many cases, they converge to the solution of the Bellman equation in less time than the second-order method. These algorithms are efficient and easy to implement. An interesting theoretical comparison between the algorithms is provided, and numerical simulation supports the theoretical results. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.source | Journal of Optimization Theory and Applications | en_US |
dc.subject | Markov decision processes | en_US |
dc.subject | Q-learning | en_US |
dc.subject | Reinforcement learning | en_US |
dc.subject | Value iteration | en_US |
dc.title | A Note on Generalized Second-Order Value Iteration in Markov Decision Processes | en_US |
dc.type | Journal Article | en_US |
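The abstract above refers to smoothing the max operator in the Bellman equation and to overflow from the exponential function involved. A minimal sketch of this idea is plain first-order value iteration with the hard max replaced by a log-sum-exp smooth max (a standard smoothing choice; the paper's specific generalized second-order and derived first-order schemes are not reproduced here, and the function name, `beta` parameter, and array layout below are illustrative assumptions):

```python
import numpy as np

def smooth_value_iteration(P, R, gamma=0.9, beta=200.0, tol=1e-8, max_iter=10000):
    """First-order value iteration on a smoothed Bellman equation.

    P : (A, S, S) transition tensor, P[a, s, s'] = Pr(s' | s, a)
    R : (A, S) reward matrix
    The max over actions is replaced by (1/beta) * log sum exp(beta * Q),
    which converges to the hard max as beta grows.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        # Shift by the per-state max before exponentiating to avoid
        # overflow for large beta -- the numerical issue the abstract notes.
        m = Q.max(axis=0)
        V_new = m + np.log(np.exp(beta * (Q - m)).sum(axis=0)) / beta
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Since log-sum-exp upper-bounds the max by at most log(A)/beta, the fixed point of this smoothed iteration lies within log(A)/(beta * (1 - gamma)) of the true value function, so large `beta` trades numerical range for accuracy.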
Appears in Collections: | Department of Mathematics |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.