Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/16432
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sumithra Rudresha, Shreyas | en_US
dc.date.accessioned | 2025-07-09T13:48:02Z | -
dc.date.available | 2025-07-09T13:48:02Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Shreyas, S. R. (2025). Double Successive Over-Relaxation Q-Learning With an Extension to Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2025.3576581 | en_US
dc.identifier.issn | 2162-237X | -
dc.identifier.other | EID(2-s2.0-105009055087) | -
dc.identifier.uri | https://dx.doi.org/10.1109/TNNLS.2025.3576581 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/16432 | -
dc.description.abstract | Q-learning (QL) is a widely used algorithm in reinforcement learning (RL), but its convergence can be slow, especially when the discount factor is close to one. Successive over-relaxation (SOR) QL, which introduces a relaxation factor to speed up convergence, addresses this issue but has two major limitations: in the tabular setting, the relaxation parameter depends on the transition probabilities, so the method is not entirely model-free, and it suffers from overestimation bias. To overcome these limitations, we propose a sample-based, model-free double SOR QL (MF-DSORQL) algorithm. Theoretically and empirically, this algorithm is shown to be less biased than SOR QL. Furthermore, in the tabular setting, a convergence analysis under boundedness assumptions on the iterates is provided. The proposed algorithm is extended to large-scale problems using deep RL. Finally, both the tabular version of the proposed algorithm and its deep RL extension are tested on benchmark examples. © 2012 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | IEEE Transactions on Neural Networks and Learning Systems | en_US
dc.subject | Deep reinforcement learning (RL) | en_US
dc.subject | Markov decision processes (MDPs) | en_US
dc.subject | overestimation bias | en_US
dc.subject | successive over-relaxation (SOR) | en_US
dc.title | Double Successive Over-Relaxation Q-Learning With an Extension to Deep Reinforcement Learning | en_US
dc.type | Journal Article | en_US
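The abstract describes combining an SOR-relaxed Q-learning target with a double estimator to curb overestimation bias. The exact MF-DSORQL update rule is given in the paper itself; the following is only a minimal illustrative sketch, assuming the standard SOR Q-learning target (which mixes the bootstrapped target with weight w and the current-state greedy value with weight 1 - w) and the cross-evaluation trick of Double Q-learning. The toy MDP, hyperparameters, and helper names are hypothetical.

```python
import random

# Hypothetical toy MDP (illustrative only, not from the paper):
# 2 states, 2 actions, deterministic transitions, reward 1 for reaching state 1.
N_S, N_A = 2, 2

def step(s, a):
    s_next = (s + a) % N_S               # toy transition
    r = 1.0 if s_next == 1 else 0.0      # toy reward
    return s_next, r

gamma = 0.7    # discount factor (assumed)
w     = 1.1    # relaxation factor; w = 1 recovers plain Q-learning (assumed)
alpha = 0.1    # learning rate (assumed)

# Double estimator: two independent Q-tables, as in Double Q-learning.
QA = [[0.0] * N_A for _ in range(N_S)]
QB = [[0.0] * N_A for _ in range(N_S)]

def argmax(row):
    return max(range(N_A), key=lambda a: row[a])

random.seed(0)
s = 0
for _ in range(5000):
    a = random.randrange(N_A)            # uniform exploration for simplicity
    s_next, r = step(s, a)
    # Randomly pick which table to update; the other supplies value estimates.
    if random.random() < 0.5:
        Q_upd, Q_eval = QA, QB
    else:
        Q_upd, Q_eval = QB, QA
    # SOR target: weight w on the bootstrapped target, weight (1 - w) on the
    # current-state greedy value; actions are selected by Q_upd but evaluated
    # by Q_eval to reduce overestimation bias.
    a_next = argmax(Q_upd[s_next])
    a_curr = argmax(Q_upd[s])
    target = w * (r + gamma * Q_eval[s_next][a_next]) + (1 - w) * Q_eval[s][a_curr]
    Q_upd[s][a] += alpha * (target - Q_upd[s][a])
    s = s_next

print(QA, QB)
```

Note that with rewards in [0, 1] the true optimal values here are bounded by 1 / (1 - gamma); the SOR relaxation changes the fixed-point iteration, not the fixed point itself, so the learned tables should settle near the same values as plain Q-learning, only faster.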
Appears in Collections: Department of Mathematics

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
