Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/5517
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bhatia, Vimal | en_US |
dc.date.accessioned | 2022-03-17T15:42:22Z | - |
dc.date.available | 2022-03-17T15:42:22Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Garg, N., Sellathurai, M., Bhatia, V., & Ratnarajah, T. (2021). Function approximation based reinforcement learning for edge caching in massive MIMO networks. IEEE Transactions on Communications, 69(4), 2304-2316. doi:10.1109/TCOMM.2020.3047658 | en_US |
dc.identifier.issn | 0090-6778 | - |
dc.identifier.other | EID(2-s2.0-85099104084) | - |
dc.identifier.uri | https://doi.org/10.1109/TCOMM.2020.3047658 | - |
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/5517 | - |
dc.description.abstract | Caching popular content in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multi-input-multi-output system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the optimal content placement strategy that optimizes the long-term discounted ASP and average cache refresh rate. In Q-learning, the number of Q-updates is large and proportional to the number of states and actions. To reduce the space complexity and update requirements toward scalable Q-learning, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) needs updating, irrespective of the number of states and actions. Convergence of these approximation-based approaches is analyzed. Simulations verify that these approaches converge and successfully learn a similar best content placement, demonstrating the applicability and scalability of the proposed approximated Q-learning schemes. © 1972-2012 IEEE. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
dc.source | IEEE Transactions on Communications | en_US |
dc.subject | Markov processes | en_US |
dc.subject | MIMO systems | en_US |
dc.subject | Multiplexing equipment | en_US |
dc.subject | Content popularities | en_US |
dc.subject | Function approximation | en_US |
dc.subject | Multi input multi output systems | en_US |
dc.subject | Placement strategy | en_US |
dc.subject | Poisson point process | en_US |
dc.subject | Q-learning approach | en_US |
dc.subject | Update requirement | en_US |
dc.subject | Wireless communication system | en_US |
dc.subject | Reinforcement learning | en_US |
dc.title | Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks | en_US |
dc.type | Journal Article | en_US |
dc.rights.license | All Open Access, Bronze, Green | - |
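The abstract above describes Q-learning in which the Q-function is approximated with a fixed, small number of weights so that per-step update cost does not grow with the number of states (popularity profiles) or actions (cache placements). The following is a minimal, hypothetical sketch of that idea in Python; the feature map, reward model, and popularity-transition model are placeholder assumptions for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical sketch of linear function-approximation Q-learning:
# Q(s, a) is approximated by a constant-length weight vector, so each
# update touches only 4 parameters regardless of |S| or |A|.
rng = np.random.default_rng(0)

def phi(state, action):
    """Constant-length feature vector for a (state, action) pair (assumed form)."""
    # state: index of the current content-popularity profile (assumption)
    # action: index of the chosen probabilistic cache placement (assumption)
    return np.array([1.0, state, action, state * action], dtype=float)

def reward(state, action):
    """Placeholder reward: noisy proxy for ASP minus a cache-refresh cost (assumption)."""
    return rng.normal(loc=1.0 / (1 + abs(state - action)), scale=0.05)

n_states, n_actions = 10, 5
w = np.zeros(4)                 # only these 4 weights are ever updated
alpha, gamma, eps = 0.05, 0.9, 0.1

state = rng.integers(n_states)
for step in range(5000):
    # epsilon-greedy action selection over the approximated Q-values
    q_vals = [phi(state, a) @ w for a in range(n_actions)]
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q_vals))

    r = reward(state, action)
    next_state = rng.integers(n_states)   # stand-in for the Markov CP transition (assumption)

    # TD target uses the best next action under the current weights
    q_next = max(phi(next_state, a) @ w for a in range(n_actions))
    td_error = r + gamma * q_next - phi(state, action) @ w
    w += alpha * td_error * phi(state, action)   # single constant-size update

    state = next_state

print("learned weights:", w)
```

The design point the sketch illustrates is the one claimed in the abstract: replacing a |S| x |A| Q-table with a fixed-size parameter vector keeps both memory and per-step update cost constant as the state and action spaces grow.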
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.