Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4614
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ganapathi, Pramod | en_US
dc.date.accessioned | 2022-03-17T01:00:00Z | -
dc.date.accessioned | 2022-03-17T15:34:58Z | -
dc.date.available | 2022-03-17T01:00:00Z | -
dc.date.available | 2022-03-17T15:34:58Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Javanmard, M. M., Ganapathi, P., Das, R., Ahmad, Z., Tschudi, S., & Chowdhury, R. (2019). Toward efficient architecture-independent algorithms for dynamic programs. doi:10.1007/978-3-030-20656-7_8 | en_US
dc.identifier.isbn | 9783030206550 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.other | EID(2-s2.0-85067508267) | -
dc.identifier.uri | https://doi.org/10.1007/978-3-030-20656-7_8 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/4614 | -
dc.description.abstract | We argue that the recursive divide-and-conquer paradigm is highly suited for designing algorithms to run efficiently under both shared-memory (multi- and manycores) and distributed-memory settings. The depth-first recursive decomposition of tasks and data is known to allow computations with potentially high temporal locality, and automatic adaptivity when resource availability (e.g., available space in shared caches) changes during runtime. Higher data locality leads to better intra-node I/O and cache performance and lower inter-node communication complexity, which in turn can reduce running times and energy consumption. Indeed, we show that a class of grid-based parallel recursive divide-and-conquer algorithms (for dynamic programs) can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure. Two-way recursive divide-and-conquer algorithms are known for solving dynamic programming (DP) problems on shared-memory multicore machines. In this paper, we show how to extend them to run efficiently also on manycore GPUs and distributed-memory machines. Our GPU algorithms work efficiently even when the data is too large to fit into the host RAM. These are external-memory algorithms based on recursive r-way divide and conquer, where r (≥ 2) varies based on the current depth of the recursion. Our distributed-memory algorithms are also based on multi-way recursive divide and conquer that extends naturally inside each shared-memory multicore/manycore compute node. We show that these algorithms are work-optimal and have low latency and bandwidth bounds. We also report empirical results for our GPU and distributed-memory algorithms. © Springer Nature Switzerland AG 2019. | en_US
dc.language.iso | en | en_US
dc.publisher | Springer Verlag | en_US
dc.source | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_US
dc.subject | Cache memory | en_US
dc.subject | Computational complexity | en_US
dc.subject | Dynamic programming | en_US
dc.subject | Energy utilization | en_US
dc.subject | Graphics processing unit | en_US
dc.subject | Multicore programming | en_US
dc.subject | Program processors | en_US
dc.subject | Random access storage | en_US
dc.subject | Communication efficiency | en_US
dc.subject | Distributed memory | en_US
dc.subject | Distributed memory algorithms | en_US
dc.subject | Distributed memory machines | en_US
dc.subject | Divide-and-conquer algorithm | en_US
dc.subject | Exascale | en_US
dc.subject | External memory algorithms | en_US
dc.subject | Shared memory | en_US
dc.subject | Memory architecture | en_US
dc.title | Toward efficient architecture-independent algorithms for dynamic programs | en_US
dc.type | Conference Paper | en_US
Appears in Collections:Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
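The abstract above describes grid-based recursive divide-and-conquer algorithms for dynamic programs. As a rough illustration only (not the paper's implementation), the sketch below shows the well-known 2-way recursive decomposition of the Floyd-Warshall all-pairs shortest-paths DP, whose eight quadrant updates give the depth-first, high-temporal-locality structure the abstract refers to; the function names and the single-element base case are my own choices.

```python
import copy
import random

def fw_iterative(D):
    """Standard O(n^3) Floyd-Warshall, used here only as a reference."""
    n = len(D)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]

def fw_rec(D, xi, xj, ui, uj, vi, vj, n):
    """Recursive divide-and-conquer Floyd-Warshall.

    Updates the n x n block X of D with top-left corner (xi, xj) using
    blocks U at (ui, uj) and V at (vi, vj), computing
    X[i][j] = min(X[i][j], U[i][k] + V[k][j]) for all k.
    Top-level call: fw_rec(D, 0, 0, 0, 0, 0, 0, n), with n a power of 2.
    """
    if n == 1:  # base case: a single edge relaxation
        if D[ui][uj] + D[vi][vj] < D[xi][xj]:
            D[xi][xj] = D[ui][uj] + D[vi][vj]
        return
    m = n // 2
    # Eight recursive quadrant updates; the order respects the DP's data
    # dependencies (sweep through U11/V11 first, then through U22/V22).
    fw_rec(D, xi,     xj,     ui,     uj,     vi,     vj,     m)  # X11 <- U11, V11
    fw_rec(D, xi,     xj + m, ui,     uj,     vi,     vj + m, m)  # X12 <- U11, V12
    fw_rec(D, xi + m, xj,     ui + m, uj,     vi,     vj,     m)  # X21 <- U21, V11
    fw_rec(D, xi + m, xj + m, ui + m, uj,     vi,     vj + m, m)  # X22 <- U21, V12
    fw_rec(D, xi + m, xj + m, ui + m, uj + m, vi + m, vj + m, m)  # X22 <- U22, V22
    fw_rec(D, xi + m, xj,     ui + m, uj + m, vi + m, vj,     m)  # X21 <- U22, V21
    fw_rec(D, xi,     xj + m, ui,     uj + m, vi + m, vj + m, m)  # X12 <- U12, V22
    fw_rec(D, xi,     xj,     ui,     uj + m, vi + m, vj,     m)  # X11 <- U12, V21

# Sanity check: both versions agree on a random 8-vertex graph.
random.seed(0)
n = 8
A = [[0 if i == j else random.randint(1, 9) for j in range(n)] for i in range(n)]
B = copy.deepcopy(A)
fw_iterative(A)
fw_rec(B, 0, 0, 0, 0, 0, 0, n)
assert A == B
```

With the base case enlarged to a small block and independent recursive calls executed in parallel, this same decomposition yields the cache-efficient shared-memory variants that the paper extends to GPUs and distributed-memory machines.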
