Tapan Srivastava --- Penelope: Peer-to-peer Power Management

chacha · August 2, 2022, 2:19am

Tuesday 08/09

12:30 -13:30 pm

Speaker: Tapan Srivastava

Room: JCL 298

Abstract

Large scale distributed computing setups rely on power management systems to enforce tight power budgets. Existing systems use a central authority that redistributes excess power to power-hungry nodes. This central authority, however, is both a single point of failure and a critical bottleneck—especially at large scale. To address these limitations we propose Penelope, a distributed power management system which shifts power through peer-to-peer transactions, ensuring that it remains robust in faulty environments and at large scale. We implement Penelope and compare its achieved performance to SLURM, a centralized power manager, under a variety of power budgets. We find that under normal conditions SLURM and Penelope achieve almost equivalent performance; however in faulty environments, Penelope achieves 8–15% mean application performance gains over SLURM. At large scale and with increasing frequency of messages, Penelope maintains its performance in contrast to centralized approaches which degrade and become unusable.