A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)
Abstract
Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work, by connecting the servers via a sparse resilient overlay network; yet, this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: During intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves comparable throughput to AllGather -- a non-fault-tolerant distributed agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in terms of throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2017
- DOI:
- 10.48550/arXiv.1708.08309
- arXiv:
- arXiv:1708.08309
- Bibcode:
- 2017arXiv170808309P
- Keywords:
-
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing
- E-Print:
- Overview: 25 pages, 6 sections, 3 appendices, 8 figures, 3 tables. Modifications from previous version (restructuring): Table 1 moved to Appendix B as Table 3