ARCANE: Adaptive Routing with Caching and Network Exploration
Abstract
Most datacenter transport protocols traditionally depend on in-order packet delivery, a legacy design choice that prioritizes simplicity. However, technological advancements, such as RDMA, now enable the relaxation of this requirement, allowing for more efficient utilization of modern datacenter topologies like FatTree and Dragonfly. With the growing prevalence of AI/ML workloads, the demand for improved link utilization has intensified, creating challenges for single-path load balancers due to problems like ECMP collisions. In this paper, we present ARCANE, a novel, adaptive per-packet traffic load-balancing algorithm designed to work seamlessly with existing congestion control mechanisms. ARCANE dynamically routes packets to bypass congested areas and network failures, all while maintaining a lightweight footprint with minimal state requirements. Our evaluation shows that ARCANE delivers significant performance gains over traditional load-balancing methods, including packet spraying and other advanced solutions, substantially enhancing both performance and link utilization in modern datacenter networks.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- 10.48550/arXiv.2407.21625
- arXiv:
- arXiv:2407.21625
- Bibcode:
- 2024arXiv240721625B
- Keywords:
-
- Computer Science - Networking and Internet Architecture