Academic Torrents: Scalable Data Distribution
Abstract
As competitions get more popular, transferring ever-larger data sets becomes infeasible and costly. For example, downloading the 157.3 GB 2012 ImageNet data set incurs about $4.33 in bandwidth costs per download. Downloading the full ImageNet data set takes 33 days. ImageNet has since become popular beyond the competition, and many papers and models now revolve around this data set. To share such an important resource with the machine learning community, the distributors of ImageNet must shoulder a large bandwidth burden. Academic Torrents reduces this burden for disseminating competition data and also increases download speeds for end users. Academic Torrents is run by a pending nonprofit. It augments an existing HTTP server with a peer-to-peer swarm, so requests are re-routed to fetch data from other downloaders. While existing systems slow down with more users, the benefits of Academic Torrents grow, with noticeable effects even when only one other person is downloading.
- Publication: arXiv e-prints
- Pub Date: March 2016
- DOI: 10.48550/arXiv.1603.04395
- arXiv: arXiv:1603.04395
- Bibcode: 2016arXiv160304395L
- Keywords: Computer Science - Networking and Internet Architecture; Computer Science - Computers and Society; Computer Science - Digital Libraries
- E-Print: Presented at Neural Information Processing Systems 2015 Challenges in Machine Learning (CiML) workshop http://ciml.chalearn.org/home/schedule