We describe by example how to optimize cloud-computing resources offered by Amazon Web Services (AWS) to create and curate new datasets at scale. We are producing a co-registered atlas of the Galactic Plane at 16 wavelengths from 1 micron to 24 microns with a spatial sampling of 1 arcsec. The atlas is being created by using the Montage mosaic engine to generate co-registered mosaics of images released by the major surveys WISE, 2MASS, ADASS, GLIMPSE and MIPSGAL. When complete, the atlas will be 45 TB in size and composed of over 9,600 tiles, each 5 deg x 5 deg, with one degree of overlap between adjacent tiles. The dataset will be housed on Amazon S3, which is designed for at-scale storage with access via web protocols. It will be publicly accessible through an API that will support access to the data and creation of cutouts according to users’ specifications. The processing, which is estimated to require 340,000 compute hours to complete, runs on virtual clusters created and managed on AWS platforms through the Pegasus workflow management system. We will describe the optimization methods, compute time and processing costs as a guide for others wishing to use cloud platforms for processing and data creation.
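As a rough aid to reading the figures above, the sketch below derives a few back-of-envelope quantities using only the numbers quoted in the abstract. It is illustrative arithmetic, not the authors' accounting. The function name `atlas_estimates` and the constants are hypothetical. It assumes that 45 TB means decimal terabytes, that the tile count is exactly 9,600 (the abstract says "over 9,600", so the per-tile averages are upper bounds), and that "one degree overlap" means neighbouring tile centres are 4 deg apart.

```python
# Back-of-envelope figures derived only from numbers quoted in the abstract.
# All names here are hypothetical; none come from Montage, Pegasus or AWS.

TOTAL_BYTES = 45e12      # 45 TB, assuming decimal terabytes
N_TILES = 9_600          # abstract says "over 9,600", so this is a lower bound
TILE_DEG = 5             # tile edge length in degrees
OVERLAP_DEG = 1          # overlap between adjacent tiles in degrees
PIXEL_ARCSEC = 1         # spatial sampling in arcsec per pixel
COMPUTE_HOURS = 340_000  # estimated total compute hours


def atlas_estimates():
    """Return simple per-tile estimates implied by the quoted totals."""
    pixels_per_side = TILE_DEG * 3600 // PIXEL_ARCSEC
    return {
        "pixels_per_side": pixels_per_side,
        "pixels_per_tile": pixels_per_side ** 2,
        # Assumption: adjacent tiles share a 1 deg strip, so centres
        # are spaced TILE_DEG - OVERLAP_DEG apart.
        "tile_step_deg": TILE_DEG - OVERLAP_DEG,
        # Upper bounds, because N_TILES is a lower bound.
        "avg_gb_per_tile": TOTAL_BYTES / N_TILES / 1e9,
        "avg_hours_per_tile": COMPUTE_HOURS / N_TILES,
    }


if __name__ == "__main__":
    for name, value in atlas_estimates().items():
        print(f"{name}: {value}")
```

Under these assumptions each tile is 18,000 pixels on a side, and the quoted totals imply at most about 4.7 GB of storage and about 35 compute hours per tile on average.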
American Astronomical Society Meeting Abstracts #223
- Pub Date: January 2014