DROP Computing: Data Driven Pipeline Processing for the SKA
Abstract
The correlator output of the SKA arrays will be of the order of 1 TB/s. This data rate will have to be processed by the Science Data Processor using dedicated HPC infrastructure in both Australia and South Africa. Radio astronomical processing is in principle thought to be highly data parallel, with little to no communication required between individual tasks. Together with the ever-increasing number of cores (CPUs) and stream processors (GPUs), this led us to step back and reconsider the traditional pipeline- and task-driven approach on a more fundamental level. We have thus started to look into dataflow representations (Dennis & Misunas 1974) and dataflow programming models (Davis 1978), as well as dataflow languages (Johnston et al. 2004) and scheduling (Benoit et al. 2014). We have investigated a number of existing systems and prototyped some implementations using simplified, but real, radio astronomy workflows. Although many of these approaches already focus on data and dataflow as the most critical component, we still missed a rigorously data-driven approach, in which the data itself essentially drives the whole process. In this talk we present the new concept of DROP Computing (condensed data cloud), which is an integral part of the current SKA Data Layer architecture. In short, a DROP is an abstract class whose instances represent data (DataDrop), collections of DROPs (ContainerDrop), and also applications (ApplicationDrop, e.g. pipeline components). The rest are just details, which will be presented in the talk.
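A minimal sketch of the class hierarchy the abstract describes. Only the class names (DROP, DataDrop, ContainerDrop, ApplicationDrop) come from the text; the attributes, the subscription mechanism, and all method names are illustrative assumptions for how data-driven triggering could work, not the actual SKA implementation.

```python
# Hypothetical sketch: a DROP hierarchy where completing a data DROP
# drives execution of the applications subscribed to it.
from abc import ABC


class DROP(ABC):
    """Abstract base class: an element of the 'condensed data cloud'."""

    def __init__(self, uid):
        self.uid = uid
        self._consumers = []   # DROPs notified when this one completes
        self.completed = False

    def subscribe(self, consumer):
        self._consumers.append(consumer)

    def set_completed(self):
        # Data-driven execution: completing a DROP triggers its consumers,
        # rather than a central scheduler dispatching tasks.
        self.completed = True
        for consumer in self._consumers:
            consumer.on_input_completed(self)


class DataDrop(DROP):
    """Represents a piece of data."""

    def __init__(self, uid, payload=None):
        super().__init__(uid)
        self.payload = payload


class ContainerDrop(DROP):
    """Represents a collection of child DROPs."""

    def __init__(self, uid, children=()):
        super().__init__(uid)
        self.children = list(children)


class ApplicationDrop(DROP):
    """Represents an application, e.g. a pipeline component."""

    def __init__(self, uid, func, output):
        super().__init__(uid)
        self.func = func       # the processing step to run
        self.output = output   # the DataDrop it produces

    def on_input_completed(self, data_drop):
        # Run as soon as the input data is available, then complete the
        # output, which may in turn trigger further consumers downstream.
        self.output.payload = self.func(data_drop.payload)
        self.output.set_completed()
```

A short usage example of the sketch above: an input `DataDrop` completing triggers an `ApplicationDrop`, which fills and completes its output `DataDrop` with no external scheduler involved.

```python
raw = DataDrop("raw", payload=[1, 2, 3])
out = DataDrop("out")
app = ApplicationDrop("sum", sum, out)
raw.subscribe(app)
raw.set_completed()   # drives app, which drives out
```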
- Publication:
- Astronomical Data Analysis Software and Systems XXV
- Pub Date:
- December 2017
- Bibcode:
- 2017ASPC..512..319W