The Earthscope USArray Array Network Facility (ANF): Evolution of Data Acquisition, Processing, and Storage Systems
Abstract
Since April 2004 the Earthscope USArray Transportable Array (TA) network has grown to over 400 broadband seismic stations that stream multi-channel data in near real-time to the Array Network Facility in San Diego. In total, over 1.7 terabytes per year of 24-bit, 40 samples-per-second seismic and state of health data is recorded from the stations. The ANF provides analysts access to real-time and archived data, as well as state-of-health data, metadata, and interactive tools for station engineers and the public via a website. Additional processing and recovery of missing data from on-site recorders (balers) at the stations is performed before the final data is transmitted to the IRIS Data Management Center (DMC). Assembly of the final data set requires additional storage and processing capabilities to combine the real-time data with baler data. The infrastructure supporting these diverse computational and storage needs currently consists of twelve virtualized Sun Solaris Zones executing on nine physical server systems. The servers are protected against failure by redundant power, storage, and networking connections. Storage needs are provided by a hybrid iSCSI and Fiber Channel Storage Area Network (SAN) with access to over 40 terabytes of RAID 5 and 6 storage. Processing tasks are assigned to systems based on parallelization and floating-point calculation needs. On-site buffering at the data-loggers provide protection in case of short-term network or hardware problems, while backup acquisition systems at the San Diego Supercomputer Center and the DMC protect against catastrophic failure of the primary site. Configuration management and monitoring of these systems is accomplished with open-source (Cfengine, Nagios, Solaris Community Software) and commercial tools (Intermapper). In the evolution from a single server to multiple virtualized server instances, Sun Cluster software was evaluated and found to be unstable in our environment. Shared filesystem architectures using PxFS and QFS were found to be incompatible with our software architecture, so sharing of data between systems is accomplished via traditional NFS. Linux was found to be limited in terms of deployment flexibility and consistency between versions. Despite the experimentation with various technologies, our current virtualized architecture is stable to the point of an average daily real time data return rate of 92.34% over the entire lifetime of the project to date.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2009
- Bibcode:
- 2009AGUFMIN41A1114D
- Keywords:
-
- 1908 INFORMATICS / Cyberinfrastructure;
- 1912 INFORMATICS / Data management;
- preservation;
- rescue;
- 1998 INFORMATICS / Workflow