The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
Abstract
The CMS experiment at the CERN LHC developed the workflow management archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies, such as a document-oriented database and the Hadoop ecosystem, to provide the necessary flexibility to reliably process, store, and aggregate …
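The abstract describes aggregating unstructured framework job report (FWJR) documents collected from distributed agents. A minimal sketch of that kind of aggregation step is shown below in plain Python; the field names (`task`, `site`, `cpu_time`, `exit_code`) are illustrative assumptions, not the actual CMS FWJR schema, and in production this work runs on the Hadoop/Spark cluster rather than in-process.

```python
import json
from collections import defaultdict

# Illustrative stand-ins for unstructured FWJR documents received from
# distributed agents. The keys are hypothetical, not the CMS schema.
raw_docs = [
    json.dumps({"task": "/Run2017/reco", "site": "T1_US_FNAL",
                "cpu_time": 120.0, "exit_code": 0}),
    json.dumps({"task": "/Run2017/reco", "site": "T2_CH_CERN",
                "cpu_time": 95.5, "exit_code": 0}),
    json.dumps({"task": "/Run2017/reco", "site": "T1_US_FNAL",
                "cpu_time": 30.0, "exit_code": 8001}),
]

def aggregate_by_site(docs):
    """Sum CPU time and count jobs/failures per site, mimicking the
    MapReduce-style aggregation the archive performs over stored reports."""
    stats = defaultdict(lambda: {"cpu_time": 0.0, "jobs": 0, "failures": 0})
    for raw in docs:
        doc = json.loads(raw)  # each report arrives as a JSON document
        site = stats[doc["site"]]
        site["cpu_time"] += doc["cpu_time"]
        site["jobs"] += 1
        if doc["exit_code"] != 0:
            site["failures"] += 1
    return dict(stats)

summary = aggregate_by_site(raw_docs)
print(summary["T1_US_FNAL"])  # {'cpu_time': 150.0, 'jobs': 2, 'failures': 1}
```

In the deployed system the same pattern is expressed as jobs over HDFS-resident data; the sketch only conveys the shape of the computation, not the production code.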
- Publication: Computing and Software for Big Science
- Pub Date: November 2018
- DOI: 10.1007/s41781-018-0005-0
- arXiv: arXiv:1801.03872
- Bibcode: 2018CSBS....2....1K
- Keywords: BigData; LHC; Data management; High Energy Physics - Experiment; Computer Science - Digital Libraries
- E-Print: This is a pre-print of an article published in Computing and Software for Big Science. The final authenticated version is available online at: https://doi.org/10.1007/s41781-018-0005-0