Leveraging Open Source Technologies to Build Scientific Data Systems
Abstract
Scientific discovery is largely a collaborative endeavor. From the design and execution of earth and planetary science missions to the evaluation of biomarkers that can identify a particular predisposition to cancer, the scientific community increasingly depends on multi-institutional collaboration as a key enabler of the discovery process. Science data systems, in turn, play a key role in enabling these data-driven collaborative investigations. The utility of these systems can be greatly enhanced by applying many of the same principles governing scientific collaboration to the software development and deployment process. Rather than being built in isolation, scientific data systems must be developed using a collaborative model, both to ensure that they can be run in multi-center deployments and to ensure that they support the full range of varying and evolving needs of the scientific community they target. Open source plays a vital role in enabling this process. By its very nature, open source allows software projects to grow into multi-institutional and international data systems by developing communities around software product lines. At the Jet Propulsion Laboratory (JPL) we have been involved in developing a core software framework called "OODT" to support development of cross-disciplinary science data systems following an open source implementation. This framework has been applied to various areas in earth science, including mission science data system development, climate research, and data analysis, as well as to planetary, lunar, astrophysics, and biomedical research. In 2011, OODT became the first top-level project at the Apache Software Foundation (ASF) to be incubated at a NASA center. The experience in incubating and developing the OODT framework has been invaluable in shifting the development of science data systems towards a collaborative model.
Rather than developing each system independently, teams now collaborate substantially to ensure that high-quality systems are developed in cost-effective ways. In addition, we have been careful to observe appropriate architectural boundaries that separate the common data management components in Apache OODT from the discipline-specific requirements that our software system deployments must meet. Science, particularly Earth Science, is well positioned to embrace a collaborative software development model across institutions and agencies, both to share data and promote interoperability at a systems level and to promote open source development for building and evolving fundamental software infrastructure components for data management. We believe that this paradigm shift is critical to improving both the capability and cost effectiveness of these systems, and that it will ultimately lead to improvements in software and data reuse.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2011
- Bibcode: 2011AGUFMIN21D..01C
- Keywords: 1908 INFORMATICS / Cyberinfrastructure; 1978 INFORMATICS / Software re-use