Machine Learning Techniques for Decision Support in Intelligent Data Management
Abstract
NASA's remote sensing data volumes have grown in step with Moore's Law, i.e., doubling every 18 months, and new instruments are likely to drive further growth. In addition, advances in instrument design (e.g., hyperspectral scanners) and in science algorithms are enabling more near-real-time applications of the data. The combination of low-latency requirements with high data volumes and large numbers of files poses major challenges for archive data management. To make the right data available at the right time, an archive will need to apply knowledge of the data content in its data management decisions. This decision support domain includes automatic quality assessment, feature detection to support caching decisions, and content-based metadata to support efficient data selection. In this study, we evaluate a variety of machine learning algorithms for several decision support roles in intelligent data management. Machine learning algorithms such as neural networks and clustering have been used for decision support in business and policy domains, and they have found some use in remote sensing, e.g., for cloud and land cover classification. Yet most research on remote sensing data rests on science-based algorithms, such as those based on radiative transfer equations. Machine learning for scientific applications faces challenges such as discretization constraints, the lack of a physical basis, and the difficulty of assembling training sets. However, these difficulties may be less significant in the decision support role. For instance, when selecting data for an application, it is often enough to know whether a data attribute exceeds a certain threshold, without knowing its exact value. The training data problem can be surmounted by using the products output by the science-based algorithms as training data. Finally, machine learning algorithms offer a key advantage for decision support: they are fast once they have been trained.
Data management decisions must be made while the "fresh" data are still on disk and in time to serve near-real-time applications, i.e., within a few hours or even minutes. We examine the difficulties and advantages of machine learning algorithms in terms of their utility for decisions regarding data quality assessment, feature-based caching strategies, and content-based data selection.
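Two points in the abstract can be illustrated with a small sketch: a yes/no threshold decision is often sufficient for data management, and the outputs of science-based algorithms can serve as training labels. The following Python sketch is a hypothetical example only, not the study's method. The two per-granule features, the synthetic `science_product` function standing in for a science-based retrieval, and `THRESHOLD` are all assumptions. Logistic regression is used here for brevity. The abstract itself mentions neural networks and clustering.

```python
import math
import random

# Hypothetical cut-off on a science-product value (e.g., a retrieved
# cloud fraction); only the above/below decision matters here.
THRESHOLD = 0.5


def science_product(f1, f2):
    """Synthetic stand-in for a science-based retrieval (not a real algorithm)."""
    return 0.8 * f1 + 0.2 * f2


def make_granules(n, rng):
    """Build (features, label) pairs; labels come from the science product."""
    data = []
    for _ in range(n):
        f1, f2 = rng.random(), rng.random()
        # The classifier only sees noisy, instrument-like features.
        x = (f1 + rng.gauss(0.0, 0.05), f2 + rng.gauss(0.0, 0.05))
        y = 1 if science_product(f1, f2) > THRESHOLD else 0
        data.append((x, y))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, lr=1.0, epochs=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def exceeds_threshold(model, x):
    """Fast decision for a fresh granule: one weighted sum and a sigmoid."""
    w1, w2, b = model
    return sigmoid(w1 * x[0] + w2 * x[1] + b) > 0.5


def accuracy(model, data):
    hits = sum(exceeds_threshold(model, x) == bool(y) for x, y in data)
    return hits / len(data)


rng = random.Random(42)
model = train_logistic(make_granules(200, rng))
test_set = make_granules(500, rng)
print(f"held-out accuracy: {accuracy(model, test_set):.2f}")
```

Training is the costly step. After training, each decision needs only one weighted sum and a sigmoid. That is the kind of speed needed when decisions must be made within hours or minutes of data arrival. A real archive would use its actual science products as labels and would validate the classifier against them.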
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2002
- Bibcode: 2002AGUFM.B61A0705L
- Keywords: 1640 Remote sensing; 1694 Instruments and techniques