Grid-based Infrastructure and Distributed Data Mining for Virtual Observatories
Abstract
Accessing and analyzing geographically distributed data sets are challenges common to a wide variety of fields. To address these challenges, we have been developing two pieces of technology: IDDat, grid-based software that supports processing and remote analysis of widely distributed data, and RemoteMiner, parallel, distributed data mining software. IDDat and RemoteMiner work together seamlessly and provide the necessary backend functionality, hidden from the user. The user accesses the system through a single web portal, where data are selected and data mining functions are planned. IDDat services then prepare the data mining functions for execution. Preparation can include moving data to the processing location via services built on the Storage Resource Broker (SRB), preprocessing the data, and allocating computation and storage resources. IDDat services also initiate and monitor the data mining functions and allow the results to be shared with other users. In this presentation, we illustrate a general user workflow and the functionality provided. We also give an overview of technical issues and design features such as storage scheduling, efficient network traffic management, and resource selection.
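The workflow described above (plan in the portal, stage data, preprocess, allocate resources, run and monitor, then share results) can be pictured as a fixed sequence of job states. The sketch below is a minimal illustration under that assumption only: `Stage`, `MiningJob`, and `prepare_and_run` are hypothetical names, not the IDDat or RemoteMiner API, and the steps are placeholders that do not call SRB or any real grid service.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical lifecycle states of one data mining job, in workflow order."""
    PLANNED = auto()
    DATA_STAGED = auto()
    PREPROCESSED = auto()
    RESOURCES_ALLOCATED = auto()
    RUNNING = auto()
    COMPLETED = auto()
    SHARED = auto()


ORDER = list(Stage)  # Enum iteration preserves definition order


@dataclass
class MiningJob:
    """A data mining request planned in the web portal (hypothetical model)."""
    owner: str
    datasets: list[str]
    algorithm: str
    stage: Stage = Stage.PLANNED
    history: list[Stage] = field(default_factory=list)
    shared_with: set[str] = field(default_factory=set)

    def advance(self, next_stage: Stage) -> None:
        # Enforce the fixed workflow order: a stage can be neither skipped nor repeated.
        idx = ORDER.index(self.stage)
        if idx + 1 >= len(ORDER) or ORDER[idx + 1] is not next_stage:
            raise ValueError(f"cannot go from {self.stage.name} to {next_stage.name}")
        self.history.append(self.stage)
        self.stage = next_stage


def prepare_and_run(job: MiningJob, collaborators: set[str]) -> MiningJob:
    """Drive one job through preparation, execution, and sharing.

    Each step is a placeholder; a real system would invoke data transfer,
    preprocessing, scheduling, and mining services at these points.
    """
    job.advance(Stage.DATA_STAGED)          # move datasets to the processing site
    job.advance(Stage.PREPROCESSED)         # e.g. subsetting or format conversion
    job.advance(Stage.RESOURCES_ALLOCATED)  # reserve computation and storage
    job.advance(Stage.RUNNING)              # launch the mining function
    job.advance(Stage.COMPLETED)            # monitoring observed normal termination
    job.shared_with |= collaborators        # make results visible to other users
    job.advance(Stage.SHARED)
    return job


if __name__ == "__main__":
    done = prepare_and_run(MiningJob("alice", ["dataset_a"], "clustering"), {"bob"})
    print(done.stage.name, [s.name for s in done.history])
```

Modeling the workflow as an ordered state machine makes it easy for a monitoring service to report where each job is and to reject out-of-order requests, such as sharing results before the run has completed.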
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2006
- Bibcode: 2006AGUFMSM21A0256K
- Keywords:
  - 0520 Data analysis: algorithms and implementation;
  - 0525 Data management;
  - 0555 Neural networks, fuzzy logic, machine learning;
  - 2100 INTERPLANETARY PHYSICS;
  - 7500 SOLAR PHYSICS, ASTROPHYSICS, AND ASTRONOMY