Software architecture for adaptive in silico knowledge discovery and decision making based on big genomic data analytics
Abstract
In recent years, leading scientists, researchers and analysts have described big data as a revolution in scientific research and one of its most challenging trends. The volume of stored genomic data has increased significantly. The main challenge in data analysis and knowledge discovery is to provide efficient processing tools, methods and technologies. This paper presents a software architecture for adaptive knowledge discovery based on big genomic data analytics. The software architecture comprises layers for data integration and preprocessing, a database/data warehouse server, a data discovery engine, pattern evaluation and a graphical user interface. The big genomic data architecture consists of data sources, storage, integration and preprocessing, real data stream, stream processing, analytical data store, analysis and reporting. A machine learning algorithm for breast cancer prediction is presented; it processes and analyzes big genomic data and supports knowledge discovery for personalized treatment. The proposed breast cancer classification algorithm is implemented using the Stochastic Dual Coordinate Ascent (SDCA) method and the Wisconsin breast cancer database. Experimental results are presented and discussed. The purpose of the study is to apply the software architecture to big genomic data analytics through practical experiments for a specific case study: identifying regulatory genetic elements in sequenced genomes and predicting the type and malignancy of breast cancer. This will enable fast processing of clinical observation data and its comparison with the data accumulated so far, in support of precision medicine.
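The abstract names SDCA as the training method but does not give the loss function, hyperparameters or implementation. The following is therefore only a minimal sketch of the general SDCA technique, under assumptions that are not taken from the paper: it uses the hinge loss (a linear SVM) with the standard closed-form dual coordinate update, and it trains on small synthetic two-class data rather than the Wisconsin breast cancer database. The names `sdca_hinge`, `make_toy_data`, `lam` and `epochs` are hypothetical and chosen for illustration.

```python
import random


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic stand-in data (NOT the Wisconsin database): two Gaussian
    clusters with labels +1 / -1 and a constant bias feature appended."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        center = 2.0 * label
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5), 1.0])
            y.append(label)
    return X, y


def sdca_hinge(X, y, lam=0.01, epochs=20, seed=0):
    """SDCA for an L2-regularized linear SVM (hinge loss).

    Keeps one dual variable alpha[i] per example and maintains the primal
    weights as w = (1 / (lam * n)) * sum_i alpha[i] * x_i.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    sq_norms = [sum(v * v for v in x) for x in X]
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the dual coordinates in random order
        for i in order:
            if sq_norms[i] == 0.0:
                continue  # an all-zero example cannot change w
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Closed-form maximizer of the dual along coordinate i,
            # clipped so that y[i] * alpha[i] stays within [0, 1].
            step = (1.0 - margin) * lam * n / sq_norms[i] + alpha[i] * y[i]
            new_alpha = y[i] * max(0.0, min(1.0, step))
            delta = new_alpha - alpha[i]
            alpha[i] = new_alpha
            scale = delta / (lam * n)
            for j in range(d):
                w[j] += scale * X[i][j]  # keep w consistent with alpha
    return w


def predict(w, x):
    """Return the predicted label (+1 or -1) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


X, y = make_toy_data()
w = sdca_hinge(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("training accuracy on toy data:", accuracy)
```

To approximate the setting described in the abstract, the toy data would be replaced by feature vectors and benign/malignant labels from the Wisconsin data, with labels mapped to +1/-1 and features standardized first; the accuracy printed here says nothing about the results reported in the paper.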
- Publication: 45th International Conference on Application of Mathematics in Engineering and Economics (AMEE'19)
- Pub Date: November 2019
- DOI: 10.1063/1.5133586
- Bibcode: 2019AIPC.2172i0009G