Studio e confronto delle strutture di Apache Spark [Study and comparison of the structures of Apache Spark]

doi:10.48550/arXiv.1810.12059

Morrelli, Massimiliano

This document studies the data structures available in the Apache Spark framework and evaluates which perform best when implementing solutions; in particular, we assess the advantages and disadvantages of using Dataset for job design. The observed results provide further support for evaluating Dataset as an alternative to RDD and for understanding its strengths and weaknesses. The results are examined through two cases specifically designed and implemented in Java 1.8. The jobs are executed in a suitable distributed environment, and the study concludes with a comparison of execution times and results obtained.

Publication:

arXiv e-prints

Pub Date:

October 2018

DOI:

10.48550/arXiv.1810.12059

arXiv:

arXiv:1810.12059

Bibcode:

2018arXiv181012059M

Keywords:

Computer Science - Databases

E-Print:

in Italian

NASA/ADS
