Data Mining Approach to Analyze Covid19 Dataset of Brazilian Patients
Abstract
The current pandemic is caused by the coronavirus disease (COVID-19), a name coined by the World Health Organization in early 2020. Almost all countries have now reported COVID-19 positive cases, and governments are adopting different health policies to stop the infection. Many research groups are working on patient data to understand the virus, while scientists are searching for a vaccine to strengthen the immune system against it. Brazil is one of the countries with the most infections: as of August 11 it had a total of 3,112,393 cases. The Research Foundation of Sao Paulo State (Fapesp) released a dataset in an innovative collaboration with hospitals (Einstein, Sirio-Libanes), a laboratory (Fleury) and Sao Paulo University to foster research on this trending topic. This paper presents an exploratory analysis of the datasets using a Data Mining Approach. Several inconsistencies are found, namely NaN values, null reference values for analytes, outliers in analyte results, and encoding issues. The outcome is a set of cleaned datasets for future studies, but at least 20% of the data were discarded because of non-numerical values, null values, and numbers outside the reference range.
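The cleaning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the field names (`analyte`, `result`, `ref_low`, `ref_high`), the handling of comma decimal separators, and the sample rows are all hypothetical and chosen only to show the three discard reasons named above (non-numerical values, null values, and results outside the reference range).

```python
import math


def parse_number(text):
    """Return a float for a numeric string, or None if missing or non-numeric."""
    if text is None:
        return None
    try:
        # Assumption: decimal commas may appear in the raw files.
        value = float(str(text).strip().replace(",", "."))
    except ValueError:
        return None
    # Treat NaN and infinities as missing values.
    if math.isnan(value) or math.isinf(value):
        return None
    return value


def clean_records(records):
    """Split records into kept rows and a count of discarded rows."""
    kept, discarded = [], 0
    for rec in records:
        result = parse_number(rec.get("result"))
        low = parse_number(rec.get("ref_low"))
        high = parse_number(rec.get("ref_high"))
        # Discard non-numeric results, null references, and out-of-range values.
        if result is None or low is None or high is None or not (low <= result <= high):
            discarded += 1
            continue
        kept.append({**rec, "result": result, "ref_low": low, "ref_high": high})
    return kept, discarded


# Made-up illustrative rows, not taken from the Fapesp dataset.
sample = [
    {"analyte": "Hemoglobina", "result": "13,5", "ref_low": "12.0", "ref_high": "16.0"},
    {"analyte": "Hemoglobina", "result": "NaN", "ref_low": "12.0", "ref_high": "16.0"},
    {"analyte": "Glicose", "result": "reagente", "ref_low": "70", "ref_high": "99"},
    {"analyte": "Glicose", "result": "90", "ref_low": None, "ref_high": None},
    {"analyte": "Glicose", "result": "900", "ref_low": "70", "ref_high": "99"},
]

kept, discarded = clean_records(sample)
discard_share = discarded / len(sample)
```

Tracking the discarded count alongside the kept rows makes it straightforward to report what share of the data was lost, the kind of figure the abstract quotes.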
- Publication: arXiv e-prints
- Pub Date: August 2020
- DOI: 10.48550/arXiv.2008.11344
- arXiv: arXiv:2008.11344
- Bibcode: 2020arXiv200811344C
- Keywords: Computer Science - Computers and Society