Characterizing Data Scientists in the Real World
Abstract
Data collection is pervasive in our digital lifestyle. A recent IDC study reports that, owing to pandemic-related confinements, the volume of data created and replicated in 2020 grew even faster than in previous years, reaching a global total of 64.2 zettabytes. While not all of this data is meant to be analyzed, numerous companies offer services and products that rely heavily on data analysis, and mining the produced data has already proven valuable for businesses in different sectors. To fully realize this value, however, companies need to hire professionals who are capable of gleaning insights and extracting value from the available data. We hypothesize that many people who currently carry out data-science-related tasks in practice may not have adequate training or education. To support them in their duties, e.g. by building appropriate tools that increase their productivity, we first need to characterize the current generation of data scientists. To contribute towards this characterization, we conducted a public survey to understand who is doing data science, how they work, which skills they hold and lack, and which tools they use and need.
- Publication:
- arXiv e-prints
- Pub Date:
- November 2024
- DOI:
- 10.48550/arXiv.2411.12225
- arXiv:
- arXiv:2411.12225
- Bibcode:
- 2024arXiv241112225P
- Keywords:
- Computer Science - Human-Computer Interaction