Data-Centric AI Requires Rethinking Data Notion
Abstract
The transition towards data-centric AI requires revisiting the notion of data from both mathematical and implementational standpoints in order to obtain unified data-centric machine learning packages. To this end, this work proposes unifying principles offered by the categorical and cochain notions of data, and discusses their importance in the transition to data-centric AI. In the categorical notion, data is viewed as a mathematical structure that we act upon via morphisms, which preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. Although these two notions are almost orthogonal, together they provide a unifying definition of data, one that ultimately affects how machine learning packages are developed, implemented, and used by practitioners.
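The abstract states both notions only in words. As a hedged illustration, not the authors' construction, the sketch below assumes that the "discrete domain" is a small simplicial complex (vertices, oriented edges, one triangle) and that the structure-preserving morphisms are graph homomorphisms. Data is a cochain stored as a dict, the coboundary operators `coboundary0` and `coboundary1` act on it (and compose to zero), and pulling data back along a morphism `phi` commutes with the coboundary. Every function and variable name here is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# Assumption: the discrete domain is a simplicial complex with oriented simplices.
Vertex = Hashable
Edge = Tuple[Vertex, Vertex]                # oriented edge (tail, head)
Triangle = Tuple[Vertex, Vertex, Vertex]    # oriented triangle (a, b, c)


# --- Cochain view: data is a function on a discrete domain, acted on by operators ---

def coboundary0(f: Dict[Vertex, float], edges: List[Edge]) -> Dict[Edge, float]:
    """Map a 0-cochain (vertex data) to a 1-cochain: (d0 f)(a, b) = f(b) - f(a)."""
    return {(a, b): f[b] - f[a] for (a, b) in edges}


def edge_value(g: Dict[Edge, float], a: Vertex, b: Vertex) -> float:
    """Read a 1-cochain on (a, b), flipping the sign if only (b, a) is stored."""
    if (a, b) in g:
        return g[(a, b)]
    return -g[(b, a)]


def coboundary1(g: Dict[Edge, float], triangles: List[Triangle]) -> Dict[Triangle, float]:
    """Map a 1-cochain to a 2-cochain: (d1 g)(a, b, c) = g(b, c) - g(a, c) + g(a, b)."""
    return {
        (a, b, c): edge_value(g, b, c) - edge_value(g, a, c) + edge_value(g, a, b)
        for (a, b, c) in triangles
    }


# --- Categorical view: structure-preserving maps between data objects ---

def is_graph_morphism(phi: Dict[Vertex, Vertex], src_edges: List[Edge],
                      tgt_edges: List[Edge]) -> bool:
    """Hypothetical morphism check: every source edge lands on a target edge (either orientation)."""
    tgt = set(tgt_edges)
    return all((phi[a], phi[b]) in tgt or (phi[b], phi[a]) in tgt for (a, b) in src_edges)


def pullback0(phi: Dict[Vertex, Vertex], f: Dict[Vertex, float]) -> Dict[Vertex, float]:
    """Transport vertex data backward along phi: (phi* f)(v) = f(phi(v))."""
    return {v: f[phi[v]] for v in phi}


def pullback1(phi: Dict[Vertex, Vertex], g: Dict[Edge, float],
              src_edges: List[Edge]) -> Dict[Edge, float]:
    """Transport edge data backward along phi, respecting edge orientation."""
    return {(a, b): edge_value(g, phi[a], phi[b]) for (a, b) in src_edges}


# Target object: a filled triangle on vertices x, y, z.
tgt_edges = [("x", "y"), ("y", "z"), ("x", "z")]
tgt_triangles = [("x", "y", "z")]
# Source object: a path 0-1-2-3, wrapped around the triangle by phi.
src_edges = [(0, 1), (1, 2), (2, 3)]
phi = {0: "x", 1: "y", 2: "z", 3: "x"}
f = {"x": 1, "y": 4, "z": 9}  # a 0-cochain: one value per vertex

assert is_graph_morphism(phi, src_edges, tgt_edges)
df = coboundary0(f, tgt_edges)
print(df)                              # {('x', 'y'): 3, ('y', 'z'): 5, ('x', 'z'): 8}
print(coboundary1(df, tgt_triangles))  # {('x', 'y', 'z'): 0}, i.e. d1(d0 f) = 0

# Pulling data back along the morphism commutes with the operator.
lhs = coboundary0(pullback0(phi, f), src_edges)
rhs = pullback1(phi, df, src_edges)
print(lhs == rhs)  # True
```

The final check is one way to see the two views working together under these assumptions: operators act on cochains, morphisms move cochains between domains, and the result does not depend on the order in which the two are applied.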
- Publication: arXiv e-prints
- Pub Date: October 2021
- DOI: 10.48550/arXiv.2110.02491
- arXiv: arXiv:2110.02491
- Bibcode: 2021arXiv211002491H
- Keywords: Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing; Mathematics - Category Theory; Statistics - Machine Learning
- E-Print: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), NeurIPS Data-Centric AI Workshop