A nonparametric framework for inferring orders of categorical data from category-real pairs
Abstract
Given a dataset of careers and incomes, how large is the income difference between any pair of careers? Given a dataset of travel-time records, how much longer does a trip take when choosing public transportation mode A instead of mode B? In this paper, we propose a framework, based on estimation statistics, that infers orders of categories as well as the magnitudes of the differences in real values between each pair of categories. The framework reports not only whether an order of categories exists but also the magnitude of the difference between each consecutive pair of categories in the order. On large datasets, our framework scales well compared with existing frameworks. The proposed framework has been applied to two real-world case studies: 1) ordering careers by income for 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices of 1,060 companies in the NASDAQ stock market between 2000 and 2016. The career-ordering results demonstrate income inequality among different careers. The stock market results illustrate how sector domination can change over time. Our approach can be applied in any research area that has category-real pairs. The proposed Dominant-Distribution Network provides a novel way to gain insight when analyzing category orders. Software implementing this framework is available to researchers and practitioners as an R package on CRAN: EDOIF.
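To make the idea of ordering categories from category-real pairs concrete, the following is a minimal sketch in Python. It is not the EDOIF implementation and does not use the EDOIF API. It assumes a percentile bootstrap on the difference of means as the estimation-statistics step, and all function names (`bootstrap_mean_diff_ci`, `order_categories`) are hypothetical. An edge is recorded from one category to another when the bootstrap confidence interval of their mean difference lies entirely above zero, which loosely mirrors the notion of one category dominating another.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a) (illustrative assumption)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each category with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def order_categories(pairs, **kwargs):
    """Rank categories by sample mean and list the dominance edges found.

    pairs: iterable of (category, real_value) tuples.
    Returns (ranked, edges), where ranked is sorted by ascending mean and
    each edge is (dominant, dominated, (ci_lower, ci_upper)).
    """
    groups = {}
    for category, value in pairs:
        groups.setdefault(category, []).append(value)

    ranked = sorted(groups, key=lambda c: mean(groups[c]))

    edges = []
    for i, low_cat in enumerate(ranked):
        for high_cat in ranked[i + 1:]:
            ci = bootstrap_mean_diff_ci(groups[low_cat], groups[high_cat], **kwargs)
            # Keep the edge only if the whole interval is above zero.
            if ci[0] > 0:
                edges.append((high_cat, low_cat, ci))
    return ranked, edges


# Toy data (made up for illustration): three clearly separated categories.
toy_pairs = (
    [("A", float(v)) for v in range(1, 11)]
    + [("B", float(v)) for v in range(101, 111)]
    + [("C", float(v)) for v in range(1001, 1011)]
)
ranked, edges = order_categories(toy_pairs)
```

In this sketch the confidence interval attached to each edge plays the role of the reported magnitude of difference between two categories. Comparing every pair of categories is quadratic in the number of categories, so this version says nothing about how the actual framework scales.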
- Publication:
- Heliyon
- Pub Date:
- November 2020
- DOI:
- 10.1016/j.heliyon.2020.e05435
- arXiv:
- arXiv:1911.06723
- Bibcode:
- 2020Heliy...605435A
- Keywords:
- Computer Science;
- Ordering inference;
- Estimation statistics;
- Bootstrapping;
- Nonparametric method;
- Data Science;
- Income inequality;
- Statistics - Methodology;
- Computer Science - Computers and Society;
- Mathematics - Statistics Theory;
- Physics - Data Analysis;
- Statistics and Probability;
- Statistics - Machine Learning;
- 62G07;
- 06A06;
- G.3;
- I.2.6
- E-Print:
- The R package can be found at https://github.com/DarkEyes/EDOIF