Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types

The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
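The claim that different task types need different weights can be illustrated with a small sketch. It is not the paper's algorithm: it assumes binary labels in {-1, +1}, a one-coin worker model with known per-type accuracies, and the standard log-odds weighting for weighted majority vote. The type names, the accuracy values, and the function names are hypothetical, and the task type is taken as given (for example, as the output of a clustering step).

```python
import math


def log_odds(p):
    """Log-odds weight for a worker who is correct with probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def weighted_majority_vote(responses, weights):
    """Return +1 or -1 from worker responses in {-1, +1} and per-worker weights."""
    score = sum(w * r for w, r in zip(weights, responses))
    return 1 if score >= 0 else -1


# Hypothetical accuracies: worker 0 is strong on type "A" tasks,
# workers 1 and 2 are strong on type "B" tasks.
accuracy = {
    "A": [0.9, 0.6, 0.6],
    "B": [0.6, 0.9, 0.9],
}

# One weight vector per task type (assumption: accuracies are known).
weights = {t: [log_odds(p) for p in ps] for t, ps in accuracy.items()}

# The same three responses, interpreted under each task type.
responses = [+1, -1, -1]
label_a = weighted_majority_vote(responses, weights["A"])
label_b = weighted_majority_vote(responses, weights["B"])

print(label_a, label_b)
```

With these assumed accuracies, the identical set of responses is decoded as +1 for a type "A" task and -1 for a type "B" task, so a single weight vector shared across both types could not match both decisions.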

Publication: arXiv e-prints

Pub Date: February 2023

DOI: 10.48550/arXiv.2302.07393

arXiv: arXiv:2302.07393

Bibcode: 2023arXiv230207393M

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Applications
