Open-World Semi-Supervised Learning for Node Classification

doi:10.48550/arXiv.2403.11483

Open-World Semi-Supervised Learning for Node Classification

Open-world semi-supervised learning (Open-world SSL) for node classification, that classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community. As only seen classes have human labels, they are usually better learned than novel classes, and thus exhibit smaller intra-class variances within the embedding space (named as imbalance of intra-class variances between seen and novel classes). Based on empirical and theoretical analysis, we find the variance imbalance can negatively impact the model performance. Pre-trained feature encoders can alleviate this issue via producing compact representations for novel classes. However, creating general pre-trained encoders for various types of graph data has been proven to be challenging. As such, there is a demand for an effective method that does not rely on pre-trained graph encoders. In this paper, we propose an IMbalance-Aware method named OpenIMA for Open-world semi-supervised node classification, which trains the node classification model from scratch via contrastive learning with bias-reduced pseudo labels. Extensive experiments on seven popular graph benchmarks demonstrate the effectiveness of OpenIMA, and the source code has been available on GitHub.

Publication:

arXiv e-prints

Pub Date:

March 2024

DOI:

10.48550/arXiv.2403.11483

arXiv:

arXiv:2403.11483

Bibcode:

2024arXiv240311483W

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Social and Information Networks

E-Print:

Accepted by ICDE 2024

NASA/ADS

Open-World Semi-Supervised Learning for Node Classification

Abstract