Identifiability of a statistical model with two latent vectors: Importance of the dimensionality relation and application to graph embedding
Abstract
Identifiability of statistical models is a key notion in unsupervised representation learning. Recent work on nonlinear independent component analysis (ICA) employs auxiliary data and has established identifiability conditions. This paper proposes a statistical model of two latent vectors with single auxiliary data, which generalizes nonlinear ICA, and establishes various identifiability conditions. Unlike previous work, the two latent vectors in the proposed model can have arbitrary dimensions, and this property enables us to reveal an insightful dimensionality relation among the two latent vectors and the auxiliary data in the identifiability conditions. Furthermore, surprisingly, we prove that under certain conditions the indeterminacies of the proposed model are the same as those of linear ICA: the elements of the latent vector can be recovered up to permutation and scaling. Next, we apply the identifiability theory to a statistical model for graph data. One of the resulting identifiability conditions has an appealing implication: identifiability of the statistical model can depend on the maximum value of the link weights in the graph data. We then propose a practical method for identifiable graph embedding. Finally, we demonstrate numerically that the proposed method recovers the latent vectors well and that model identifiability clearly depends on the maximum value of the link weights, which supports the implication of our theoretical results.
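The permutation-and-scale indeterminacy mentioned in the abstract can be illustrated with a minimal sketch of the standard linear ICA setting, x = A s. This is not the paper's model or method. The matrices `A`, `P`, `D` and the source vector `s` are hypothetical values chosen only for illustration. The sketch shows that permuting and rescaling the sources, while compensating in the mixing matrix, leaves the observation unchanged, so the sources can be recovered at best up to permutation and scale.

```python
# Illustration only: linear ICA model x = A s with a 2x2 mixing matrix.
# All numeric values below are hypothetical, not taken from the paper.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical mixing matrix
s = [0.5, -4.0]                # hypothetical latent sources

# Alternative sources: swap the two components, then rescale them.
P = [[0.0, 1.0], [1.0, 0.0]]   # permutation matrix
D = [[2.0, 0.0], [0.0, -0.5]]  # diagonal matrix of nonzero scales
s_alt = matvec(D, matvec(P, s))

# Compensating mixing matrix A_alt = A P^{-1} D^{-1}; for this swap, P^{-1} = P.
D_inv = [[1.0 / D[0][0], 0.0], [0.0, 1.0 / D[1][1]]]
A_alt = matmul(matmul(A, P), D_inv)

x = matvec(A, s)               # observation from the original pair
x_alt = matvec(A_alt, s_alt)   # observation from the alternative pair

print(x, x_alt)
```

Both pairs, (`A`, `s`) and (`A_alt`, `s_alt`), produce the same observation, so the observation alone cannot distinguish them.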
- Publication: arXiv e-prints
- Pub Date: May 2024
- arXiv: arXiv:2405.19760
- Bibcode: 2024arXiv240519760S
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning