EDITORIAL: Topological data analysis
Abstract
Inverse problems can be defined as the area of mathematics that attempts to reconstruct a physical or mathematical object from derived data. Frequently, this means the evaluation of parameters or other numerical quantities (such as eigenvalues) that characterize or provide information about the system.
There are, however, other aspects of a system that are important, but are not as readily summarized by numerical quantities. If one considers observations of diabetic patients (using metabolic quantities), one will find that the data breaks up into components, or pieces, corresponding to distinct forms of the disease. The decomposition of data sets into disjoint pieces, or clustering, is an aspect of the study of the shape of the data, albeit one that has been extensively studied. A more complex notion of shape appears in observations of a predator-prey system governed by a Lotka-Volterra equation. One would find that exact observations, consisting of (prey population, predator population) pairs, appear to lie along a simple closed curve in the plane. The fact that the data lies along such a closed curve is an important piece of information, since it suggests that the system displays recurrent behavior. If one did not know, a priori, that the system is governed by a Lotka-Volterra equation, then it would not be immediately obvious that the system is undergoing recurrent motion, and this deduction would constitute a significant insight. In this case, it is again the shape of the data, namely the fact that it lies on a simple closed curve, which is the key insight. Shape is a somewhat nebulous concept, which at first blush may seem too intuitive to make mathematically precise or to describe quantitatively. Within pure mathematics, the disciplines of topology and differential geometry are designed exactly to address this problem. They provide explicit signatures which, in precise senses, quantify and describe the shape of a geometric object. In addition, they provide methods for discretizing and compressing the information present in a geometric object so as to provide a useful, small representation of the object. The articles in this special issue are concerned with the applications of topology to the analysis of data sets.
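As a brief illustration of why exact observations of the predator-prey system above trace a closed curve, the following sketch writes the Lotka-Volterra equations in one common form. The symbols x (prey), y (predator) and the positive constants α, β, γ, δ are notation assumed here for illustration; they do not appear in the editorial itself.

```latex
\begin{align}
  \frac{dx}{dt} &= \alpha x - \beta x y, &
  \frac{dy}{dt} &= \delta x y - \gamma y, \\
  V(x, y) &= \delta x - \gamma \ln x + \beta y - \alpha \ln y, \\
  \frac{dV}{dt}
    &= \Bigl(\delta - \frac{\gamma}{x}\Bigr)(\alpha x - \beta x y)
     + \Bigl(\beta - \frac{\alpha}{y}\Bigr)(\delta x y - \gamma y) \\
    &= (\delta x - \gamma)(\alpha - \beta y) + (\beta y - \alpha)(\delta x - \gamma) = 0 .
\end{align}
```

Under these assumptions V is constant along every trajectory, so each orbit in the positive quadrant lies on a level set of V. Away from the equilibrium (γ/δ, α/β) these level sets are simple closed curves, which is the recurrent shape described in the text.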
The adaptation of topological techniques from pure mathematics to the study of data from real systems is a project which has been undertaken during the past two decades, and the present volume contains various contributions to that project. At the current stage of development, homology and persistence are two of the most popular topological techniques used in this context. Homology goes back to the beginnings of topology in Poincaré's influential papers. It is the idea that the connectivity of a space is determined by its cycles of different dimensions, and that these cycles organize themselves into abelian groups, called homology groups. Better known than these groups are their ranks, the Betti numbers of the space, which are non-negative integers that count the number of independent cycles in each dimension. To give an example, the zeroth Betti number counts the connected components, and the first counts the loops. A crucial feature of homology groups is that, given a reasonably explicit description of a space, their computation is an exercise in linear algebra. Even better known than the Betti numbers is the Euler characteristic, which, as we know from Poincaré's work, is equal to the alternating sum of the Betti numbers and can be computed without computing the homology groups themselves. To give evidence that these numbers have relevant practical applications, we mention that integrating the Euler characteristic over a domain with sensor information can be used to count objects in the domain. This alone would not explain the popularity of homology groups, which we see as rooted in the fact that they hit a sweet spot, offering relatively strong discriminative power and a clear intuitive meaning, all at a surprisingly low computational cost. Even these desirable qualities would not be sufficient if it were not possible to overcome a serious shortcoming, namely the high sensitivity of homology to minor mistakes in the data collection.
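The remark that computing homology is an exercise in linear algebra can be made concrete with a minimal sketch. It assumes the space is given as a finite simplicial complex, closed under taking faces, and works with rational coefficients; the function names (`rank`, `boundary_matrix`, `betti_numbers`, `euler_characteristic`) are hypothetical and chosen only for this illustration. Each Betti number is obtained from the ranks of two boundary matrices, and the Euler characteristic is obtained by simply counting simplices.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary map from k-simplices to (k-1)-simplices."""
    index = {face: i for i, face in enumerate(faces)}
    mat = [[0] * len(k_simplices) for _ in faces]
    for j, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            mat[index[face]][j] = (-1) ** i
    return mat


def betti_numbers(simplices_by_dim):
    """simplices_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    top = len(simplices_by_dim) - 1
    ranks = [0]  # the boundary map on 0-simplices is zero
    for k in range(1, top + 1):
        ranks.append(rank(boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])))
    ranks.append(0)  # there are no simplices above the top dimension
    # b_k = dim C_k - rank d_k - rank d_{k+1}
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]


def euler_characteristic(simplices_by_dim):
    """Alternating sum of simplex counts; no homology computation needed."""
    return sum((-1) ** k * len(s) for k, s in enumerate(simplices_by_dim))


# Hollow triangle (a combinatorial circle): one component, one loop.
circle = [[(0,), (1,), (2,)], [(0, 1), (0, 2), (1, 2)]]
# Boundary of a tetrahedron (a combinatorial sphere): one component, one void.
sphere = [list(combinations(range(4), k + 1)) for k in range(3)]

print(betti_numbers(circle), euler_characteristic(circle))  # [1, 1] 0
print(betti_numbers(sphere), euler_characteristic(sphere))  # [1, 0, 1] 2
```

In both examples the alternating sum of the Betti numbers agrees with the Euler characteristic computed directly from the simplex counts, as the text states.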
Because of the finite nature of most data sets, the notion of shape within data sets is inevitably stochastic. To some extent this is because of the uncertainty of what a shape in nature is, but more importantly, the available data can only be used to give an estimate for the probability of a given shape. This has led to the study of persistent homology, in which the invariants are in the form of 'persistence diagrams' or 'barcodes'. These invariants quantify the stability of geometric features with respect to perturbations, which in turn provides a basis for discriminating between artifacts caused by noise or undersampling and real phenomena. Several papers in this volume deal with questions about these diagrams, and some deal with probabilistic issues related to the occurrence of these diagrams. Our hope is that the papers in this volume will expose these techniques both to a wider audience of mathematicians and to potential users of the techniques.
- Publication: Inverse Problems
- Pub Date: December 2011
- Bibcode: 2011InvPr..27a0101E