Content-based Search of Large Image Archives at PDS Imaging Node
Abstract
The Planetary Data System (PDS) maintains archives of data collected by NASA missions that explore our solar system. The PDS Cartography and Imaging Sciences Node (Imaging Node) provides access to millions of images of planets, moons, and other bodies. Given the large and continually growing volume of data, there is a need for tools that enable users to quickly search for images of interest. Each image archived at the PDS Imaging Node is described by a rich set of searchable metadata properties, such as the time it was collected and the instrument used. However, users often wish to search on the content of the image to find those images most relevant to their scientific investigation or individual curiosity.
To enable content-based search of these large image archives, we used machine learning techniques to create convolutional neural network (CNN) classification models. The initial CNN classification results were deployed at the PDS Image Atlas (https://pds-imaging.jpl.nasa.gov/search) in 2017. All of the CNN classification models were trained using the transfer learning approach, in which we adapted a CNN model pretrained on Earth images to classify planetary images. Over the past several years, we employed three techniques to improve the efficiency of collecting labeled data sets, the accuracy of the models, and the interpretability of the classification results. First, we used the marginal-probability-based active learning (MP-AL) algorithm for the image labeling process. The MP-AL algorithm selects a batch of images so as to match the distribution of the remaining unlabeled images while also minimizing within-batch similarity and similarity to already-labeled images. Second, we used the classifier chain and ensemble approaches to improve the accuracy of the classification results. The classifier chain approach enables explicit modeling of the dependencies between classes, and the ensemble approach allows us to take advantage of the outputs of individual classification models. Third, we incorporated the prototypical part network (ProtoPNet) architecture to improve the interpretability of the classification results. The ProtoPNet architecture dissects an image into smaller prototypical parts and then combines evidence from those parts to make a final classification.
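The batch-selection idea described above can be illustrated with a minimal greedy sketch. This is not the published MP-AL algorithm: the `select_batch` function, the use of cosine similarity over feature vectors, and the representativeness-minus-redundancy score are illustrative assumptions standing in for MP-AL's marginal-probability-based criteria.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_batch(unlabeled, labeled, batch_size):
    """Greedily pick a batch that is representative of the remaining
    unlabeled pool while penalizing similarity to images already chosen
    for this batch or already labeled (a sketch, not MP-AL itself)."""
    pool = list(unlabeled)
    chosen = []
    while pool and len(chosen) < batch_size:
        best, best_score = None, -float("inf")
        for x in pool:
            # Representativeness: mean similarity to the rest of the pool.
            others = [y for y in pool if y is not x]
            rep = (sum(cosine(x, y) for y in others) / len(others)) if others else 0.0
            # Redundancy: closest match among chosen and labeled examples.
            refs = chosen + list(labeled)
            red = max((cosine(x, y) for y in refs), default=0.0)
            score = rep - red
            if score > best_score:
                best, best_score = x, score
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Because each pick rewards similarity to the remaining pool and penalizes similarity to images already selected, a near-duplicate of a chosen image is skipped in favor of a more diverse candidate, mirroring the within-batch-similarity goal stated above.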
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFMIN21C..01L