Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation
Abstract
In the traditional object recognition pipeline, descriptors are densely sampled over an image, pooled into a high-dimensional nonlinear representation and then passed to a classifier. In recent years, Fisher Vectors have proven empirically to be the leading representation for a large variety of applications. The Fisher Vector is typically taken as the gradient of the log-likelihood of the descriptors with respect to the parameters of a Gaussian Mixture Model (GMM). Motivated by the assumption that different datasets are best modeled by different distributions, we present two alternative mixture models and derive their Expectation-Maximization and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), which is based on the Laplacian distribution. The second is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM), which is based on a weighted geometric mean of the Gaussian and Laplacian distributions. An interesting property of the Expectation-Maximization algorithm for the latter is that, in the maximization step, each dimension of each component is chosen to be either a Gaussian or a Laplacian. Finally, by using the new Fisher Vectors derived from HGLMMs, we achieve state-of-the-art results on both the image annotation and the image search by sentence tasks.
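The abstract defines the Fisher Vector as the gradient of the descriptors' log-likelihood with respect to the mixture parameters. The following is a minimal pure-Python sketch of that idea, assuming a diagonal mixture in which every dimension of every component is fixed in advance as either Gaussian or Laplacian (loosely mirroring the per-dimension choice the abstract describes for HGLMM). The function names, the `kinds` flags, and the Laplacian normalization (each gradient divided by the square root of its Fisher information, by analogy with the usual closed-form GMM Fisher Vector) are illustrative assumptions, not the paper's derivation. The HGLMM geometric-mean density and its EM updates are not shown.

```python
import math


def log_density_1d(x, mu, s, kind):
    """Log-density of one dimension: kind 'g' is Gaussian (s = std. dev.), 'l' is Laplacian (s = scale b)."""
    if kind == "g":
        return -0.5 * math.log(2.0 * math.pi * s * s) - (x - mu) ** 2 / (2.0 * s * s)
    return -math.log(2.0 * s) - abs(x - mu) / s


def normalized_grads_1d(x, mu, s, kind):
    """Gradients of the log-density w.r.t. (mu, s), each scaled by 1/sqrt(Fisher information)."""
    u = (x - mu) / s
    if kind == "g":
        # Gaussian: d/dmu = u/s, d/ds = (u^2 - 1)/s; Fisher information 1/s^2 and 2/s^2.
        return u, (u * u - 1.0) / math.sqrt(2.0)
    # Laplacian: d/dmu = sign(u)/s, d/ds = (|u| - 1)/s; Fisher information 1/s^2 for both.
    # The location derivative is undefined at u == 0; the subgradient 0 is used there.
    sign = 0.0 if u == 0 else math.copysign(1.0, u)
    return sign, abs(u) - 1.0


def fisher_vector(X, weights, means, scales, kinds):
    """Fisher Vector of descriptors X (T x D) for a K-component diagonal mixture.

    kinds[k][d] (hypothetical parameter) selects 'g' or 'l' for dimension d of component k.
    Returns the mean part for k = 0..K-1 followed by the scale part, length 2*K*D.
    """
    T, K, D = len(X), len(weights), len(X[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_s = [[0.0] * D for _ in range(K)]
    for x in X:
        # Posterior (soft assignment) of each component, computed stably in log space.
        logp = [
            math.log(weights[k])
            + sum(log_density_1d(x[d], means[k][d], scales[k][d], kinds[k][d]) for d in range(D))
            for k in range(K)
        ]
        top = max(logp)
        unnorm = [math.exp(v - top) for v in logp]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                gm, gs = normalized_grads_1d(x[d], means[k][d], scales[k][d], kinds[k][d])
                g_mu[k][d] += gamma * gm
                g_s[k][d] += gamma * gs
    fv = []
    for part in (g_mu, g_s):
        for k in range(K):
            # Closed-form normalization 1 / (T * sqrt(w_k)), as in the usual GMM Fisher Vector.
            c = 1.0 / (T * math.sqrt(weights[k]))
            fv.extend(c * v for v in part[k])
    return fv
```

For example, `fisher_vector([[1.0], [3.0]], [1.0], [[1.0]], [[1.0]], [["g"]])` gives approximately `[1.0, 0.7071]`, while the same call with `[["l"]]` gives `[0.5, 0.0]`: the Laplacian terms depend only on the sign and absolute value of the standardized residual, so they grow more slowly for outlying descriptors than the Gaussian terms do.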
Publication: arXiv e-prints
Pub Date: November 2014
arXiv: arXiv:1411.7399
Bibcode: 2014arXiv1411.7399K
Keywords: Computer Science - Computer Vision and Pattern Recognition
E-Print: new version includes text synthesis by an RNN and experiments with the COCO benchmark