We propose an efficient transfer Bayesian optimization method that finds the maximum of an expensive-to-evaluate black-box function by using data from related optimization tasks. Our method uses auxiliary information that represents the characteristics of each task to transfer knowledge effectively when estimating a distribution over target functions. In particular, we use a Gaussian process whose mean and covariance functions are modeled with neural networks that take both the auxiliary information and the feature vectors as input. With the neural network mean function, we can estimate the target function even before any evaluations are made. With the neural network covariance function, we can extract nonlinear correlations among feature vectors that are shared across related tasks. Our Gaussian process-based formulation not only enables analytic calculation of the posterior distribution but also swiftly adapts the estimate of the target function to observations. Our method is also advantageous because its computational cost scales linearly with the number of source tasks. Through experiments on a synthetic dataset and on datasets for finding optimal pedestrian traffic regulations and optimal machine learning algorithms, we demonstrate that our method identifies the optimal points with fewer target function evaluations than existing methods.
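To make the modeling idea concrete, the following is a minimal sketch of a Gaussian process whose mean and covariance are computed by neural networks from the concatenation of a feature vector and a task descriptor. It is an illustration under stated assumptions, not the architecture or training procedure of the proposed method. The network sizes, the random (untrained) weights, the choice of an RBF kernel on a learned embedding, the noise level, and all names (`init_mlp`, `join`, `kernel`, `posterior`) are hypothetical. In practice the weights would be fitted on source-task data; here they are left random so that only the posterior computation is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    # Random, untrained weights: an assumption made for illustration only.
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, z):
    # Feed-forward network with tanh hidden layers and a linear output layer.
    h = z
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def join(x, r):
    # Append the task descriptor r (auxiliary information) to every feature vector.
    return np.hstack([x, np.tile(r, (len(x), 1))])

def kernel(params, za, zb):
    # Assumed covariance: RBF kernel on the neural network embedding.
    ea, eb = mlp(params, za), mlp(params, zb)
    d2 = ((ea[:, None, :] - eb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

def posterior(mean_p, kern_p, r, x_obs, y_obs, x_new, noise=1e-4):
    # Analytic GP posterior mean and variance at x_new for the task described by r.
    z_new = join(x_new, r)
    prior_mean = mlp(mean_p, z_new)[:, 0]
    prior_var = np.diag(kernel(kern_p, z_new, z_new))
    if len(x_obs) == 0:
        # No evaluations yet: the prediction is the neural network mean itself.
        return prior_mean, prior_var
    z_obs = join(x_obs, r)
    K = kernel(kern_p, z_obs, z_obs) + noise * np.eye(len(x_obs))
    k_star = kernel(kern_p, z_new, z_obs)
    resid = y_obs - mlp(mean_p, z_obs)[:, 0]
    mu = prior_mean + k_star @ np.linalg.solve(K, resid)
    var = prior_var - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mu, var

# Usage: 2-D feature vectors and a 3-D task descriptor (both hypothetical).
mean_p = init_mlp([5, 16, 1])   # mean network: (x, r) -> scalar
kern_p = init_mlp([5, 16, 4])   # embedding network: (x, r) -> 4-D embedding
r = np.array([0.3, -1.0, 0.5])
x_new = np.array([[0.1, 0.2], [0.5, -0.3]])

mu_prior, var_prior = posterior(mean_p, kern_p, r, np.empty((0, 2)), np.empty(0), x_new)
mu_post, var_post = posterior(mean_p, kern_p, r, x_new[:1], np.array([1.0]), x_new)
```

With no observations, `posterior` returns the output of the mean network, which corresponds to estimating the target function before any evaluations. After one observation, the posterior mean at the observed point moves to the observed value and its variance shrinks toward the noise level, which illustrates the closed-form adaptation described above.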