End-to-end people detection in crowded scenes

doi:10.48550/arXiv.1506.04878

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
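The abstract does not spell out the loss, so the following is only a minimal sketch of what a loss that "operates on sets of detections" could look like. It assumes boxes are `(x, y, w, h)` tuples, an L1 matching cost, and brute-force search over assignments. The names `l1_box_distance` and `set_matching_loss` are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def l1_box_distance(a, b):
    """Sum of absolute coordinate differences between two (x, y, w, h) boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def set_matching_loss(predicted, ground_truth):
    """Return (cost, assignment) for the cheapest one-to-one matching.

    assignment[j] is the index of the prediction matched to ground_truth[j].
    Assumption: surplus predictions are simply ignored; a real detector
    would also penalise their confidence scores.
    Brute force over permutations, so only suitable for a handful of boxes.
    """
    if len(predicted) < len(ground_truth):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best_cost, best_assignment = float("inf"), ()
    for pred_indices in permutations(range(len(predicted)), len(ground_truth)):
        cost = sum(
            l1_box_distance(predicted[i], gt)
            for i, gt in zip(pred_indices, ground_truth)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, pred_indices
    return best_cost, best_assignment


# Two people in the scene; the model emits them in the opposite order.
ground_truth = [(10, 10, 5, 5), (50, 50, 5, 5)]
predicted = [(51, 50, 5, 5), (10, 11, 5, 5)]
cost, assignment = set_matching_loss(predicted, ground_truth)
```

Because the cost is minimised over all assignments, reordering the predictions leaves the loss unchanged, which is the property a set-based loss needs. For larger sets, a polynomial-time assignment solver such as the Hungarian algorithm would replace the brute-force loop.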

Publication: arXiv e-prints

Pub Date: June 2015

DOI: 10.48550/arXiv.1506.04878

arXiv: arXiv:1506.04878

Bibcode: 2015arXiv150604878S

Keywords: Computer Science - Computer Vision and Pattern Recognition

E-Print: 9 pages, 7 figures. Submitted to NIPS 2015. Supplementary material video: http://www.youtube.com/watch?v=QeWl0h3kQ24
