Springer Verlag, Lecture Notes in Computer Science, pp. 273-286
DOI: 10.1007/978-3-642-33709-3_20
The recent availability of large-scale training sets, together with accurate classifiers (e.g., SVMs), makes it possible to build large sets of “simple” object detectors and to develop new classification approaches in which dictionaries of visual features are replaced by dictionaries of object detectors. The responses of this collection of detectors can then be used as a high-level image representation. In this work, we go a step further in this direction by modeling the spatial relations among different detector responses. We use Random Forests to discriminatively select spatial relations that represent frequent co-occurrences of detector responses. We demonstrate our idea on people detection, a challenging classification task because of the variability of human body articulation and appearance, and we use the recently proposed poselets as our basic object dictionary. Poselets are not the only possible choice: the proposed method applies more generally, since it makes few assumptions about the basic object detector. The results show sharp improvements over both the original poselet-based people detection method and other state-of-the-art approaches on two difficult benchmark datasets.
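The abstract does not spell out the form of the split tests that the Random Forest uses to select spatial relations. The following is a minimal, illustrative sketch of one way such a test could work, and it is not the paper's method. Each candidate window is described by a list of detector responses, and a relation reads "detector `b` fires at roughly offset (dx, dy) from detector `a`". One forest node samples candidate relations and keeps the one with the lowest weighted Gini impurity. Several pieces are assumptions for illustration only: the names `Response`, `has_relation`, and `best_relation_split`, the offset tolerance `tol`, the Gini criterion, and the practice of drawing candidates from response pairs observed in positive samples (so that they reflect actual co-occurrences).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Response:
    """One detector firing (hypothetical representation)."""
    detector: int  # index of the detector in the dictionary (e.g. a poselet id)
    x: float       # response center, normalized to the candidate window
    y: float


# Hypothetical relation: detector b fires at offset (dx, dy) from detector a.
Relation = Tuple[int, int, float, float]


def has_relation(responses: Sequence[Response], rel: Relation, tol: float) -> bool:
    """True if some response of `a` and a different response of `b`
    co-occur with b displaced from a by (dx, dy), within `tol` per axis."""
    a, b, dx, dy = rel
    for ra in responses:
        if ra.detector != a:
            continue
        for rb in responses:
            if rb is ra or rb.detector != b:
                continue
            if abs(rb.x - ra.x - dx) <= tol and abs(rb.y - ra.y - dy) <= tol:
                return True
    return False


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of binary labels (1 = person, 0 = background)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_relation_split(
    samples: Sequence[Sequence[Response]],
    labels: Sequence[int],
    n_candidates: int,
    rng: random.Random,
    tol: float = 0.05,
) -> Optional[Tuple[float, Relation]]:
    """Pick the candidate relation whose presence/absence best separates labels."""
    # Candidates come from response pairs seen in positive samples (assumption).
    positives = [s for s, y in zip(samples, labels) if y == 1 and len(s) >= 2]
    if not positives:
        return None
    best: Optional[Tuple[float, Relation]] = None
    for _ in range(n_candidates):
        ra, rb = rng.sample(list(rng.choice(positives)), 2)
        rel: Relation = (ra.detector, rb.detector, rb.x - ra.x, rb.y - ra.y)
        present = [has_relation(s, rel, tol) for s in samples]
        left: List[int] = [y for y, f in zip(labels, present) if f]
        right: List[int] = [y for y, f in zip(labels, present) if not f]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[0]:
            best = (impurity, rel)
    return best


# Toy usage: in positives, detector 1 fires just below detector 0;
# in negatives, the two fire with an unrelated offset.
rng = random.Random(0)
pos = [[Response(0, 0.0, 0.0), Response(1, 0.0, 0.5)] for _ in range(5)]
neg = [[Response(0, 0.0, 0.0), Response(1, 0.5, -0.5)] for _ in range(5)]
print(best_relation_split(pos + neg, [1] * 5 + [0] * 5, n_candidates=10, rng=rng))
```

In the toy data, both detectors fire in every window, so their responses alone do not separate the classes. Only their relative placement does, and the selected relation splits the two classes perfectly (impurity 0.0). This mirrors the motivation for modeling spatial relations on top of raw detector responses.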