2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
DOI: 10.1109/icmew.2013.6618349
This paper presents a method for detecting and localizing instances of user-specified objects within a video or a collection of videos. The proposed method is based on the extraction and matching of SURF descriptors in video frames, and incorporates a number of improvements that enhance both the detection accuracy and the time efficiency of the process. Specifically, (a) GPU-based processing is introduced for specific parts of the object re-detection pipeline, (b) a new video-structure-based sampling technique is employed to limit the number of frames that need to be processed, and (c) improved robustness to scale variations is achieved by generating and employing additional instances of the object of interest, derived from the one originally provided by the user. The experimental results show that the algorithm achieves high detection accuracy, while its overall processing time makes it suitable for quick instance-based labeling of video and for the creation of object-based spatio-temporal fragments.
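As an illustration of improvements (b) and (c), the following is a minimal sketch and not the paper's implementation. It assumes that the video structure is available as a list of shots given by inclusive frame ranges, that a few evenly spaced frames are taken per shot, and that the additional object instances are rescaled copies of the user-provided image. The function names, the sampling rule, the scale factors, and the nearest-neighbour resizing are all hypothetical choices made for this example; the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def sample_frames_per_shot(shots: Sequence[Tuple[int, int]],
                           per_shot: int = 3) -> List[int]:
    """Pick up to `per_shot` evenly spaced frame indices from each shot.

    Each shot is an inclusive (start, end) frame range. This rule is an
    assumption for illustration, not the sampling scheme of the paper.
    """
    sampled = []
    for start, end in shots:
        length = end - start + 1
        n = min(per_shot, length)
        for k in range(n):
            # Centre of the k-th of n equal segments of the shot.
            sampled.append(start + (2 * k + 1) * length // (2 * n))
    return sampled


def resize_nearest(image: Image, scale: float) -> Image:
    """Rescale an image by `scale` using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


def scaled_instances(image: Image,
                     scales: Sequence[float] = (0.5, 2.0)) -> Dict[float, Image]:
    """Return the original image plus rescaled copies, keyed by scale.

    The scale factors are hypothetical; the abstract only states that
    additional instances are generated from the user-provided one.
    """
    instances = {1.0: image}
    for s in scales:
        instances[s] = resize_nearest(image, s)
    return instances


if __name__ == "__main__":
    shots = [(0, 99), (100, 101)]
    print(sample_frames_per_shot(shots))
    query = [[1, 2], [3, 4]]
    for scale, inst in scaled_instances(query).items():
        print(scale, inst)
```

In such a setup, descriptors would be extracted and matched only on the sampled frames, and each frame would be compared against every generated instance of the query object, so that an object appearing much larger or smaller than in the original query image can still be matched.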