Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks - VSSN '04
A video surveillance sequence generally contains scattered information about several objects in cluttered scenes. When digital hand-held cameras are used, the overall quality is very low due to unstable motion and low resolution, even if multiple shots of the desired target are available. To overcome these limitations, we propose a novel Bayesian framework based on image super-resolution that integrates all the informative bits of a target and condenses the redundancy. We call this process distillation.

In the traditional formulation of the image super-resolution problem, (1) the observed target is always the same, (2) the images are acquired by a camera making small movements, and (3) the number of available images is sufficient to recover high-frequency information. These hypotheses obviously do not hold in the concrete situations described above. In this paper, we extend and generalize the image super-resolution task, embedding it in a structured framework that accurately distills the necessary information.

In short, our approach consists of two phases. First, a transformation-invariant video clustering coarsely groups and registers the frames, also defining a similarity measure among them. Second, a novel Bayesian super-resolution method uses this measure to selectively combine all the pixels of similar frames, producing a highly informative super-resolved image of the desired target.

Our approach is first tested on synthetic data, yielding encouraging comparative results with respect to known super-resolution techniques and a definite robustness to noise. Second, we consider real data from videos taken by a hand-held camera, trying to resolve the major details of a person in motion, a typical setting of video surveillance applications.
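The two-phase pipeline described above can be illustrated with a minimal sketch. The fragment below is not the paper's actual Bayesian method; it assumes frames have already been registered (phase one) with known integer sub-pixel shifts, and stands in for the Bayesian fusion of phase two with a simple similarity-weighted averaging of low-resolution pixels placed onto a high-resolution grid. The function name `distill_superres` and all parameters are hypothetical.

```python
import numpy as np

def distill_superres(frames, shifts, weights, scale=2):
    """Fuse registered low-resolution frames into one high-resolution
    image by similarity-weighted averaging on an upsampled grid.
    A simplified stand-in for the paper's Bayesian fusion step.

    frames  : list of 2-D arrays (low-res, same shape)
    shifts  : per-frame (dy, dx) offsets on the high-res grid
    weights : per-frame similarity weights from the clustering phase
    """
    h, w = frames[0].shape
    H, W = h * scale, w * scale
    acc = np.zeros((H, W))
    wsum = np.zeros((H, W))
    for frame, (dy, dx), wt in zip(frames, shifts, weights):
        # Place each low-res pixel at its shifted high-res location.
        ys = (np.arange(h) * scale + dy) % H
        xs = (np.arange(w) * scale + dx) % W
        acc[np.ix_(ys, xs)] += wt * frame
        wsum[np.ix_(ys, xs)] += wt
    # Normalize where at least one observation landed; leave gaps at 0.
    return np.where(wsum > 0, acc / np.maximum(wsum, 1e-12), 0.0)

# Toy example: two constant frames observing different high-res positions.
frames = [np.ones((4, 4)), 3.0 * np.ones((4, 4))]
hr = distill_superres(frames, shifts=[(0, 0), (1, 1)], weights=[1.0, 1.0])
```

In the real method, the similarity weights come from the transformation-invariant clustering, and the fusion is a full Bayesian estimate rather than a weighted mean; this sketch only shows how per-frame similarity can gate each frame's contribution to the distilled image.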