Deep Reinforcement Learning for Visual Object Tracking in Videos

Published in 2017 by Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
This paper is available in a repository.

Full text: Download

Preprint: policy unknown
Postprint: policy unknown
Published version: policy unknown

Abstract

Convolutional neural network (CNN) models have achieved tremendous success in many visual detection and recognition tasks. Unfortunately, visual tracking, a fundamental computer vision problem, is not handled well by existing CNN models, because most object trackers implemented with CNNs do not effectively leverage temporal and contextual information across consecutive frames. Recurrent neural network (RNN) models, on the other hand, are often used to process text and voice data because of their ability to learn intrinsic representations of sequential and temporal data. Here, we propose a novel neural-network tracking model that is capable of integrating information over time and tracking a selected target in video. It comprises three components: a CNN that extracts the best tracking features from each video frame, an RNN that constructs a video memory state, and a reinforcement learning (RL) agent that makes target location decisions. The tracking problem is formulated as a decision-making process, and our model can be trained with RL algorithms to learn good tracking policies that attend to continuous inter-frame correlation and maximize tracking performance in the long run. We compare our model with an existing neural-network-based tracking method and show that the proposed approach works well in various scenarios through rigorous validation experiments on artificial video sequences with ground truth. To the best of our knowledge, our tracker is the first neural-network tracker that combines convolutional and recurrent networks with RL algorithms.
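
The following is a minimal sketch, not the authors' code, of the three-component design the abstract describes: a CNN extracting per-frame features, an RNN carrying a memory state across frames, and a policy head producing target-location decisions trained with a policy-gradient (REINFORCE-style) update. The layer sizes, the Gaussian location policy, and the reward definition are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of a CNN + RNN + RL tracker (PyTorch). Hypothetical architecture choices.
import torch
import torch.nn as nn

class CNNRNNTracker(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        # CNN: per-frame feature extractor (small conv stack, assumed)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, hidden_size),
        )
        # RNN: integrates frame features into a video memory state over time
        self.rnn = nn.GRUCell(hidden_size, hidden_size)
        # Policy head: mean of a Gaussian over a (x, y, w, h) location decision
        self.loc_head = nn.Linear(hidden_size, 4)

    def forward(self, frames):
        # frames: (T, 3, H, W) video clip; returns one location per frame
        h = torch.zeros(1, self.rnn.hidden_size)
        locations = []
        for t in range(frames.size(0)):
            feat = self.cnn(frames[t].unsqueeze(0))
            h = self.rnn(feat, h)
            locations.append(self.loc_head(h))
        return torch.cat(locations, dim=0)  # (T, 4)

def reinforce_loss(pred_locs, gt_locs, sigma=0.05):
    # REINFORCE-style surrogate: sample locations from a Gaussian policy,
    # use negative squared distance to ground truth as the reward, and
    # weight log-probabilities by a baseline-subtracted reward.
    dist = torch.distributions.Normal(pred_locs, sigma)
    samples = dist.sample()
    rewards = -((samples - gt_locs) ** 2).sum(dim=1)   # per-frame reward
    advantages = rewards - rewards.mean()               # simple mean baseline
    return -(dist.log_prob(samples).sum(dim=1) * advantages.detach()).mean()
```

Because the reward is accumulated over whole sequences, a policy-gradient objective of this form lets the tracker trade off per-frame accuracy for long-run tracking performance, which is the motivation the abstract gives for framing tracking as a decision-making process.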