Published in

MDPI, Remote Sensing, 14(15), p. 3626, 2023

DOI: 10.3390/rs15143626

Links

Tools

Export citation

Search in Google Scholar

A Hybrid Approach Based on GAN and CNN-LSTM for Aerial Activity Recognition

Journal article published in 2023 by Abir Bousmina, Mouna Selmi, Mohamed Amine Ben Rhaiem, Imed Riadh Farah ORCID
This paper is made freely available by the publisher.
This paper is made freely available by the publisher.

Full text: Download

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Green circle
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Unmanned aerial vehicles (UAVs), known as drones, have played a significant role in recent years in creating resilient smart cities. UAVs can be used for a wide range of applications, including emergency response, civil protection, search and rescue, and surveillance, thanks to their high mobility and reasonable price. Automatic recognition of human activity in aerial videos captured by drones is critical for various tasks for these applications. However, this is difficult due to many factors specific to aerial views, including camera motion, vibration, low resolution, background clutter, lighting conditions, and variations in view. Although deep learning approaches have demonstrated their effectiveness in a variety of challenging vision tasks, they require either a large number of labelled aerial videos for training or a dataset with balanced classes, both of which can be difficult to obtain. To address these challenges, a hybrid data augmentation method is proposed which combines data transformation with the Wasserstein Generative Adversarial Network (GAN)-based feature augmentation method. In particular, we apply the basic transformation methods to increase the amount of video in the database. A Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) model is used to learn the spatio-temporal dynamics of actions, then a GAN-based technique is applied to generate synthetic CNN-LSTM features conditioned on action classes which provide a high discriminative spatio-temporal features. We tested our model on the YouTube aerial database, demonstrating encouraging results that surpass those of previous state-of-the-art works, including an accuracy rate of 97.83%.