Published in

Pensoft Publishers, Proceedings of TDWG, (3), 2019

DOI: 10.3897/biss.3.37525

Discovering Patterns of Biodiversity in Insects Using Deep Machine Learning

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Museum specimens have enormous potential for addressing a broad range of biodiversity and evolutionary questions, but their data are typically accessible only to researchers who can physically visit collections facilities. Recent efforts to digitize collections provide new modes of access and collaboration to enrich biodiversity knowledge, and remarkable progress is now being made in assembling a corpus of imaged specimens and their associated labels. The Smithsonian Digitization Program Office recently partnered with the Department of Entomology at the National Museum of Natural History (NMNH) to mass-digitize its bumblebee (genus Bombus) collection. Digital images were captured from more than 45,000 specimens, and labels were transcribed by volunteers through the Smithsonian Transcription Center. More than 10,000 of these specimens are not yet identified to subgenus or species. We present deep learning models (specifically, convolutional neural networks) that can classify specimens to subgenus (NMNH has 15 subgenera) and species (NMNH has 178 species). Both models average greater than 90% accuracy even when trained on a small number of input images (tens of images per class). Beyond taxonomic classification, we explore how we can link our models to traditional morphological characters, biogeographical data, digitized scientific literature, and external image datasets to further our understanding of biodiversity.