Web Image Classification for Information Extraction

Labský, Martin; Vacura, Miroslav; Praks, Pavel

Journal article published in 2005 by Martin Labský, Miroslav Vacura, Pavel Praks

Abstract

We describe an approach to classifying images found on the WWW for the purpose of information extraction (IE). Among features used for classification are image sizes, colour histograms, and the similarity of the classified image's content to images in a training collection. Our content similarity metric is based on the latent semantic index. Results are presented on a collection of 1624 image occurrences found on bicycle shop websites, and the task is to distinguish bicycle images from the rest.
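
The abstract names two kinds of content features, colour histograms and a similarity score derived from a latent semantic index, without giving implementation details. The following Python sketch illustrates one common way such features could be computed with NumPy. It is not the authors' implementation: the function names (`colour_histogram`, `fit_lsi`, `lsi_similarities`), the number of histogram bins, the rank `k`, and the choice of cosine similarity in the reduced space are all assumptions made for illustration.

```python
import numpy as np


def colour_histogram(image, bins=4):
    """Normalised joint RGB histogram of an (H, W, 3) uint8 image array.

    The bin count is an illustrative assumption, not a value from the paper.
    """
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    return hist.ravel() / pixels.shape[0]


def fit_lsi(train_matrix, k):
    """Rank-k truncated SVD of a (features x images) matrix.

    Each column of train_matrix is one training image as a feature vector.
    Returns the basis U_k and the k-dimensional coordinates of every
    training image (one row per image).
    """
    u, s, vt = np.linalg.svd(train_matrix, full_matrices=False)
    u_k = u[:, :k]
    train_coords = (s[:k, None] * vt[:k, :]).T  # shape (n_images, k)
    return u_k, train_coords


def lsi_similarities(query_vector, u_k, train_coords):
    """Cosine similarity between a query image and each training image,
    measured in the k-dimensional latent space (an assumed metric)."""
    q = u_k.T @ query_vector  # project the query into the latent space
    norms = np.linalg.norm(train_coords, axis=1) * np.linalg.norm(q)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return (train_coords @ q) / norms
```

In such a setup, the maximum (or mean) similarity to the training images of a given class could be passed to the classifier as one feature alongside the image dimensions and the histogram values.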