Web Image Classification for Information Extraction

Journal article published in 2005 by Martin Labský, Miroslav Vacura, Pavel Praks

We describe an approach to classifying images found on the WWW for the purpose of information extraction (IE). The features used for classification include image sizes, colour histograms, and the similarity of the classified image's content to images in a training collection. Our content similarity metric is based on the latent semantic index. Results are presented on a collection of 1624 image occurrences found on bicycle shop websites, where the task is to distinguish bicycle images from the rest.
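
As an illustration of the simpler features named above, the following is a minimal sketch of how an image's size and colour histogram might be turned into a feature vector. It is not the authors' implementation: the function names, the choice of 4 bins per RGB channel, and the inclusion of an aspect-ratio feature are assumptions made for this example, and the latent-semantic similarity feature is not covered.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised coarse RGB histogram.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    The bin count is an illustrative assumption, not a value from the paper.
    """
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        # Quantise each channel, clamping in case the bin count does not divide 256.
        rq, gq, bq = min(r // step, top), min(g // step, top), min(b // step, top)
        counts[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1
        total += 1
    size = bins_per_channel ** 3
    if total == 0:
        return [0.0] * size
    return [counts[i] / total for i in range(size)]


def image_features(width, height, pixels, bins_per_channel=4):
    """Concatenate size features (width, height, aspect ratio) with the histogram."""
    aspect = width / height if height else 0.0
    return [float(width), float(height), aspect] + colour_histogram(pixels, bins_per_channel)


# Usage: a 2x2 image with three red pixels and one blue pixel.
features = image_features(2, 2, [(255, 0, 0)] * 3 + [(0, 0, 255)])
```

A vector of this kind could be passed to any standard classifier; normalising the histogram keeps the colour features comparable across images of different sizes, while the raw width and height remain available as separate features.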