We describe an approach to classifying images found on the WWW for the purpose of information extraction (IE). The features used for classification include image size, colour histograms, and the similarity of the classified image's content to images in a training collection. Our content similarity metric is based on latent semantic indexing. Results are presented for a collection of 1624 image occurrences found on bicycle shop websites, where the task is to distinguish bicycle images from all other images.
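
As a rough illustration of the content similarity feature mentioned above, the following sketch projects image vectors into a low-rank latent space via a truncated SVD and scores a new image by its best cosine match against the training collection. It is not the implementation evaluated here. The function names (`colour_histogram`, `fit_lsi`, `lsi_similarity`), the use of colour histograms as the input vectors, the rank `k`, and the choice of the maximum cosine as the score are all assumptions made for this example.

```python
import numpy as np

def colour_histogram(image, bins=4):
    """Normalised joint RGB histogram of an H x W x 3 uint8 image (assumed representation)."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def fit_lsi(train_vectors, k):
    """Truncated SVD of the (features x images) training matrix.

    Returns the rank-k basis U_k and the latent coordinates of each training image.
    """
    A = np.asarray(train_vectors, dtype=float).T      # features x images
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                                     # features x k
    train_coords = (U_k.T @ A).T                       # images x k
    return U_k, train_coords

def lsi_similarity(query_vector, U_k, train_coords, eps=1e-12):
    """Highest cosine similarity, in latent space, between the query and any training image."""
    q = U_k.T @ np.asarray(query_vector, dtype=float)
    norms = np.linalg.norm(train_coords, axis=1) * np.linalg.norm(q)
    cosines = (train_coords @ q) / (norms + eps)       # eps guards against zero vectors
    return float(cosines.max())

# Hypothetical usage with random stand-in images (no real data involved).
rng = np.random.default_rng(0)
train_images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(5)]
train_vectors = np.array([colour_histogram(im) for im in train_images])
U_k, train_coords = fit_lsi(train_vectors, k=3)
score = lsi_similarity(train_vectors[0], U_k, train_coords)
```

A score of this kind could then be combined with the image size and colour histogram features as one input to a classifier. How the features are actually combined is not specified in this abstract.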