The 2013 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2013.6707041
Full text: Unavailable
In this paper we propose a classifier for generalized sequences that is conceived in the granular computing framework. The classification system processes the input sequences of objects by means of a suited interplay among dissimilarity and clustering based techniques. The core data mining engine retrieves information granules that are used to represent the input sequences as feature vectors. Such a representation allows to deal with the original sequence classification problem through standard pattern recognition tools. We have evaluated the generalization capability of the system in an interesting case study concerning the protein folding problem. In the considered dataset, the entire E. Coli proteome was screened as for the prediction of protein relative solubility on a pure amino acids sequence basis. We report the analysis of the dataset considering different settings, showing interesting test set classification accuracy results. The developed system consents also to extract knowledge from the considered training set, by allowing the analysis of the retrieved information granules. © 2013 IEEE.