Elsevier, Information Sciences, 24(178), p. 4644-4655
DOI: 10.1016/j.ins.2008.08.003
The need for measuring the dispersion of nominal categorical attributes appears in several applications, like clustering or data anonymization. For a nominal attribute whose categories can be hierarchically classified, a measure of the variance of a sample drawn from that attribute is proposed which takes the attribute's hierarchy into account. The new measure is the reciprocal of "consanguinity": the less related the nominal categories in the sample, the higher the measured variance. For non-hierarchical nominal attributes, the proposed measure yields results consistent with previous diversity indicators. Applications of the new nominal variance measure to economic diversity measurement and data anonymization are also discussed.