Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification

Xu, Pengyu; Xiao, Lin; Liu, Bing; Lu, Sijin; Jing, Liping; Yu, Jian

Published in

Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), pp. 10602-10610, 2023

DOI: 10.1609/aaai.v37i9.26259

Journal article published in 2023 by Pengyu Xu, Lin Xiao, Bing Liu, Sijin Lu, Liping Jing, Jian Yu

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving forbidden

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail labels) are associated with only a small number of documents, which limits the performance of MLTC. To alleviate this low-resource problem, researchers have introduced a simple but effective strategy, data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that the augmented documents for one label inevitably influence the other co-occurring labels and further exacerbate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which augments only the positive feature-label pairs of the tail labels. LSFA contains two main parts. The first learns label-specific document representations in a high-level latent space. The second augments tail-label features in that latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting classifiers based on the augmented datasets. The whole framework can be trained effectively. Comprehensive experiments on benchmark datasets show that the proposed LSFA outperforms state-of-the-art counterparts.
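The idea of transferring second-order statistics from head labels to tail labels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the statistics are estimated or applied, so the sketch assumes the simplest reading, in which the covariance of a head label's features is used to sample perturbations around existing tail-label features. The function name `augment_tail_features` and all array shapes are hypothetical.

```python
import numpy as np


def augment_tail_features(head_feats, tail_feats, n_new, seed=0):
    """Sample new tail-label features using a head label's covariance.

    Illustrative assumption only: the head label's intra-class variation
    is modeled as a zero-mean Gaussian and added to existing tail features.
    """
    rng = np.random.default_rng(seed)
    dim = head_feats.shape[1]

    # Second-order statistics of the head label: feature covariance (dim x dim).
    head_cov = np.cov(head_feats, rowvar=False)

    # Pick existing tail-label features (with replacement) as anchors.
    anchor_idx = rng.integers(0, len(tail_feats), size=n_new)
    anchors = tail_feats[anchor_idx]

    # Perturb each anchor with variation sampled from the head covariance.
    noise = rng.multivariate_normal(np.zeros(dim), head_cov, size=n_new)
    return anchors + noise


# Toy data: 200 label-specific features for a head label, 3 for a tail label.
data_rng = np.random.default_rng(1)
head = data_rng.normal(size=(200, 4))
tail = data_rng.normal(loc=3.0, size=(3, 4))

aug = augment_tail_features(head, tail, n_new=50)
print(aug.shape)
```

Because the augmentation happens on label-specific features rather than on whole documents, the new samples are positive pairs for the tail label only, which is the property the abstract highlights as avoiding side effects on co-occurring labels.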
