Important citations identification by exploiting generative model into discriminative model

An, Xin; Sun, Xin; Xu, Shuo; Hao, Liyuan; Li, Jinghong

Published in

SAGE Publications, Journal of Information Science, 1(49), p. 107-121, 2021

DOI: 10.1177/0165551521991034

Journal article published in 2021 by Xin An, Xin Sun, Shuo Xu, Liyuan Hao, Jinghong Li

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Although citations between scientific documents are regarded as a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equally important. A plethora of taxonomies and machine-learning models have been proposed to tackle the task of citation function and importance classification from a qualitative perspective. Inspired by the success of kernel functions derived from generative models in improving the performance of the support vector machine (SVM) model, this work explores the potential of combining generative and discriminative models for the task of citation importance classification. In more detail, generative features are produced by a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and random forest (RF), and a deep learning model, the convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set drawn from a single discipline. It is quite possible that the patterns of important citations vary by field, which prevents machine-learning models from effectively learning discriminative patterns from publications spanning multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance because of the small scale of the data. Furthermore, the CIM-based features further improve the performance of identifying important citations.
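The feature-fusion idea in the abstract (features from a generative model concatenated with hand-crafted features, then passed to a discriminative classifier) can be illustrated with a minimal sketch. This is not the authors' implementation, and several things here are assumptions: the CIM itself is not reproduced, so the single "generative" feature is a hypothetical influence score; the two "traditional" features and all toy values are invented for illustration; and a small logistic regression is swapped in for the SVM, RF and CNN models so that the example needs only the Python standard library.

```python
import math


def fuse_features(generative, traditional):
    """Concatenate generative-model features with hand-crafted features."""
    return list(generative) + list(traditional)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by plain stochastic gradient descent.

    Stand-in for the discriminative models named in the abstract.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(model, x):
    """Return 1 (important citation) or 0 (incidental citation)."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Synthetic toy data, one row per citation (all values are made up).
# Hypothetical generative feature: an influence score in [0, 1].
generative = [[0.9], [0.8], [0.2], [0.1]]
# Hypothetical traditional features: in-text mention count, self-citation flag.
traditional = [[3, 1], [2, 0], [1, 0], [1, 1]]
y = [1, 1, 0, 0]  # 1 = important, 0 = incidental

X = [fuse_features(g, t) for g, t in zip(generative, traditional)]
model = train_logistic(X, y)
predictions = [predict(model, x) for x in X]
```

The only step specific to the approach is fuse_features: the classifier sees one vector per citation and does not need to know which entries came from the generative model. Any classifier that accepts fixed-length vectors could take the place of the logistic regression used here.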
