Improved inverse gravity moment term weighting for text classification


Dogan T., Uysal A. K.

EXPERT SYSTEMS WITH APPLICATIONS, vol.130, pp.45-59, 2019 (Journal Indexed in SCI)

  • Volume: 130
  • Publication Date: 2019
  • DOI: 10.1016/j.eswa.2019.04.015
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Pages: pp.45-59

Abstract

Text classification is a popular high-dimensional classification problem in which better feature vector representations directly improve classification performance. Assigning appropriate weights to features, or terms, is therefore crucial for obtaining effective feature vector representations. The methods used for weighting terms in text classification are called term weighting schemes. Although several term weighting schemes exist for text classification, they are not fully effective, and researchers continue to propose new ones. In this study, two novel term weighting schemes, SQRT_TF-IGM(imp) and TF-IGM(imp), derived from the standard inverse gravity moment formula, are proposed to improve the weighting behavior of the existing TF-IGM scheme, especially in some extreme cases. The performance of the proposed schemes is compared with two standard IGM-based schemes and five other state-of-the-art term weighting methods on both unbalanced (Reuters-21578) and balanced (20 Mini Newsgroups and 20 Newsgroups) datasets with KNN, SVM, and NN classifiers. Micro-F1 and macro-F1 are used as success measures. The experiments are conducted with various feature sizes to examine the effect of feature size on the success of weighting. The experimental results showed that the proposed SQRT_TF-IGM(imp) method generally outperformed all other schemes, including the standard TF-IGM and SQRT_TF-IGM schemes. The proposed TF-IGM(imp) scheme also mostly performed better than standard TF-IGM. To demonstrate the validity of the best-performing proposed scheme, a t-test is also used, and it shows that the performance gains of the proposed SQRT_TF-IGM(imp) weighting scheme over standard SQRT_TF-IGM are statistically significant. (C) 2019 Elsevier Ltd. All rights reserved.
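
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard TF-IGM and SQRT_TF-IGM weights as they are commonly defined in the earlier TF-IGM literature. The abstract does not state the improved (imp) formulas, so they are not reproduced here. The function names, the `lam` parameter, and its default of 7.0 are assumptions made for illustration.

```python
import math


def igm(class_freqs):
    """Standard inverse gravity moment of a term.

    class_freqs: frequency of the term in each class (any order).
    The frequencies are ranked in descending order, and the largest
    one is divided by the rank-weighted sum of all of them.
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0  # term never occurs: no class-distinguishing power
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf, class_freqs, lam=7.0, sqrt_tf=False):
    """Standard TF-IGM weight; sqrt_tf=True gives SQRT_TF-IGM.

    lam is an adjustable coefficient (assumed default for this sketch).
    """
    local = math.sqrt(tf) if sqrt_tf else tf
    return local * (1.0 + lam * igm(class_freqs))
```

Under this definition, a term concentrated in a single class, such as `igm([10, 0, 0])`, gets the maximum value 1.0, while a term spread evenly over three classes, such as `igm([5, 5, 5])`, gets 1/6.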