Text classification using genetic algorithm oriented latent semantic features


Uysal A. K., Gunal S.

EXPERT SYSTEMS WITH APPLICATIONS, vol.41, no.13, pp.5938-5947, 2014 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 41 Issue: 13
  • Publication Date: 2014
  • DOI: 10.1016/j.eswa.2014.03.041
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.5938-5947
  • Affiliated with Çanakkale Onsekiz Mart University: No

Abstract

In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of a feature selection stage and a feature transformation stage. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm, so that a better projection is attained using appropriate singular vectors, which, unlike in the standard LSI approach, are not limited to the ones corresponding to the largest singular values. In this way, singular vectors with small singular values may be used for projection, while vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions. (C) 2014 Elsevier Ltd. All rights reserved.
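
The idea of the second stage, letting a genetic algorithm pick which singular vectors form the LSI projection instead of always taking the top ones, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so the nearest-centroid fitness, the truncation selection with elitism, the subset crossover, and every function and parameter name below are assumptions made for illustration. The input matrix is assumed to be a documents x terms matrix that has already been reduced by some filter-based feature selection.

```python
import numpy as np

def lsi_basis(X_train):
    """SVD of a documents x terms matrix; rows of Vt are right singular
    vectors, ordered by decreasing singular value."""
    _, s, Vt = np.linalg.svd(X_train, full_matrices=False)
    return s, Vt

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Assumed fitness: validation accuracy of a nearest-centroid classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def ga_select(X_tr, y_tr, X_val, y_val, k, pop_size=20, generations=30,
              p_mut=0.2, seed=0):
    """Search for k singular vectors (not necessarily the top k) that
    maximise the assumed fitness. Requires k <= rank and pop_size >= 4."""
    rng = np.random.default_rng(seed)
    _, Vt = lsi_basis(X_tr)
    r = Vt.shape[0]

    def fitness(idx):
        return centroid_accuracy(X_tr @ Vt[idx].T, y_tr, X_val @ Vt[idx].T, y_val)

    # Seed the population with the standard LSI choice (top-k) plus random subsets.
    pop = [np.arange(k)] + [rng.choice(r, size=k, replace=False)
                            for _ in range(pop_size - 1)]
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(-scores)
        elite = [pop[i] for i in order[: pop_size // 2]]  # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            pool = np.union1d(elite[a], elite[b])         # crossover: mix both parents
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < p_mut:                      # mutation: swap in an unused vector
                unused = np.setdiff1d(np.arange(r), child)
                if unused.size:
                    child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    scores = np.array([fitness(ind) for ind in pop])
    return np.sort(pop[int(scores.argmax())]), float(scores.max())
```

Because the initial population contains the top-k selection and the best half of each generation is carried over unchanged, the returned validation fitness in this sketch is never lower than that of standard LSI with the same k. Whether the improvement carries over to unseen data depends on the dataset and the fitness function, so a separate test set should be used for any real comparison.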