Graph-based lemmatization of Turkish words by using morphological similarity


Arslan E., Orhan U.

2016 International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2016, Sinaia, Romania, 2-5 August 2016

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/inista.2016.7571835
  • City of Publication: Sinaia
  • Country of Publication: Romania
  • Keywords: graph db, Lemmatization, morphological, similarity, zemberek
  • Affiliated with Çanakkale Onsekiz Mart University: No

Abstract

© 2016 IEEE. Lemmatization is an important preprocessing step in Natural Language Processing (NLP). In language applications such as part-of-speech tagging, spell checking, and document clustering, selecting the right lemma together with its morphological features can improve results. In this study, we present a new hybrid approach for lemmatizing inflected Turkish words using graph models based on morphological similarity, which have recently been gaining popularity in lemmatization. To this end, a novel similarity function for Turkish is developed to connect similar word forms. The proposed model is trained and tested on a double-checked Turkish lemmatization dataset. The empirical results are then compared with those of Zemberek, the most widely used Turkish lemmatization tool.
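
The general idea of connecting similar word forms in a graph can be illustrated with a minimal sketch. The abstract does not give the paper's similarity function, so the code below substitutes a simple shared-prefix ratio as a stand-in, and the shortest form in each connected component is taken as the lemma candidate. The function names `prefix_similarity`, `build_graph`, and `lemma_candidates`, the threshold of 0.5, and the sample words are all assumptions made for illustration and are not taken from the paper.

```python
def prefix_similarity(a: str, b: str) -> float:
    """Shared-prefix length divided by the longer word's length.

    Illustrative stand-in only; NOT the similarity function from the paper.
    """
    shared = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        shared += 1
    return shared / max(len(a), len(b))


def build_graph(words, threshold=0.5):
    """Connect every pair of word forms whose similarity reaches the threshold."""
    nodes = list(dict.fromkeys(words))  # drop duplicates, keep order
    graph = {w: set() for w in nodes}
    for i, w in enumerate(nodes):
        for v in nodes[i + 1:]:
            if prefix_similarity(w, v) >= threshold:
                graph[w].add(v)
                graph[v].add(w)
    return graph


def lemma_candidates(graph):
    """Map each word to the shortest word in its connected component."""
    seen = set()
    result = {}
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal to collect one connected component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Assumption: the shortest observed form stands in for the lemma.
        lemma = min(component, key=lambda w: (len(w), w))
        for w in component:
            result[w] = lemma
    return result


# Hypothetical sample: inflected forms of "kitap" (book) and "göz" (eye).
words = ["kitap", "kitaplar", "kitaplarda", "göz", "gözler", "gözlerim"]
lemmas = lemma_candidates(build_graph(words))
print(lemmas)
```

A prefix-based measure suits this toy case because Turkish is predominantly suffixing, but it can only return a form that actually occurs in the input and it ignores stem changes. A real system would need a similarity function designed for Turkish morphology, as the paper proposes.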