Using Graphs in Construction of a Lemmatization Model for Turkish



Arslan E., Orhan U.

2nd International Mediterranean Science and Engineering Congress (IMSEC 2017), Adana, Türkiye, 25-27 October 2017, no. 523, pp. 1092-1097

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Adana
  • Country of Publication: Türkiye
  • Page Numbers: pp. 1092-1097
  • Çanakkale Onsekiz Mart University Affiliated: No

Abstract

In this paper, a lemmatization framework that uses the capabilities of a graph database is presented. As shown in previous research, Finite State Machines (FSM) can be applied to the lemmatization of Turkish words when an affix-stripping method is preferred. However, those studies report results only on limited datasets and could be extended to larger, real-world data environments. To keep the system up to date, we propose a dynamic lemmatization model that feeds a static graph database model with new words, using a morphological function to validate the graph relations. This function is built on an FSM that encodes Turkish grammar and affixes. Good results with this framework could lead to the discovery of out-of-vocabulary (OOV) words and the disambiguation of ambiguous ones.
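The affix-stripping and graph-feeding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The suffix inventory (`SUFFIXES`, `ORDER`), the toy lexicon, and the names `strip_affixes` and `LemmaGraph` are hypothetical. Vowel harmony and consonant alternations are ignored, and an in-memory dictionary stands in for the graph database. The FSM is encoded as an ordering constraint: while stripping from the right, each transition may only remove a suffix category that precedes the previously stripped one. This means stem + plural + possessive + case is accepted and out-of-order chains are rejected.

```python
from collections import defaultdict

# Hypothetical toy suffix inventory (NOT the paper's FSM); vowel harmony and
# consonant alternations are ignored. Categories are listed in their
# left-to-right morphotactic order: plural < possessive (1sg) < case.
SUFFIXES = {
    "plural": ["lar", "ler"],
    "poss1sg": ["ım", "im", "um", "üm"],
    "case": ["dan", "den", "da", "de"],  # ablative and locative only
}
ORDER = ["plural", "poss1sg", "case"]


def strip_affixes(word, lexicon, max_rank=len(ORDER)):
    """Return every (lemma, suffixes) parse of `word` whose stem is in `lexicon`.

    Each recursive call is one FSM transition: from the current state only a
    category ranked below `max_rank` (i.e. earlier in ORDER) may be stripped,
    which enforces the stem + plural + possessive + case order.
    """
    parses = []
    if word in lexicon:
        parses.append((word, []))
    for rank in range(max_rank):
        for suffix in SUFFIXES[ORDER[rank]]:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: -len(suffix)]
                for lemma, rest in strip_affixes(stem, lexicon, rank):
                    parses.append((lemma, rest + [suffix]))
    return parses


class LemmaGraph:
    """Hypothetical in-memory stand-in for a graph database of lemma -> surface edges."""

    def __init__(self, lemmas):
        self.lexicon = set(lemmas)
        self.edges = defaultdict(set)  # lemma -> {(surface form, suffix chain)}
        self.oov = set()               # words the FSM could not tie to any lemma

    def add_word(self, word):
        """Validate `word` with the FSM before linking it into the graph."""
        parses = strip_affixes(word, self.lexicon)
        if not parses:
            self.oov.add(word)  # candidate new lemma or unknown suffix
            return []
        for lemma, suffixes in parses:  # keep every valid parse as an edge
            self.edges[lemma].add((word, tuple(suffixes)))
        return parses


if __name__ == "__main__":
    graph = LemmaGraph(["ev", "okul"])
    print(graph.add_word("evlerde"))       # [('ev', ['ler', 'de'])]
    print(graph.add_word("okullarımdan"))  # [('okul', ['lar', 'ım', 'dan'])]
    print(graph.add_word("kalemler"))      # [] (stem "kalem" is unknown)
    print(strip_affixes("evdeler", {"ev"}))  # [] (case before plural is rejected)
    print(sorted(graph.oov))               # ['kalemler']
```

In this sketch a word is linked into the graph only when the FSM finds at least one path back to a known lemma. Every valid parse is kept as a separate edge, which leaves ambiguous words open to later disambiguation, while words with no parse are collected as OOV candidates.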