Identification of OOV Words in Turkish Texts


Arslan E., Orhan U.

GAZİOSMANPAŞA BİLİMSEL ARAŞTIRMA DERGİSİ (GBAD), vol.8, no.2, pp.35-48, 2019 (Peer-Reviewed Journal)

Abstract

In this study, we present a semantic graph network model capable of detecting out-of-vocabulary (OOV) words in Turkish texts. In the field of natural language processing (NLP), morphological analyzers can encounter unknown words (UW) during word processing. This mostly occurs when such tools depend on a dictionary to find the probable lemmas needed for further parsing. Sometimes an analyzer cannot find any candidates because no candidate lemma exists in the dictionary, which degrades the parsing output. The proposed model for OOV detection is able to identify OOV words that are suitable for inclusion in dictionaries. In addition, the co-occurrence relations of the lemmas in texts are modelled as a semantic sub-graph, which is used to discover collocations to propose as new lemma candidates.
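
The two ideas in the abstract, dictionary-based OOV detection and a co-occurrence graph used to surface collocations, can be illustrated with a minimal Python sketch. This is not the authors' model: it assumes plain set lookup on surface tokens in place of a morphological analyzer, counts only adjacent-token co-occurrences, and uses a hypothetical toy lexicon and corpus. The function names and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter


def find_oov(tokens, lexicon):
    """Return tokens missing from the lexicon, in first-seen order.

    Assumption: plain set lookup stands in for a morphological analyzer.
    """
    seen = set()
    oov = []
    for token in tokens:
        if token not in lexicon and token not in seen:
            seen.add(token)
            oov.append(token)
    return oov


def cooccurrence_graph(sentences):
    """Build a weighted directed graph as a Counter: (left, right) -> count.

    Assumption: only adjacent tokens within a sentence co-occur.
    """
    edges = Counter()
    for sentence in sentences:
        for left, right in zip(sentence, sentence[1:]):
            edges[(left, right)] += 1
    return edges


def collocation_candidates(edges, min_count=2):
    """Return edges seen at least min_count times, sorted alphabetically."""
    return sorted(pair for pair, count in edges.items() if count >= min_count)


# Hypothetical toy data, not taken from the paper.
lexicon = {"yapay", "zeka", "önemli"}
sentences = [
    ["yapay", "zeka", "gelişiyor"],
    ["yapay", "zeka", "önemli"],
]

all_tokens = [token for sentence in sentences for token in sentence]
oov_words = find_oov(all_tokens, lexicon)
graph = cooccurrence_graph(sentences)
candidates = collocation_candidates(graph)

print(oov_words)
print(candidates)
```

In this toy setup, a pair such as "yapay zeka" that recurs across sentences becomes a candidate for a new multi-word dictionary entry, while a single token absent from the lexicon is flagged as OOV. A fuller system would presumably operate on lemmas produced by an analyzer rather than on raw tokens.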