Learning Word-vector Quantization: A Case Study in Morphological Disambiguation

Orhan, Umut; Arslan, Enis

doi:10.1145/3397967

Orhan U., Arslan E. ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 19, no. 5, 2020 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 19, Issue: 5
Publication Date: 2020
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Indexed in: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Keywords: Learning word-vector quantization, Turkish morphological disambiguation, classification, learning vector quantization
Çanakkale Onsekiz Mart University affiliation: No

Abstract

We introduce a new classifier, Learning Word-vector Quantization (LWQ), to resolve morphological ambiguities in Turkish, an agglutinative language. First, a new morphologically annotated corpus is built, and datasets are then derived from it through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving the vectors in Euclidean space. LWQ performs morphological disambiguation in two steps: first, it enumerates all candidate analyses of an ambiguous word using a morphological analyzer; second, it selects the best candidate according to its total distance to the neighboring words that are not ambiguous. To evaluate LWQ, we conducted numerous tests on the corpus, taking the consistency of classification into account. In these experiments, LWQ chose the correct parse output with a correct classification ratio of 98.4%, a very strong result relative to the literature.
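The two-step procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The vector store (a plain dict from hypothetical parse and word labels to NumPy arrays), the function names `choose_candidate` and `train_step`, the learning rate, and above all the LVQ1-style update rule are assumptions made for illustration. The abstract states only that LWQ moves word vectors in Euclidean space and picks a candidate by its total distance to unambiguous neighbors; here the candidate with the smallest total distance is assumed to win.

```python
import numpy as np

def total_distance(candidate_vec, neighbor_vecs):
    """Sum of Euclidean distances from one candidate vector to every neighbor vector."""
    return float(np.linalg.norm(neighbor_vecs - candidate_vec, axis=1).sum())

def choose_candidate(candidates, vectors, neighbors):
    """Pick the candidate parse whose vector has the smallest total distance
    to the vectors of the unambiguous neighboring words (assumed selection rule)."""
    neighbor_vecs = np.stack([vectors[w] for w in neighbors])
    return min(candidates, key=lambda c: total_distance(vectors[c], neighbor_vecs))

def train_step(candidates, gold, vectors, neighbors, lr=0.5):
    """One LVQ1-style update (hypothetical rule, not taken from the paper).

    If a wrong candidate wins, pull the gold candidate's vector toward the
    centroid of the neighbor vectors and push the winning wrong candidate's
    vector away from it. Neighbor vectors are left unchanged.
    Returns the prediction made before the update.
    """
    predicted = choose_candidate(candidates, vectors, neighbors)
    if predicted != gold:
        centroid = np.mean([vectors[w] for w in neighbors], axis=0)
        vectors[gold] += lr * (centroid - vectors[gold])
        vectors[predicted] -= lr * (centroid - vectors[predicted])
    return predicted

# Toy 2-D example with hypothetical labels: two candidate parses of one
# ambiguous word and two unambiguous neighbors.
vectors = {
    "parse_A": np.array([4.0, 4.0]),
    "parse_B": np.array([1.0, 1.0]),
    "left": np.array([0.0, 0.0]),
    "right": np.array([0.0, 2.0]),
}
candidates, neighbors = ["parse_A", "parse_B"], ["left", "right"]
for _ in range(10):
    train_step(candidates, "parse_A", vectors, neighbors)
print(choose_candidate(candidates, vectors, neighbors))  # prints parse_A
```

Before training, `parse_B` is closer to both neighbors and is chosen; after two corrective updates `parse_A` wins and further steps leave the vectors unchanged. Scoring by the plain sum of Euclidean distances follows the abstract's description, while the centroid-based pull/push is only one simple LVQ1-like choice; other update rules would fit the same description.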