Learning Word-vector Quantization: A Case Study in Morphological Disambiguation


Orhan U., Arslan E.

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, cilt.19, sa.5, 2020 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 19 Sayı: 5
  • Basım Tarihi: 2020
  • Doi Numarası: 10.1145/3397967
  • Dergi Adı: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Anahtar Kelimeler: Learning word-vector quantization, Turkish morphological disambiguation, classification, Learning vector quantization
  • Çanakkale Onsekiz Mart Üniversitesi Adresli: Hayır

Özet

We introduced a new classifier named Learning Word-vector Quantization (LWQ) to solve morphological ambiguities in Turkish, which is an agglutinative language. First, a new and morphologically annotated corpus, and then its datasets are prepared with a series of processes. According to datasets, LWQ finds optimal word-vectors positions by moving them in the Euclidean space. LWQ does morphological disambiguation in two steps: First, it defines all solution candidates of an ambiguous word using a morphological analyzer; second, it chooses the best candidate according to its total distances to neighbor words that are not ambiguous. To show LWQ's performance, we have conducted many tests on the corpus by considering the consistency of classification. In the experiments, we achieve 98.4% correct classification ratio to choose correct parse output, which is an excellent level for the literature.