Learning Word-vector Quantization: A Case Study in Morphological Disambiguation


Orhan U., Arslan E.

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, vol.19, no.5, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 19 Issue: 5
  • Publication Date: 2020
  • Doi Number: 10.1145/3397967
  • Journal Name: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Learning word-vector quantization, Turkish morphological disambiguation, classification, Learning vector quantization
  • Çanakkale Onsekiz Mart University Affiliated: No

Abstract

We introduce a new classifier, Learning Word-vector Quantization (LWQ), to resolve morphological ambiguities in Turkish, an agglutinative language. First, a new morphologically annotated corpus is built, and its datasets are then prepared through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving the vectors in Euclidean space. LWQ performs morphological disambiguation in two steps: first, it generates all candidate solutions for an ambiguous word using a morphological analyzer; second, it chooses the best candidate according to its total distance to neighboring words that are not ambiguous. To evaluate LWQ's performance, we conducted many tests on the corpus, taking the consistency of classification into account. In the experiments, LWQ achieves a 98.4% correct classification rate in choosing the correct parse, which is an excellent result compared with the literature.
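
The second disambiguation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional vectors, and the parse labels are hypothetical, and the sketch assumes that the "best" candidate is the one with the smallest summed Euclidean distance to the unambiguous neighbor words. It also assumes the word-vectors have already been learned and that a morphological analyzer has already produced the candidate parses.

```python
import numpy as np


def total_distance(candidate_vec, neighbor_vecs):
    """Sum of Euclidean distances from one candidate vector to every neighbor vector."""
    diffs = np.asarray(neighbor_vecs, dtype=float) - np.asarray(candidate_vec, dtype=float)
    return float(np.linalg.norm(diffs, axis=1).sum())


def choose_candidate(candidates, neighbor_vecs):
    """Return the label of the candidate with the smallest total distance.

    Assumption: "best" means minimal total distance; the abstract does not
    spell out the exact selection rule.
    """
    return min(candidates, key=lambda label: total_distance(candidates[label], neighbor_vecs))


# Hypothetical toy data: two candidate parses of one ambiguous word,
# each mapped to an already-learned word-vector.
candidates = {
    "parse_A": [0.0, 0.0],
    "parse_B": [5.0, 5.0],
}

# Vectors of the unambiguous neighbor words in the same sentence (also made up).
neighbors = [[1.0, 0.0], [0.0, 1.0]]

best = choose_candidate(candidates, neighbors)
print(best)
```

In this toy setting the candidate lying closer to both neighbors is selected. How the vectors are moved during training, and how many neighbor words are considered, are left out of the sketch because the abstract does not specify them.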