Facing Academic Integrity Threats (FAITH) Conference 2024, Çanakkale, Türkiye, 5-9 August 2024, vol. 1, no. 1, p. 36 (Abstract)
This paper explores the performance of “Synthetic Humanistic Textuality” in AI-generated texts in “Non-Latin Alphabet Languages (NoLaL)”. The research was conducted by the NoLaL-AI team, which consists of experts in Chinese, Hindi, Japanese, Korean, and Russian, along with specialists in Generative AI (GenAI) for foreign language teaching. As a theoretical framework, we established the term “Synthetic Humanistic Textuality” and aligned our methodology with it. To assess AI-generated NoLaL texts, we developed a rubric covering four dimensions: Linguistic Naturalness, Semantic and Stylistic Coherence, Content/Expression Maturity, and Sociocultural Foundation. The rubric, refined through expert feedback, comprises 12 sub-criteria, each evaluated on a three-level performance scale. Prompts were designed to generate texts in three formats (composition, email, and creative story writing), each intended to assess the AI's Synthetic Humanistic Textuality capabilities. The prompts were built around five key elements (persona, objective, context, format, and tone) and were tested through a negotiation-based interrater agreement process to ensure consistent scoring across raters. Preliminary results indicate that, although performance varies by language, the highest overall performance was observed in Linguistic Naturalness (2.25 out of 3). The lowest was found in Semantic and Stylistic Coherence (1.88 out of 3), with Content/Expression Maturity (1.95 out of 3) and Sociocultural Foundation (1.93 out of 3) also scoring low. By language, Japanese showed the highest performance (2.38 out of 3) and Korean the lowest (1.97 out of 3), while Chinese (2.17 out of 3), Hindi (2.07 out of 3), and Russian (2.01 out of 3) fell in the middle range.
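As an illustration of how rubric ratings of this kind could be turned into the "out of 3" figures reported above, the following is a minimal sketch rather than the team's actual procedure. The four dimension names come from the abstract. Everything else is an assumption made for the example: three sub-criteria per dimension, a 1-3 coding of the three performance levels, unweighted means, and the sample ratings themselves, which are hypothetical and not study data.

```python
from statistics import mean

# Dimension names are taken from the abstract. The grouping of three
# sub-criteria per dimension and the 1-3 coding are assumptions.
DIMENSIONS = (
    "Linguistic Naturalness",
    "Semantic and Stylistic Coherence",
    "Content/Expression Maturity",
    "Sociocultural Foundation",
)


def dimension_means(ratings):
    """Return the unweighted mean sub-criterion score for each dimension."""
    result = {}
    for name in DIMENSIONS:
        scores = ratings[name]
        if any(score not in (1, 2, 3) for score in scores):
            raise ValueError(f"scores for {name!r} must be 1, 2, or 3")
        result[name] = round(mean(scores), 2)
    return result


def overall_mean(ratings):
    """Return the unweighted mean over all sub-criterion scores of one text."""
    all_scores = [score for name in DIMENSIONS for score in ratings[name]]
    return round(mean(all_scores), 2)


# Hypothetical ratings for a single generated text (not study data).
example_ratings = {
    "Linguistic Naturalness": [3, 2, 2],
    "Semantic and Stylistic Coherence": [2, 2, 1],
    "Content/Expression Maturity": [2, 2, 2],
    "Sociocultural Foundation": [2, 1, 2],
}

if __name__ == "__main__":
    for dimension, score in dimension_means(example_ratings).items():
        print(f"{dimension}: {score} out of 3")
    print(f"Overall: {overall_mean(example_ratings)} out of 3")
```

Averaging such per-text results across formats and languages would give language-level figures comparable in form to those in the abstract, though the abstract does not state which aggregation or weighting was actually used.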