Orthodontic Biomechanical Reasoning with Multimodal Language Models: Performance and Clinical Utility


Arisan A., GENÇ C., DURAN G. S.

BIOENGINEERING-BASEL, vol.12, no.11, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 12, Issue: 11
  • Publication Date: 2025
  • DOI: 10.3390/bioengineering12111165
  • Journal Name: BIOENGINEERING-BASEL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, INSPEC, Directory of Open Access Journals
  • Çanakkale Onsekiz Mart University Affiliated: Yes

Abstract

Background: Multimodal large language models (LLMs) are increasingly being explored as clinical support tools, yet their capacity for orthodontic biomechanical reasoning has not been systematically evaluated. This retrospective study assessed their ability to analyze treatment mechanics and explored their potential role in supporting orthodontic decision-making. Methods: Five publicly available models (GPT-o3, Claude 3.7 Sonnet, Gemini 2.5 Pro, GPT-4.0, and Grok) analyzed 56 standardized intraoral photographs illustrating a diverse range of active orthodontic force systems commonly encountered in clinical practice. Three experienced orthodontists independently scored the outputs on a 5-point scale across four domains: observation, interpretation, biomechanics, and confidence. Inter-rater agreement and consistency were assessed, and statistical comparisons were made between models. Results: GPT-o3 achieved the highest composite score (3.34/5.00; 66.8%), significantly outperforming all other models. It was followed, in order, by Claude (57.8%), Gemini (52.6%), GPT-4.0 (48.8%), and Grok (38.8%). Inter-rater reliability among the expert evaluators was excellent, with ICC values ranging from 0.786 (Confidence Evaluation) to 0.802 (Observation). Model self-reported confidence showed poor calibration against expert-rated output quality. Conclusions: Multimodal LLMs show emerging potential for assisting orthodontic biomechanical assessment. With expert-guided validation, these models may contribute meaningfully to clinical decision support across diverse biomechanical scenarios encountered in routine orthodontic care.
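
The percentages in the Results appear to be composite scores expressed as a fraction of the 5-point maximum; the abstract confirms this only for GPT-o3 (3.34/5.00 = 66.8%). The short Python sketch below reproduces that conversion and, assuming the same rule applies to the other models, back-calculates their implied composite scores. The function names are illustrative, and the implied scores are derived values, not figures reported in the abstract.

```python
MAX_SCORE = 5.0  # top of the 5-point rating scale described in the Methods


def score_to_percent(score, max_score=MAX_SCORE):
    """Express a composite score as a percentage of the scale maximum."""
    return round(score / max_score * 100, 1)


def percent_to_score(percent, max_score=MAX_SCORE):
    """Invert the conversion (assumes percent = score / max_score * 100)."""
    return round(percent / 100 * max_score, 2)


# Percentages as reported in the abstract.
reported_percent = {
    "Claude": 57.8,
    "Gemini": 52.6,
    "GPT-4.0": 48.8,
    "Grok": 38.8,
}

# Implied composite scores under the stated assumption; not reported values.
implied_scores = {
    model: percent_to_score(pct) for model, pct in reported_percent.items()
}

if __name__ == "__main__":
    print("GPT-o3:", score_to_percent(3.34), "%")
    for model, score in implied_scores.items():
        print(f"{model}: about {score} / 5.00")
```

Under this assumption the gap between the top-ranked and bottom-ranked models is roughly 1.4 points on the 5-point scale.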