Prognostic Prediction of Avulsed Permanent Teeth Using Conversational AI Models Versus Expert Dentists: Influence of Prompt Structure and Temporal Stability

Buldur, MEHMET; Ayan, GİZEM; Misilli, TUĞBA

doi:10.1111/edt.70071

DENTAL TRAUMATOLOGY, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Publication Date: 2026
DOI: 10.1111/edt.70071
Journal Name: DENTAL TRAUMATOLOGY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, MEDLINE
Affiliated with Çanakkale Onsekiz Mart University: Yes

Abstract

Aim: Prognosis prediction after tooth avulsion is challenging due to multiple interacting clinical factors. The agreement of conversational large language models with expert dentists, their sensitivity to prompt format, and the stability of their outputs over time remain unclear. This study compared four conversational large language models with expert assessments using standardized simulated avulsion scenarios and evaluated prompt format effects and short-term stability.

Material and Methods: A simulation-based observational study was conducted using 120 standardized synthetic avulsion scenarios created by three experienced dentists. Each scenario was converted into four standardized prompt formats while preserving identical clinical content. Four conversational large language models (ChatGPT-4o, Gemini, Claude, and DeepSeek) evaluated each case twice at a 48-h interval, providing numeric scores on a 0-12 scale and ordinal prognosis categories. Expert reference values were obtained from blinded independent dentist assessments with subsequent consensus. Agreement, prompt-related variability, and short-term stability were analyzed using nonparametric and agreement-based methods.

Results: Meaningful agreement with expert prognosis assessments was observed across models for both numeric scores and ordinal categories. Gemini and ChatGPT-4o showed the most balanced performance, whereas Claude preserved relative risk ordering with greater deviation in absolute severity, and DeepSeek demonstrated lower categorical concordance. Prompt format significantly influenced outputs, with the clinical-practical format showing the closest alignment and the narrative format the weakest. Short-term stability was generally maintained, although small but statistically detectable shifts occurred in some model-format combinations.

Conclusions: Conversational large language models can generate avulsion prognosis estimates that meaningfully align with expert dentists under standardized conditions. However, performance is model-dependent and strongly influenced by information structure. These systems should be used as supportive tools rather than stand-alone decision makers, and further studies using real-world data are needed to confirm clinical utility and stability.

Clinical Relevance: Under standardized conditions, conversational AI models, particularly Gemini and ChatGPT, may support clinicians as secondary decision aids in structured prognosis estimation for avulsed permanent teeth.
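
Illustrative sketch: The abstract does not name the specific agreement statistics used. As one common way to quantify agreement between a model's ordinal prognosis categories and an expert consensus, the Python sketch below computes a quadratically weighted Cohen's kappa. The function name, the four-category coding (0 to 3), and the toy ratings are assumptions made purely for illustration; they are not the study's method or data.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Quadratically weighted Cohen's kappa for two ordinal ratings.

    Ratings are assumed to be integers coded 0 .. n_categories - 1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per (a, b) cell
    marginal_a = Counter(rater_a)              # category counts for rater A
    marginal_b = Counter(rater_b)              # category counts for rater B
    max_distance = (n_categories - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_distance  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0:
        # Both raters used the same single category for every case.
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


# Toy ratings (hypothetical, NOT study data): expert consensus vs. one model run.
expert = [0, 1, 2, 3, 1, 2]
model = [0, 1, 2, 2, 1, 3]
print(quadratic_weighted_kappa(expert, model, n_categories=4))
```

The same function could be applied to a model's first and second runs to gauge short-term stability of the categorical output; a value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance.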