Does Sequential Use of Multiple Chatbots Influence Emergency Guidance for Dental Avulsion?

Buldur, Mehmet; Sezer, Berkant

doi:10.1111/edt.70072

Dental Traumatology, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Publication Date: 2026
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, MEDLINE
Open Archive Collection: AVESİS Open Access Collection
Çanakkale Onsekiz Mart University Affiliation: Yes

Abstract

Background/Aim: Artificial intelligence-based chatbot systems are increasingly used for dental emergency guidance, yet the clinical value of sequential consultation remains unclear. This study investigated whether response order influences the quality of chatbot-generated answers to questions on avulsion of permanent teeth.
Material and Methods: Eight frequently asked questions covering key decision points in avulsion management were developed based on the International Association of Dental Traumatology (IADT) guidelines. Three publicly accessible chatbots were queried using a standardized sequential prompt design: ChatGPT (GPT-5), Gemini (Gemini 1.5 Pro), and DeepSeek (DeepSeek-R1). For each question, all six possible chatbot orders were applied, so that each system generated first-, second-, and third-position responses. Second- and third-position prompts required the chatbot to evaluate the prior responses and correct any missing or incorrect information, thereby creating a controlled "maximum correction opportunity" condition. Responses were generated in independent sessions and evaluated by three expert dentists, blinded to model identity, against predefined criteria: accuracy, safety, completeness, correction ability, clarity, and usefulness. Composite scores were calculated within seven conceptual evaluation frameworks, and response-order effects were tested using the Friedman test, with Kendall's W as the effect size.
Results: Across all seven conceptual evaluation frameworks, composite quality scores did not differ significantly by response order (all p > 0.05), and Kendall's W indicated negligible effect sizes. No consistent improvement was observed in accuracy, safety, completeness, or correction ability in second- or third-position responses. Model-specific analyses likewise showed no significant response-order effects for any chatbot platform.
Conclusions: Under structured evaluative conditions permitting explicit revision, sequential consultation of multiple chatbot systems did not improve the quality, safety, or guideline alignment of the information provided for dental avulsion management. These findings suggest that response order alone may have limited corrective impact in this context; they should be interpreted within the boundaries of controlled experimental modeling.
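The order design and the Friedman/Kendall's W analysis summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the block structure (one row of first-, second-, and third-position composite scores per block, for example per question and chatbot) is assumed, the scores in EXAMPLE_BLOCKS are hypothetical rather than study data, and all helper names are invented for illustration. Ties are handled with average ranks and the standard Friedman tie correction (the same form applied by scipy.stats.friedmanchisquare), Kendall's W is derived as W = Q / (n(k - 1)), and the closed-form p-value exp(-Q/2) holds only for k = 3 positions (2 degrees of freedom).

```python
from collections import Counter
from itertools import permutations
from math import exp

CHATBOTS = ["ChatGPT", "Gemini", "DeepSeek"]

# Hypothetical composite scores (NOT study data). Each row is an assumed block,
# e.g. one question/chatbot pair; columns are positions 1, 2, 3.
EXAMPLE_BLOCKS = [
    [4.0, 4.0, 4.0],
    [5.0, 4.0, 5.0],
    [3.0, 4.0, 4.0],
    [4.0, 5.0, 4.0],
]


def order_permutations(bots):
    """Every consultation order; 3 chatbots give 3! = 6 sequences."""
    return list(permutations(bots))


def position_counts(orders):
    """How often each (chatbot, position) pair occurs across all orders."""
    return Counter(
        (bot, pos) for order in orders for pos, bot in enumerate(order, start=1)
    )


def average_ranks(row):
    """Rank scores within one block (1 = lowest); ties share their mean rank."""
    idx = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(idx):
        j = i
        while j + 1 < len(idx) and row[idx[j + 1]] == row[idx[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[idx[m]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman_kendall_w(blocks):
    """Return (tie-corrected Friedman Q, p-value, Kendall's W) for k = 3 positions."""
    n, k = len(blocks), len(blocks[0])
    if k != 3:
        raise ValueError("closed-form p-value assumes k = 3 positions (df = 2)")
    rank_sums = [0.0] * k
    tie_term = 0  # sum of t^3 - t over every group of t tied scores
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * (k ** 3 - k))
    if correction == 0:
        return 0.0, 1.0, 0.0  # every block fully tied: no ordering information
    q /= correction
    p_value = exp(-q / 2)  # chi-square survival function with 2 df
    w = q / (n * (k - 1))  # Kendall's W from the Friedman statistic
    return q, p_value, w


if __name__ == "__main__":
    orders = order_permutations(CHATBOTS)
    print(len(orders), "orders;", dict(position_counts(orders)))
    q, p, w = friedman_kendall_w(EXAMPLE_BLOCKS)
    print(f"Friedman Q = {q:.3f}, p = {p:.3f}, Kendall's W = {w:.3f}")
```

With three chatbots the design yields 3! = 6 orders, and each chatbot occupies each position exactly twice, which is what allows every system to contribute first-, second-, and third-position responses. Deriving W directly from Q keeps the effect size on the same tie-corrected footing as the test statistic.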