Indian Journal of Orthopaedics, 2025 (SCI-Expanded, Scopus)
Purpose: This study evaluated how closely large language models (LLMs), specifically ChatGPT (OpenAI) and NotebookLM (Google), adhere to orthopedic guidelines. The objective was to determine whether AI-generated reasoning aligns with the 2021–2022 American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines for knee osteoarthritis (OA). Methods: A mixed-methods design combined quantitative concordance scoring with qualitative content analysis. Thirty-three decision points covering non-arthroplasty and surgical management were derived from AAOS guidelines. Structured Population–Intervention–Comparison–Outcome (PICO) prompts were presented to each model. Two orthopedic surgeons independently rated all outputs using a four-domain rubric assessing accuracy, evidence reasoning, additional information, and knowledge integration (0–4 scale). Concordance was classified as full (4), partial (3), or discordant (≤ 2), with disagreements resolved through consensus. Inter-rater reliability was almost perfect (weighted κ = 0.87). Results: ChatGPT achieved a mean composite score of 3.67 ± 0.92, and NotebookLM 3.55 ± 0.87, with no significant difference between models (p = 0.18). Full concordance with AAOS recommendations occurred in 84.8% of ChatGPT responses and 75.8% of NotebookLM responses. Both models performed consistently in high-evidence domains such as NSAID therapy, tranexamic acid use, and weight-loss counseling. Variability increased in limited-evidence or technology-driven areas. Partial concordance reflected the omission of evidence qualifiers, while discordant responses involved overstated or speculative interpretations. Conclusion: Both LLMs demonstrated strong alignment with evidence-based orthopedic reasoning. ChatGPT showed slightly higher fidelity to recommendation strength, whereas NotebookLM provided broader contextual interpretation. Structured, guideline-oriented prompting may enhance AI reasoning consistency and support the potential role of LLMs as complementary tools for evidence translation and orthopedic education.