Tundra Space

Tundra Space

Clinical Research Directory

Browse clinical research sites, groups, and studies.

Back to Studies
RECRUITING
NCT07632859

Diagnostic Accuracy of GPT-4o and Claude 4.6 Sonnet in Turkish ED Anamnesis Notes

Sponsor: Marmara University Pendik Training and Research Hospital

View on ClinicalTrials.gov

Summary

This retrospective diagnostic accuracy study evaluates the ability of two large language models (LLMs) - GPT-4o (gpt-4o-2024-11-20; OpenAI) and Claude 4.6 Sonnet (claude-sonnet-4-6; Anthropic) - to generate correct diagnoses from anonymized Turkish-language emergency department (ED) anamnesis notes, and compares their performance with the diagnosis entered by the treating emergency physician. A consensus gold standard is established by three independent board-certified emergency medicine specialists who blindly review each note and vote on the primary diagnosis using ICD-10 three-character codes; the majority vote (at least 2 of 3 specialists agreeing) constitutes the reference standard. Both LLMs are evaluated using a standardized zero-shot direct prompting strategy (temperature=0, stateless API sessions). The primary outcome is diagnostic accuracy (proportion of ICD-10 chapter-level matches) and Cohen's kappa for each LLM against the gold standard. Secondary outcomes include top-3 accuracy, treating physician accuracy, inter-model agreement, and subgroup analyses by ESI triage level and ICD-10 chapter. Inter-rater reliability among the three specialists is quantified using Fleiss' kappa. Analyses are performed in Jamovi. This study represents the first evaluation of LLM diagnostic accuracy using Turkish-language clinical notes and the first to benchmark LLM performance against an independent three-specialist majority-vote gold standard rather than against the treating physician's own diagnosis.

Official title: Diagnostic Accuracy of Large Language Models From Emergency Department Anamnesis Notes: A Comparison of GPT-4o and Claude 4.6 Sonnet With Emergency Medicine Specialists

Key Details

Gender

All

Age Range

18 Years - Any

Study Type

OBSERVATIONAL

Enrollment

600

Start Date

2026-06

Completion Date

2026-10

Last Updated

2026-06-25

Healthy Volunteers

No

Locations (1)

Marmara University Pendik Training and Research Hospital

Istanbul, Istanbul, Turkey (Türkiye)