Tundra Space

Clinical Research Directory

Browse clinical research sites, groups, and studies.

NOT YET RECRUITING
NCT07626060

Diagnostic Accuracy of GPT-4o and Claude for HEART Score Calculation in Chest Pain

Sponsor: Marmara University Pendik Training and Research Hospital

Summary

This prospective observational diagnostic accuracy study evaluates whether two large language models (LLMs), GPT-4o (OpenAI, gpt-4o-2024-11-20) and Claude (Anthropic, claude-sonnet-4-6), can accurately calculate HEART scores from unstructured Turkish clinical notes and predict 30-day major adverse cardiac events (MACE) in emergency department patients presenting with non-traumatic chest pain. The study will enroll 600 consecutive adult patients. For each patient, the same anonymized data (free-text anamnesis, ECG report text, troponin value, and age) will be processed independently by both LLMs via separate API calls with deterministic settings (temperature=0, JSON format). The reference standard for agreement analysis is a three-expert consensus HEART score, derived through blinded independent scoring by three emergency medicine physicians with majority-vote adjudication. The outcome for diagnostic accuracy analysis is actual 30-day MACE (all-cause death, acute myocardial infarction [AMI] type 1, 2, or 4b, or unplanned revascularization), determined via the national health database and telephone follow-up. A secondary documentation-quality sub-study will quantify how often Turkish emergency anamnesis notes spontaneously capture the HEART score parameters.
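As a rough illustration of the diagnostic accuracy analysis described above, the sketch below computes sensitivity and specificity of a HEART score cut-off against observed 30-day MACE. The cut-off of 4 (treating scores of 4 or more as a positive test) is an assumption made for illustration; the listing does not state which thresholds or statistics the study will report. The function name and the sample data are hypothetical and are not study results.

```python
def sensitivity_specificity(scores, mace, threshold=4):
    """Return (sensitivity, specificity), treating score >= threshold as positive.

    scores: total HEART scores (0-10), one per patient.
    mace:   booleans, True if the patient had 30-day MACE.
    threshold: assumed cut-off for illustration, not taken from the study.
    """
    if len(scores) != len(mace):
        raise ValueError("scores and mace must have the same length")
    tp = sum(1 for s, m in zip(scores, mace) if s >= threshold and m)
    fn = sum(1 for s, m in zip(scores, mace) if s < threshold and m)
    tn = sum(1 for s, m in zip(scores, mace) if s < threshold and not m)
    fp = sum(1 for s, m in zip(scores, mace) if s >= threshold and not m)
    # Guard against empty classes so the function never divides by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Made-up example data for six patients (not study results).
example_scores = [2, 5, 7, 3, 4, 1]
example_mace = [False, True, True, False, False, False]
print(sensitivity_specificity(example_scores, example_mace))
```

The same function could be applied separately to the GPT-4o scores, the Claude scores, and the expert consensus scores to compare them on a common outcome.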

Official title: Diagnostic Accuracy of Large Language Models (GPT-4o and Claude) in HEART Score Calculation and 30-Day MACE Prediction in Emergency Department Chest Pain Patients: A Prospective Observational Validation Study Against Three-Expert Consensus

Key Details

Gender

All

Age Range

18 Years - Any

Study Type

OBSERVATIONAL

Enrollment

690

Start Date

2026-06

Completion Date

2027-06

Last Updated

2026-06-04

Healthy Volunteers

No

Interventions

OTHER

GPT-4o HEART Score Calculator

OpenAI GPT-4o (model: gpt-4o-2024-11-20, temperature=0, max_tokens=500, response_format=JSON). Each patient's anonymized anamnesis text, ECG report text, troponin value, and age are submitted via a separate API call with no conversation history. Output: HEART score components (0-2 each), total score (0-10), risk group, and indeterminate status.
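The output described above (five components scored 0-2, a 0-10 total, a risk group, and an indeterminate flag) lends itself to a simple consistency check before analysis. The sketch below is a minimal example of such a check. The JSON key names are hypothetical, since the listing does not publish the schema, and the risk bands (0-3 low, 4-6 moderate, 7-10 high) are the conventional HEART cut-offs, assumed here rather than confirmed by the study.

```python
import json

# Hypothetical key names for the five HEART components:
# History, ECG, Age, Risk factors, Troponin.
COMPONENTS = ("history", "ecg", "age", "risk_factors", "troponin")


def risk_group(total):
    """Map a total HEART score to the conventional risk bands (assumed)."""
    if not 0 <= total <= 10:
        raise ValueError(f"total out of range: {total}")
    if total <= 3:
        return "low"
    if total <= 6:
        return "moderate"
    return "high"


def parse_heart_output(raw):
    """Parse one model response and check that it is internally consistent."""
    data = json.loads(raw)
    for name in COMPONENTS:
        value = data.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or value not in (0, 1, 2):
            raise ValueError(f"component {name!r} must be 0, 1, or 2")
    total = sum(data[name] for name in COMPONENTS)
    if data.get("total") != total:
        raise ValueError("total does not equal the sum of the components")
    if data.get("risk_group") != risk_group(total):
        raise ValueError("risk_group does not match the total score")
    if not isinstance(data.get("indeterminate"), bool):
        raise ValueError("indeterminate must be true or false")
    return data


# Made-up response for illustration only.
example_raw = (
    '{"history": 1, "ecg": 0, "age": 2, "risk_factors": 1, "troponin": 0, '
    '"total": 4, "risk_group": "moderate", "indeterminate": false}'
)
print(parse_heart_output(example_raw)["risk_group"])
```

Rejecting inconsistent responses at parse time keeps arithmetic slips by a model from being counted as scoring disagreements with the expert consensus.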

OTHER

Claude HEART Score Calculator

Anthropic Claude (model: claude-sonnet-4-6, temperature=0, max_tokens=500, response_format=JSON). Uses the same system prompt and input format as GPT-4o. Processed independently with no cross-contamination between models. Output: same JSON schema as GPT-4o.

OTHER

Three-Expert Consensus HEART Score

Three emergency medicine physicians (>=3 years' experience, HEART-score trained) independently score each anonymized record. Majority vote (2/3) determines component scores; a 4th adjudicator resolves ties. Experts are blinded to LLM scores, each other's scores, and MACE outcomes.
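The adjudication rule above can be sketched for a single HEART component as follows. With three raters and scores of 0, 1, or 2, a tie can only occur when all three raters disagree, which is the case where the fourth adjudicator's score is used. The function name and signature are hypothetical; this is an illustration of the stated rule, not the study's own software.

```python
from collections import Counter


def consensus_component(votes, adjudicator=None):
    """Return the consensus score for one HEART component.

    votes:       the three experts' scores for the component (each 0, 1, or 2).
    adjudicator: the fourth rater's score, used only for a three-way split.
    """
    if len(votes) != 3:
        raise ValueError("exactly three expert votes are required")
    if any(v not in (0, 1, 2) for v in votes):
        raise ValueError("each vote must be 0, 1, or 2")
    value, count = Counter(votes).most_common(1)[0]
    if count >= 2:
        # At least two of three experts agree: majority vote decides.
        return value
    if adjudicator not in (0, 1, 2):
        raise ValueError("three-way split requires an adjudicator score of 0, 1, or 2")
    return adjudicator


print(consensus_component([1, 1, 2]))                 # majority case
print(consensus_component([0, 1, 2], adjudicator=1))  # adjudicated case
```

Summing the five consensus component scores then gives the reference total for each patient.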

Locations (1)

Marmara University Pendik Training and Research Hospital

Istanbul, İstanbul, Turkey (Türkiye)