Integrating AI Predictions With Clinician Expertise
Sponsor: University of California, San Francisco
Summary
Optimizing the interaction between the human and the machine is a major challenge when deploying artificial intelligence (AI) at the bedside. The goal of this randomized clinical vignette study is to determine whether presenting AI model outputs as continuous Bayesian updates and/or with uncertainty quantification can improve diagnostic accuracy and trust in AI among healthcare professionals (physicians, residents, fellows, physician assistants (PAs), and nurse practitioners (NPs)) from US academic institutions evaluating patients with chest pain or dyspnea.

The main questions it aims to answer are:

* Does presenting AI predictions as Bayesian-updated post-test probabilities improve diagnostic accuracy compared to standard predicted probabilities?
* Does adding uncertainty quantification (95% confidence intervals) to AI predictions improve diagnostic accuracy?
* Do these interventions (Bayesian updating and/or uncertainty quantification) help clinicians recover from the negative effects of intentionally misleading AI predictions?

Comparison: Researchers will compare standard AI predicted probabilities (presented without uncertainty) against Bayesian-updated post-test probabilities and/or outputs containing 95% confidence intervals to see whether these interventions improve diagnostic accuracy, clinician confidence, and resilience against misleading AI.

Participants will:

* Review 8 clinical vignettes (simulated patient cases) focusing on chest pain or dyspnea.
* Provide an initial "pre-test" diagnostic probability for 5 possible diagnoses based on the clinical history alone.
* View AI model outputs that vary by experimental condition (standard probability vs. Bayesian update, with or without uncertainty intervals, and accurate vs. misleading).
* Provide an updated "post-test" diagnostic probability for the same diagnoses after viewing the AI output.
* Select and rank diagnostic tests and therapeutic steps for each vignette.
* Complete a post-survey regarding their trust in the AI, comfort with the data presentation, and demographics.
Official title: Transforming Clinical Decision Support Systems: Using Continuous Bayesian Updates to Integrate AI Predictions With Clinician Expertise
Key Details
Gender
All
Age Range
18 Years - Any
Study Type
INTERVENTIONAL
Enrollment
100
Start Date
2026-02
Completion Date
2026-12
Last Updated
2026-03-09
Healthy Volunteers
Yes
Conditions
Interventions
Bayesian-Updated Post-Test Probability
Rather than presenting the AI model's raw predicted probability, the system takes the clinician's pre-test probability (entered before seeing AI output) and applies a continuous likelihood ratio (CLR) derived from the AI model to calculate a Bayesian-updated post-test probability. The output is displayed as a shift from the clinician's own assessment (e.g., "Your assessment: 45% -> Updated assessment: 72%"). The raw AI prediction is not shown. This approach mirrors how clinicians use diagnostic test results such as D-dimer to update pre-test probability of pulmonary embolism.
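The arithmetic behind this presentation is the standard odds form of Bayes' theorem: convert the clinician's pre-test probability to odds, multiply by the likelihood ratio, and convert back. The sketch below is illustrative only; the function name and the example likelihood ratio are assumptions, not the study's actual software.

```python
def bayesian_update(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Return the post-test probability given a pre-test probability
    (0-1, exclusive) and a likelihood ratio, using the odds form of
    Bayes' theorem. Illustrative sketch, not the study's implementation."""
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)   # probability -> odds
    post_test_odds = pre_test_odds * likelihood_ratio        # apply the LR
    return post_test_odds / (1.0 + post_test_odds)           # odds -> probability

# A likelihood ratio of ~3.14 (hypothetical value) shifts a 45% pre-test
# probability to roughly the 72% shown in the example display above.
print(round(bayesian_update(0.45, 3.14), 2))  # -> 0.72
```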
Standard AI Predicted Probability
AI model prediction is presented as a simple predicted probability (0-100%) for each of the possible diagnoses, together with the top 3 clinical features driving the prediction (e.g., "Acute Myocardial Infarction: 68% - Key factors: elevated troponin, ST-segment changes on ECG, chest pain radiation to left arm"). This represents the most common current approach to presenting AI-based diagnostic predictions in clinical settings.
Uncertainty Quantification (95% Confidence Interval)
The AI output (whether Bayesian-updated post-test probability or standard predicted probability) is presented together with a 95% confidence band displayed as error bars on probability bars. For accurate AI predictions, confidence interval width is approximately +/-12-15 percentage points. For misleading AI predictions, confidence intervals are widened by a factor of 1.5x (approximately +/-18-23 percentage points) to simulate reduced model confidence in unfamiliar or edge-case scenarios. Confidence intervals are constrained to the 0-100% range.
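The interval construction described above (a baseline half-width, widened 1.5x for misleading predictions, then clamped to 0-100%) can be sketched as follows. The function name and the choice of a single half-width parameter are assumptions for illustration; the study specifies ranges (+/-12-15 and +/-18-23 points) rather than one fixed value.

```python
def confidence_band(prob_pct: float, half_width: float,
                    misleading: bool = False) -> tuple[float, float]:
    """Return the (lower, upper) bounds of the displayed 95% confidence
    band around a probability expressed in percent. Misleading predictions
    get a band widened by 1.5x; bounds are clamped to [0, 100].
    Illustrative sketch, not the study's implementation."""
    if misleading:
        half_width *= 1.5  # simulate reduced model confidence
    lower = max(0.0, prob_pct - half_width)
    upper = min(100.0, prob_pct + half_width)
    return lower, upper

# Accurate prediction: 68% with a +/-13-point band -> (55.0, 81.0)
print(confidence_band(68, 13))
# Misleading prediction near the boundary: band widens to +/-22.5
# and is clamped at 100 -> (72.5, 100.0)
print(confidence_band(95, 15, misleading=True))
```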