Abstract: Background: Clinical decision-making for percutaneous coronary intervention (PCI) in patients with moderate-to-severe coronary stenosis is complex and sensitive to data completeness and guideline interpretation. We aimed to evaluate large language models (LLMs) for PCI decision support and to develop an ensemble framework for this setting. Methods: In this retrospective study, 15 LLM versions were evaluated using data from 93 patients at Ruijin Hospital. A hierarchical framework was used to assess performance across varying data inputs. To optimize accuracy, advanced grouped ensemble strategies were developed and validated with nested repeated stratified 5-fold cross-validation. Probabilistic reliability and clinical utility were quantified with calibration plots and decision curve analysis (DCA). Statistical robustness was assessed with bootstrap ROC-AUC comparisons under Holm-Bonferroni adjustment, and restricted cubic spline modeling was used to analyze age-performance interactions. Results: Distinct behavioral patterns emerged across LLM families: Llama-3.3-70B-Instruct made more aggressive recommendations, whereas Grok-3 was more conservative. Holm-adjusted analysis identified significant performance gaps at age cut-points of 73, 75, and 76 years. A significant age-score interaction (likelihood-ratio test, p = 0.00089) indicated that patient age modulates model performance. The advanced ensemble strategies outperformed individual models, with an adaptive grouped ensemble achieving an F1 score of 0.921, compared with 0.807 for the best single model and 0.794 for a standard ensemble. Conclusion: Tailored LLM ensembles are feasible for PCI decision support and can improve robustness. Further multicenter prospective validation and multimodal integration are needed before clinical deployment.
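The Holm-Bonferroni adjustment named in the Methods can be illustrated with a minimal sketch. This shows the standard step-down procedure only. It is not the authors' code, the function name `holm_adjust` is hypothetical, and the p-values in the usage line are made-up placeholders, not results from the study.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original input order.

    Standard step-down procedure: sort the p-values ascending, multiply the
    k-th smallest (k = 0, 1, ...) by (m - k), cap at 1.0, and enforce
    monotonicity with a running maximum.
    """
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # An adjusted p-value may never be smaller than the one ranked before it.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative placeholder p-values (NOT taken from the study).
example = holm_adjust([0.01, 0.04, 0.03])
```

A comparison would then be declared significant when its adjusted p-value falls below the chosen threshold (commonly 0.05). The abstract does not state which threshold the study used.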