In a typical ICU, patients are grouped by their chronic conditions, such as heart failure, COPD, or diabetes. But within those groups, costs and outcomes vary enormously: patients with the same comorbidity burden can fall into subgroups with median hospital costs of $9,000 and $23,000.
Using the SUPPORT-II cohort (9,105 ICU admissions), I built an unsupervised learning pipeline that first stratifies patients by chronic disease burden and then discovers distinct acute phenotypes within each stratum. This separates, for example, a low-acuity cancer patient from a patient with a neurologic catastrophe, even when both have zero recorded comorbidities.
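The two-stage idea (stratify first, then cluster within each stratum) can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline's code: the tier cut-points (0 | 1–2 | 3+ comorbidities), the function names, and the use of plain k-means with farthest-first seeding are hypothetical stand-ins for the multi-view fusion and clustering the actual pipeline uses.

```python
import numpy as np


def farthest_first_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first seeding (illustrative stand-in clusterer)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Next seed: the point farthest from every seed chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each patient to the nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def stratified_phenotypes(comorbidity_count, acute_features, k=2, tier_cuts=(1, 3)):
    """Stage 1: bin patients into comorbidity tiers (hypothetical cuts: 0 | 1-2 | 3+).
    Stage 2: cluster standardised acute features separately inside each tier."""
    counts = np.asarray(comorbidity_count)
    X = np.asarray(acute_features, dtype=float)
    tiers = np.digitize(counts, bins=tier_cuts)
    phenotype = np.zeros(len(X), dtype=int)
    for t in np.unique(tiers):
        idx = np.flatnonzero(tiers == t)
        if len(idx) <= k:
            continue  # too few patients to split; leave them in phenotype 0
        Z = X[idx]
        sd = Z.std(axis=0)
        Z = (Z - Z.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
        phenotype[idx] = farthest_first_kmeans(Z, k)
    return tiers, phenotype
```

Standardising inside each tier means a phenotype is defined relative to patients with similar chronic burden, which is what lets a zero-comorbidity stratum still split into low- and high-acuity groups.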
Standard risk adjustment groups patients by comorbidity count or diagnosis-related group (DRG). This analysis shows that within the same comorbidity tier, acute presentation patterns produce 2–3× cost variation that comorbidity-based models miss. The phenotypes are identifiable from admission-day data (vitals, labs, demographics) and can be assigned with a simple decision tree, so no complex model is needed at the point of care.
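The within-tier cost spread is a ratio of phenotype medians. A minimal sketch, assuming records of (tier, phenotype, cost); the function name and the toy records used to check it are hypothetical, chosen only so the two medians match the $9,000 and $23,000 figures in the introduction.

```python
from collections import defaultdict
from statistics import median


def within_tier_cost_spread(records):
    """Hypothetical helper: records are (tier, phenotype, cost) tuples.

    Returns {tier: highest phenotype median cost / lowest phenotype median cost}.
    """
    costs = defaultdict(lambda: defaultdict(list))
    for tier, phenotype, cost in records:
        costs[tier][phenotype].append(cost)
    spread = {}
    for tier, by_phenotype in costs.items():
        medians = [median(v) for v in by_phenotype.values()]
        spread[tier] = max(medians) / min(medians)
    return spread
```

For example, toy tier-0 records with phenotype costs of 8,000 / 9,000 / 10,000 and 20,000 / 23,000 / 30,000 give medians of 9,000 and 23,000, a spread of about 2.56×. Medians rather than means keep a handful of very long stays from dominating the comparison.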
Each phenotype was independently validated against 365-day mortality using Cox proportional hazards models adjusted for age, sex, comorbidity count, and acute severity. Hazard ratios ranged from 0.51 (protective) to 2.36 (high risk) relative to each stratum's reference phenotype. A surrogate decision tree distils the phenotype assignments into auditable rules of 3–4 conditions each.
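Assuming one model per comorbidity stratum, with indicator terms for each non-reference phenotype (the covariate names below are illustrative, not taken from the pipeline), the adjusted Cox model and the link between coefficients and the reported hazard ratios can be written as:

```latex
h_i(t) = h_0(t)\,\exp\Big(\sum_{p \neq \mathrm{ref}} \beta_p\,\mathbb{1}[\mathrm{phen}_i = p]
        + \gamma_1\,\mathrm{age}_i + \gamma_2\,\mathrm{sex}_i
        + \gamma_3\,\mathrm{comorb}_i + \gamma_4\,\mathrm{sev}_i\Big)

\mathrm{HR}_p = e^{\beta_p}, \qquad
\mathrm{HR} = 0.51 \;\Rightarrow\; \beta_p = \ln 0.51 \approx -0.67, \qquad
\mathrm{HR} = 2.36 \;\Rightarrow\; \beta_p = \ln 2.36 \approx 0.86
```

Read this way, an HR of 0.51 means roughly half the instantaneous death hazard of the stratum's reference phenotype at any point in the 365-day window, holding age, sex, comorbidity count, and acute severity fixed.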
Built on the SUPPORT-II cohort (Vanderbilt University). The pipeline includes multi-view similarity network fusion, bootstrap stability validation, cross-validated prognostic assessment, and LLM-assisted rule translation with programmatic QC.
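Similarity network fusion (Wang et al., 2014) builds one patient-similarity network per data view (for example vitals and labs) and cross-diffuses them into a single fused network that is then clustered. The sketch below is a simplified NumPy version for illustration only; the function names and defaults (k, t, mu) are assumptions, and reference implementations such as the R package SNFtool differ in kernel and normalisation details.

```python
import numpy as np


def affinity(X, k, mu):
    """Scaled exponential similarity for one view: W_ij = exp(-d_ij^2 / (mu * eps_ij))."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Mean distance to the k nearest neighbours (sorted column 0 is the patient itself).
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3
    return np.exp(-D ** 2 / (mu * np.maximum(eps, 1e-12)))


def full_kernel(W):
    """Scale off-diagonal mass in each row to 1/2 and set the diagonal to 1/2."""
    P = np.array(W, dtype=float)
    np.fill_diagonal(P, 0.0)
    P /= 2 * np.maximum(P.sum(axis=1, keepdims=True), 1e-300)
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, k):
    """Keep each row's k largest similarities (self included), then row-normalise."""
    S = np.zeros_like(W)
    rows = np.arange(len(W))[:, None]
    nn = np.argsort(-W, axis=1)[:, :k]
    S[rows, nn] = W[rows, nn]
    return S / S.sum(axis=1, keepdims=True)


def snf(views, k=10, t=20, mu=0.5):
    """Fuse two or more views (arrays of shape n_patients x n_features) into one network."""
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(views)
    for _ in range(t):
        # Diffuse each view's network through the average of the other views' networks.
        nxt = [Ss[v] @ (sum(Ps[u] for u in range(m) if u != v) / (m - 1)) @ Ss[v].T
               for v in range(m)]
        # Re-normalise and symmetrise every round to keep the scale stable.
        Ps = []
        for P in nxt:
            Q = full_kernel(P)
            Ps.append((Q + Q.T) / 2)
    return sum(Ps) / m
```

The fused matrix would then be clustered (for example with spectral clustering) to obtain phenotypes. Because each view only diffuses along its own nearest-neighbour edges, similarities supported by several views are reinforced while view-specific noise tends to fade.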
Full methodology →
GitHub →