What I Did

  • Developed and optimized machine learning models (Logistic Regression, Decision Tree, Random Forest) to predict 30-day hospital readmission risk.

  • Identified critical predictors of readmission through feature importance analysis, uncovering patterns across disease categories such as heart disease, COPD, and cancer.

  • Processed and cleaned healthcare records, applying techniques such as feature extraction, imputation, one-hot encoding, and scaling to ensure high-quality, usable data.

  • Visualized model performance using confusion matrices, ROC curves with AUC scores, and feature importance plots, and delivered insights to improve discharge planning and reduce readmission costs.

  • Readmitting patients within 30 days of their previous discharge is estimated to cost more than 41.3 billion USD per year. In response to this cost, the Centers for Medicare and Medicaid Services developed the Hospital Readmissions Reduction Program, which aims to reduce readmission rates by financially penalizing hospitals that have high readmission rates. It encourages hospitals to improve their communication with patients and caregivers so that discharge plans can reduce avoidable readmissions. The program covers specific procedures and conditions in its analysis of readmission rates, including acute myocardial infarction, chronic obstructive pulmonary disease, heart failure, and pneumonia, among others. Payment reductions and component results for each hospital are calculated over a performance period, during which the number of readmissions is measured.

    Given this financial incentive, hospitals want ways to reduce readmissions. Machine learning algorithms can identify which patients are at higher risk of readmission and therefore need more attention in their discharge plans, which helps hospitals decide where to allocate their resources. This, in turn, can lower the chance of readmission and thus lower costs for the hospital.

    There is a need for data analysis tools that can estimate the chance of readmission for patients in high-risk categories. To be successful, such a tool must estimate the chance of readmission within a 30-day window after the patient is initially discharged from the hospital, because readmission within a 30-day window is the most costly for hospitals. In turn, by determining which patients require additional discharge planning and information, hospitals can lower the costs associated with readmission on a larger scale.
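
The cleaning steps named in the list above (imputation, one-hot encoding, and scaling) can be illustrated with a minimal, standard-library-only sketch. The field names (age, length_of_stay, diagnosis), the toy records, and the preprocess helper are all hypothetical and are not taken from the project's dataset or code; the choice of median imputation and zero-mean, unit-variance scaling is likewise an assumption. In practice a library such as scikit-learn (SimpleImputer, OneHotEncoder, StandardScaler) would likely be used instead.

```python
from statistics import mean, median, pstdev

# Hypothetical toy records; field names and values are invented for illustration.
records = [
    {"age": 71, "length_of_stay": 5, "diagnosis": "heart_disease"},
    {"age": None, "length_of_stay": 3, "diagnosis": "copd"},
    {"age": 64, "length_of_stay": None, "diagnosis": "cancer"},
    {"age": 58, "length_of_stay": 8, "diagnosis": "copd"},
]
NUMERIC = ["age", "length_of_stay"]
CATEGORICAL = ["diagnosis"]


def preprocess(rows):
    """Return (feature_names, matrix) after imputation, scaling, and one-hot encoding."""
    names, columns = [], []

    for field in NUMERIC:
        observed = [r[field] for r in rows if r[field] is not None]
        fill = median(observed)  # assumed strategy: median imputation
        values = [r[field] if r[field] is not None else fill for r in rows]
        mu, sigma = mean(values), pstdev(values)
        # Standardize to zero mean and unit variance; a constant column maps to 0.0.
        columns.append([(v - mu) / sigma if sigma else 0.0 for v in values])
        names.append(field)

    for field in CATEGORICAL:
        # One indicator column per distinct category level, in sorted order.
        for level in sorted({r[field] for r in rows}):
            columns.append([1.0 if r[field] == level else 0.0 for r in rows])
            names.append(f"{field}={level}")

    # Transpose the columns into one feature vector per record.
    matrix = [list(row) for row in zip(*columns)]
    return names, matrix


feature_names, X = preprocess(records)
```

Each row of X is then a purely numeric feature vector that a classifier such as Logistic Regression or Random Forest can consume.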

Discussion

The Random Forest model emerged as the most effective, achieving an accuracy of 0.923 and an AUC of 0.9705. It performed robustly across patient groups, particularly those with heart disease, COPD, cancer, and diabetes, which can be attributed to its balanced distribution of feature importance.
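
For readers unfamiliar with the two metrics quoted above, the following sketch shows how accuracy and AUC can be computed from binary labels and predicted probabilities. The labels, scores, and 0.5 decision threshold are made up for illustration and are unrelated to the project's results; the helper functions are hypothetical, written in plain Python, and mirror what scikit-learn's confusion_matrix, accuracy_score, and roc_auc_score would report.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means readmitted."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted label matches the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count pairwise wins for the positive class; ties count as half a win.
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = readmitted within 30 days) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # 0.875
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported together.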