Diabetes: EDA & Modeling
EDA
ML
Health
Exploratory analysis and baseline models on a diabetes dataset with emphasis on feature hygiene and class balance.
Problem
Investigated factors associated with diabetes progression to inform preventative care.
Data
Used a publicly available dataset of patient measurements from the UCI Machine Learning Repository.
Approach
- Cleaned outliers and imputed missing values
- Visualized relationships among clinical variables
- Trained logistic regression and random forest classifiers
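The steps above might look roughly like the sketch below. It is not the project's code: the data is synthetic, the two feature columns are hypothetical stand-ins, and the specific choices (1.5×IQR clipping, median imputation, `class_weight="balanced"`, a stratified 75/25 split) are assumptions chosen to illustrate feature hygiene and class balance with scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the project's dataset).
n = 400
X = np.column_stack([
    rng.normal(30, 6, n),    # hypothetical "bmi" column
    rng.normal(120, 30, n),  # hypothetical "glucose" column
])
signal = 0.08 * (X[:, 0] - 30) + 0.03 * (X[:, 1] - 120) + rng.normal(0, 1, n)
y = (signal > 0.5).astype(int)  # imbalanced binary label

# Blank out about 5% of the values to mimic missing measurements.
X[rng.random(X.shape) < 0.05] = np.nan

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Clip outliers to 1.5*IQR fences computed on the training split only,
# so no information from the test split leaks into preprocessing.
q1, q3 = np.nanpercentile(X_train, [25, 75], axis=0)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
X_train = np.clip(X_train, low, high)  # NaNs pass through unchanged
X_test = np.clip(X_test, low, high)

models = {
    "logistic_regression": make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "random_forest": make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=0
        ),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # plain accuracy
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Putting the imputer inside each pipeline means the medians are learned from the training split alone, which keeps the test accuracy honest.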
Results
The tuned random forest reached 82% accuracy and highlighted BMI and glucose as key features.
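One common way to see which features a random forest leans on is its impurity-based importance scores. The sketch below assumes that approach, which may not be how the project ranked its features. It uses synthetic data with hypothetical column names, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical feature names on synthetic data (NOT the project's dataset).
feature_names = ["bmi", "glucose", "noise"]
n = 500
X = rng.normal(size=(n, 3))
# The label depends on the first two columns only; the third is pure noise.
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ holds one normalized score per column.
importances = dict(zip(feature_names, forest.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based scores can favor high-cardinality or correlated features, so a ranking like this is best treated as a starting point for closer inspection.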