Phase I: Preprocessing and Binary Classifier Model Comparison
Phase 1 & 2: Preprocessing and Binary Classifier Model Comparison
The skills and information used in researching this project, as well as the steps described here, were inspired by the meeting with the Harvard ML Researcher and by the A-Z Machine Learning Courses over the summer.

The first step of my project is to replace null values and then normalize and standardize the values in the columns of my dataset. This is necessary because the features are on drastically different scales: some are binary, others range from 1 to 4, and others are numerical values that can exceed 500. Additionally, some columns are missing values because they weren’t recorded on a particular check.
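
A minimal sketch of what this preprocessing could look like, using only the Python standard library. The helper names, the choice of mean imputation, and the sample column are assumptions made for illustration; the actual dataset and the final imputation strategy may differ.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical column with one missing reading (values are illustrative only).
sample_column = [200.0, None, 300.0, 500.0]
filled = impute_mean(sample_column)
normalized = min_max_normalize(filled)
standardized = standardize(filled)
```

Null values are filled first so that the scaling statistics are computed on a complete column. Binary columns would typically be left unscaled.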

From there, the next step is to run binary risk classifier models, such as a Dense Neural Network and Logistic Regression, on my dataset to get a binary yes-or-no value indicating whether a patient with certain metrics or clinical characteristics is at risk of developing coronary heart disease in the next 10 years.
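
To make the idea of a binary risk classifier concrete, here is a rough sketch of Logistic Regression trained by batch gradient descent in plain Python. It is not the final implementation: the function names, the learning rate, the number of epochs, and the toy data are all assumptions, and the Dense Neural Network is not covered here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            error = p - label  # gradient of the log loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 1 (at risk) or 0 (not at risk) for each row."""
    return [
        int(sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) >= threshold)
        for row in X
    ]


# Toy, already-standardized feature rows (made up for illustration only).
X_toy = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, -1.2],
         [0.5, 1.0], [1.0, 0.8], [1.5, 1.2]]
y_toy = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_toy, y_toy)
predictions = predict(X_toy, weights, bias)
```

The 0.5 threshold is the conventional default; a lower threshold would flag more patients as at risk at the cost of more false positives.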

After running the risk analysis models, I intend to evaluate each model’s accuracy on the testing set, along with its R^2 value. I also intend to calculate the correlation of each feature with the target and display the results in a correlation plot. The next step is to compare the models using their R^2 values as well as their positive and negative predictive values (PPV and NPV), and to discard the models with the lowest performance or the weakest power in relating the features to the target.
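
As an illustration of two of these measurements, the sketch below computes accuracy on a held-out set and the Pearson correlation between each feature and the target. The feature names and values are hypothetical placeholders, Pearson is an assumed choice of correlation coefficient, and the R^2 calculation is not shown.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either sequence is constant
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature columns and target labels (illustrative values only).
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature_b": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
target = [0, 0, 0, 1, 1, 1]

# One correlation per feature; these values are what a correlation plot would show.
feature_target_corr = {
    name: pearson_correlation(values, target) for name, values in features.items()
}

example_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```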

The positive predictive value (PPV) and negative predictive value (NPV) are the proportions of positive and negative results, in statistics and diagnostic tests, that are true positives and true negatives, respectively. Together they describe the performance of a diagnostic test or other statistical measure.
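
In terms of confusion-matrix counts, PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The short sketch below computes both from true and predicted labels; the function names and the example labels are illustrative assumptions, not results from my dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ppv_npv(y_true, y_pred):
    """Return (PPV, NPV); a value is None when its denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Made-up labels: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
ppv, npv = ppv_npv(y_true, y_pred)
```

Here two of the three positive predictions are correct and four of the five negative predictions are correct, so the PPV is 2/3 and the NPV is 4/5.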