
SUMMARY:

Project Framework: Nearing the end of the semester, I cleaned up the overall framework of my project in preparation for the Semester Presentations. I cut the CHD Risk Prediction / Correlation Analysis Step because of the difficulty of creating a novel formula or method to quantify a patient's risk as a percentage. I made this decision after an earlier General Audience Presentation in which the step's novelty was called into question.

 

Coding: During this period, I added two overfitting-prevention techniques, EarlyStop and K-Fold Cross Validation, so that my model would generalize beyond my datasets alone. With those techniques in place, I started work on a Dense Neural Network and a Logistic Regression Model in Jupyter Notebook. I extracted the correlation between each pair of features and between each feature and the output, and created colormaps of these correlations. Although this part of my project was ultimately removed, I've included information and a graphic because it relates to the Binary Classification Step. After removing some unnecessary features and completing the rest of my preprocessing, my Dense Neural Network reached 85% accuracy, which still calls for further hyperparameter tuning.
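
To illustrate how early stopping and K-Fold Cross Validation can work together, here is a minimal sketch in plain Python. It is not the notebook code from this project: the function names, the learning rate, the patience value, and the synthetic two-feature data are all assumptions made for the example, and a small logistic regression stands in for the real models. Each fold is held out once for scoring, and a separate slice of the remaining data is monitored to decide when to stop training.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def predict_proba(weights, row):
    # weights[0] is the bias; the remaining weights pair up with the features.
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, X, y):
    eps = 1e-12
    total = 0.0
    for row, label in zip(X, y):
        p = predict_proba(weights, row)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(y)


def accuracy(weights, X, y):
    hits = sum((predict_proba(weights, row) >= 0.5) == bool(label)
               for row, label in zip(X, y))
    return hits / len(y)


def train_with_early_stopping(X_fit, y_fit, X_mon, y_mon,
                              lr=0.1, max_epochs=500, patience=10):
    """Batch gradient descent that stops once the monitored loss has not
    improved for `patience` consecutive epochs, then returns the best weights."""
    n_features = len(X_fit[0])
    weights = [0.0] * (n_features + 1)
    best_weights, best_loss, stale = list(weights), float("inf"), 0
    for _ in range(max_epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X_fit, y_fit):
            err = predict_proba(weights, row) - label
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w - lr * g / len(y_fit) for w, g in zip(weights, grad)]
        mon_loss = log_loss(weights, X_mon, y_mon)
        if mon_loss < best_loss - 1e-6:
            best_weights, best_loss, stale = list(weights), mon_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_weights


def cross_validate(X, y, k=5):
    """Return one held-out accuracy per fold."""
    scores = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        # Reserve part of the training data to monitor for early stopping,
        # so the scored fold never influences when training stops.
        n_mon = max(1, len(train_idx) // 5)
        mon_idx, fit_idx = train_idx[:n_mon], train_idx[n_mon:]
        weights = train_with_early_stopping(
            [X[i] for i in fit_idx], [y[i] for i in fit_idx],
            [X[i] for i in mon_idx], [y[i] for i in mon_idx])
        scores.append(accuracy(weights,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return scores


# Synthetic, hypothetical data: two random features and a label that is
# linearly separable. This is NOT data from the project.
rng = random.Random(1)
X_demo = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y_demo = [1 if a + b > 0 else 0 for a, b in X_demo]
print(cross_validate(X_demo, y_demo, k=5))
```

Averaging the per-fold accuracies gives a steadier estimate of how a model generalizes than a single train/test split, which is the reason for pairing the two techniques.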

Datasets: As recommended by Dr. Borek Foldyna, I applied for the Multi-Ethnic Study of Atherosclerosis (MESA) dataset through Mr. Writer's BioLINCC account, but Institutional Review Board or Ethics Committee documentation was still required to gain access. Independently of this request, I also applied for the Kaiser Permanente Hospital Readmission Prediction Dataset; I first received access to the data labels (Demographics, Socioeconomic Status, Vital Signs at Admission) and subsequently to the dataset itself.

Progress Report 5: End of Semester 1 Coding / Datasets
 

2021-2022 Academy of Science Research Portfolio
