Framingham Heart Study Dataset
The dataset I'm using is the Framingham Heart Study (FHS) dataset, available from the NHLBI's BioLINCC website.
​
Study Design: In 1948, investigators sampled 2 out of every 3 adults in the town of Framingham, Massachusetts, yielding a sample of 5,209 men and women.

Image of the .csv file
​
Above is a portion of the dataset's .csv file, with less important feature columns omitted.
The CVD column is the dependent variable: a binary value of 0 or 1 indicating whether or not the participant had coronary heart disease.
​
Dataset Characteristics:
​
- 39 variables: 38 independent variables (IV) and 1 dependent variable (DV)
- 11,627 instances of participant data (patients were re-tested after 10+ years had passed)
Feature Examples:
​
- Sex (0/1)
- Age (28-74)
- Smoking Frequency
- Cholesterol
- BMI
- RANDID (Unique ID Number for Each Participant)
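
As a rough sanity check of the characteristics above, the row count, column count, and number of distinct participants can be tallied with Python's standard csv module. This is a minimal sketch, not the code used for this project: the rows below are made-up placeholders rather than real Framingham records, and the `summarize` helper is a hypothetical name. Because participants were re-tested, the number of unique RANDID values would be expected to be smaller than the number of rows.

```python
import csv
import io

# Toy stand-in for the real file: made-up rows, NOT actual Framingham data.
# Only four columns are shown here; the real file has 39.
SAMPLE_CSV = """RANDID,SEX,AGE,PREVCHD
1001,1,39,0
1001,1,52,0
1002,0,46,0
1003,1,61,1
"""


def summarize(csv_file):
    """Return (row count, column count, unique participant count)."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    n_columns = len(reader.fieldnames or [])
    # RANDID repeats when the same participant was examined more than once.
    participants = {row["RANDID"] for row in rows}
    return len(rows), n_columns, len(participants)


n_rows, n_columns, n_participants = summarize(io.StringIO(SAMPLE_CSV))
print(f"{n_rows} rows, {n_columns} columns, {n_participants} participants")
# prints: 4 rows, 4 columns, 3 participants
```

On the real file, the same function could be called on an open file handle, for example `summarize(open(path, newline=""))`, where `path` is wherever the .csv was saved.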
​
Dataset Specifics:

Outputs:
​
PREVCHD: Prevalent Coronary Heart Disease
PREVAP: Prevalent Angina Pectoris at Exam
PREVMI: Prevalent Myocardial Infarction
PREVSTRK: Prevalent Stroke
PREVHYP: Prevalent Hypertensive
*PREVCHD was chosen as the output because it had the largest split between positive and negative cases.
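
One way to compare the candidate outputs listed above is to count the negative (0) and positive (1) cases in each column. The sketch below is only an illustration under stated assumptions: the rows are made-up placeholders, not real Framingham data, the `class_counts` helper is a hypothetical name, and the columns are assumed to contain no missing values. It does not reproduce the comparison that led to choosing PREVCHD.

```python
import csv
import io

# Toy stand-in for the real file: made-up rows, NOT actual Framingham data.
SAMPLE_CSV = """RANDID,PREVCHD,PREVAP,PREVMI,PREVSTRK,PREVHYP
1,0,0,0,0,1
2,1,1,0,0,1
3,0,0,0,0,0
4,1,0,1,0,1
"""

OUTCOMES = ["PREVCHD", "PREVAP", "PREVMI", "PREVSTRK", "PREVHYP"]


def class_counts(rows, column):
    """Return (negatives, positives) for a 0/1 column.

    Assumes every value is "0" or "1"; anything else counts as negative.
    """
    positives = sum(1 for row in rows if row[column] == "1")
    return len(rows) - positives, positives


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name in OUTCOMES:
    negatives, positives = class_counts(rows, name)
    print(f"{name}: {negatives} negative, {positives} positive")
```

Printing the counts side by side makes it easy to see which output has enough positive cases to train and evaluate a classifier on.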