Framingham Heart Study Dataset

The dataset I am using is the Framingham Heart Study (FHS) dataset, available from the NHLBI's BioLINCC website.

Study Design: In 1948, investigators sampled 2 out of 3 adults in the town of Framingham, Massachusetts, yielding a sample of 5,209 men and women.

[Image: portion of the dataset's .csv file]

Above is a portion of the dataset's .csv file, with the less important feature columns omitted.

The CVD column is the dependent variable: a binary value (0 or 1) indicating whether or not the participant had coronary heart disease.
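
As a rough illustration of how a file like this might be split into independent variables and the dependent variable vector, here is a minimal sketch using only Python's standard library. The rows are made up for the example (they are not values from the FHS data), the column names other than CVD and RANDID are assumed, and leaving RANDID out of the features is an assumption rather than something stated above.

```python
import csv
import io

# Made-up rows for illustration only; these are NOT values from the FHS data.
SAMPLE_CSV = """RANDID,SEX,AGE,BMI,CVD
1001,1,39,26.9,0
1002,0,46,28.7,0
1003,1,48,25.3,1
1004,0,61,28.5,1
"""

TARGET = "CVD"        # dependent variable named in the text
ID_COLUMN = "RANDID"  # identifier, assumed here not to be used as a predictor

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Feature names are every column except the identifier and the target.
feature_names = [c for c in rows[0] if c not in (TARGET, ID_COLUMN)]

# X: one list of numeric feature values per row; y: the 0/1 label per row.
X = [[float(row[name]) for name in feature_names] for row in rows]
y = [int(row[TARGET]) for row in rows]

print(feature_names)
print(len(X), len(y))
```

With the real file, the in-memory string would be replaced by an opened .csv file, and missing values would need handling before the numeric conversion.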

Dataset Characteristics:

  • 39 variables: 38 independent variables (IV) and 1 dependent variable (DV).

  • 11,627 Instances of Participant Data (Patients re-tested after 10+ years had passed)

Feature Examples:

  • Sex (0/1)

  • Age (28-74)

  • Smoking Frequency

  • Cholesterol

  • BMI

  • RANDID (Unique ID Number for Each Participant)
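
Because participants were re-tested, the same RANDID can appear on more than one row, so the number of instances is larger than the number of unique participants. The sketch below shows one way that could be checked; the ID values are made up for illustration and the variable names are hypothetical.

```python
from collections import Counter

# Made-up RANDID values, one per row (exam record), for illustration only.
randid_per_row = [1001, 1001, 1002, 1003, 1003, 1003]

# Count how many rows each participant contributes.
exams_per_participant = Counter(randid_per_row)

n_instances = len(randid_per_row)
n_participants = len(exams_per_participant)
n_retested = sum(1 for count in exams_per_participant.values() if count > 1)

print(n_instances, n_participants, n_retested)
```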

Dataset Specifics:
Outputs:

PREVCHD: Prevalent Coronary Heart Disease

PREVAP: Prevalent Angina Pectoris at Exam

PREVMI: Prevalent Myocardial Infarction

PREVSTRK: Prevalent Stroke

PREVHYP: Prevalent Hypertensive

*PREVCHD was chosen as the output because it had the largest split between positive and negative cases.
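
A comparison like this presumably comes down to counting the 0s and 1s in each candidate output column. The sketch below shows one way to tabulate those counts. The labels are made up for illustration (they are not FHS values), the helper name is hypothetical, and the exact criterion used to pick PREVCHD is not stated here, so the code only reports the counts.

```python
# Made-up 0/1 labels for illustration only; NOT values from the FHS data.
outputs = {
    "PREVCHD":  [0, 0, 1, 0, 1, 0, 0, 1],
    "PREVAP":   [0, 0, 1, 0, 0, 0, 0, 1],
    "PREVMI":   [0, 0, 0, 0, 1, 0, 0, 0],
    "PREVSTRK": [0, 0, 0, 0, 0, 0, 0, 1],
    "PREVHYP":  [1, 0, 1, 1, 1, 0, 1, 1],
}


def class_counts(labels):
    """Return (positives, negatives) for a list of 0/1 labels."""
    positives = sum(labels)
    return positives, len(labels) - positives


for name, labels in outputs.items():
    pos, neg = class_counts(labels)
    share = pos / len(labels)
    print(f"{name}: positive={pos}, negative={neg}, positive share={share:.2f}")
```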

2021-2022 Academy of Science Research Portfolio
