
Framingham Heart Study Dataset

 

The dataset I’m using is the Framingham Heart Study (FHS) dataset, obtained from the NHLBI's BioLinCC website.

Study Design: In 1948, investigators sampled 2 out of every 3 adults in the town of Framingham, Massachusetts, yielding a sample of 5,209 men and women.

[Image: a portion of the dataset's .csv file]

Above is a portion of the dataset's .csv file. Less important feature columns are omitted.

The CVD column is the dependent variable: a binary value (0 or 1) indicating whether or not the participant had coronary heart disease.
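Below is a minimal sketch of separating the binary target column from the feature columns, using only Python's standard library. It assumes the column names shown in the image (CVD as the target, RANDID as the participant ID). The three rows are made up for illustration and are not real FHS records. The sketch also makes two choices that do not come from the study: it drops RANDID from the features, and it reads blank cells as NaN.

```python
import csv
import io
import math

# Made-up rows for illustration only; column names follow the image, values are invented.
SAMPLE_CSV = """RANDID,SEX,AGE,TOTCHOL,BMI,CVD
1,0,40,200,25.0,1
2,1,50,240,30.0,0
3,0,60,,27.5,0
"""

def parse_value(text):
    """Convert a CSV cell to float; a blank cell (missing measurement) becomes NaN."""
    return float(text) if text.strip() else math.nan

def split_features_target(csv_text, target="CVD", drop=("RANDID",)):
    """Return (feature_names, X, y): X holds feature rows, y holds the 0/1 target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the target and the dropped ID column is treated as a feature.
    feature_names = [c for c in reader.fieldnames if c != target and c not in drop]
    X, y = [], []
    for row in reader:
        X.append([parse_value(row[c]) for c in feature_names])
        y.append(int(row[target]))
    return feature_names, X, y

names, X, y = split_features_target(SAMPLE_CSV)
print(names)  # ['SEX', 'AGE', 'TOTCHOL', 'BMI']
print(y)      # [1, 0, 0]
```

Reading blank cells as NaN keeps missing measurements visible instead of raising an error. Whether to then drop or impute those rows is a separate decision.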

Dataset Characteristics:

  • 39 variables: 38 independent variables (IVs) and 1 dependent variable (DV)

  • 11,627 instances of participant data (participants were re-tested after 10+ years had passed, so some appear in more than one row)
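Participants were re-tested, so one person can appear in several rows. The sketch below counts rows per participant. It assumes every row carries the participant's RANDID, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented rows: participant 1 appears twice, participant 3 three times.
SAMPLE_CSV = """RANDID,AGE
1,40
1,46
2,55
3,62
3,68
3,74
"""

def exams_per_participant(csv_text, id_col="RANDID"):
    """Count how many rows (exam records) each participant ID has."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[id_col] for row in reader)

counts = exams_per_participant(SAMPLE_CSV)
print(sum(counts.values()))  # 6 rows in total
print(len(counts))           # 3 distinct participants
```

Comparing the total row count with the number of distinct RANDID values shows how many people the records actually represent.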

Feature Examples:

  • Sex (0/1)

  • Age (28-74)

  • Smoking Frequency

  • Cholesterol

  • BMI

  • RANDID (Unique ID Number for Each Participant)

Dataset Specifics:
Outputs:

PREVCHD: Prevalent Coronary Heart Disease

PREVAP: Prevalent Angina Pectoris at Exam

PREVMI: Prevalent Myocardial Infarction

PREVSTRK: Prevalent Stroke

PREVHYP: Prevalent Hypertensive

*PREVCHD was chosen as the output because it had the largest split between positive and negative CHD cases.
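One way to make the "split between positive and negative" cases concrete is to compute the share of positive (1) values in each output column. The sketch below ranks the outputs by the share held by the minority class, where 0.5 is a perfectly even split. This ranking rule is an assumption for illustration, and the rows are made up, so the printed numbers do not reflect the study's actual class balance.

```python
import csv
import io

OUTPUT_COLS = ["PREVCHD", "PREVAP", "PREVMI", "PREVSTRK", "PREVHYP"]

# Made-up rows for illustration only; not real FHS records.
SAMPLE_CSV = """RANDID,PREVCHD,PREVAP,PREVMI,PREVSTRK,PREVHYP
1,1,1,0,0,1
2,0,0,0,0,0
3,1,0,1,0,0
4,0,0,0,0,0
5,0,0,0,1,0
"""

def positive_rates(csv_text, cols=OUTPUT_COLS):
    """Fraction of rows with value 1 in each binary output column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {c: sum(int(r[c]) for r in rows) / len(rows) for c in cols}

def balance(rate):
    """Share of the minority class: 0.5 is an even split, 0.0 means only one class."""
    return min(rate, 1.0 - rate)

rates = positive_rates(SAMPLE_CSV)
# Print outputs from most to least balanced under this (assumed) criterion.
for col, rate in sorted(rates.items(), key=lambda kv: balance(kv[1]), reverse=True):
    print(f"{col}: {rate:.0%} positive")
```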

2021-2022 Academy of Science Research Portfolio
