

Advanced Data Analytics with Excel

Project type

Analytics

Date

May 2025

Project Overview:
This project set out to identify key predictors of heart disease using real-world clinical data from the UCI Heart Disease dataset. Using Excel’s analysis tools, the work covered data cleaning, descriptive statistics, a correlation matrix, and an attempted regression. The objective was to assess which patient health indicators correlate most strongly with the presence of heart disease, emulating how data-driven decision-making supports preventive healthcare.

Key Analysis Techniques:
1. Descriptive Statistics
Summarized central tendency and variability for variables such as age, cholesterol, maximum heart rate (thalach), and ST depression (oldpeak).
2. Correlation Analysis
Created a correlation matrix to identify relationships between variables and the binary target (num_dummy).
• oldpeak showed a strong positive correlation (0.70) with heart disease.
• thalach and age were weakly to moderately associated with heart disease.
• cholesterol and resting BP showed negligible correlation with the target.
3. Regression Attempts
Tried to model the target variable using linear regression via Excel's Analysis ToolPak.
Difficulties with how Excel handled the input array size and the residual output limited the final results, though the direction of each variable's influence was clear.
4. Data Cleaning
Handled missing values, converted text fields to numeric values, and excluded irrelevant or overly generalized features (for example, chest pain type was dropped because every chest pain category was treated as risky).
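
For readers who want to reproduce the techniques above outside Excel, here is a minimal Python sketch of the summary statistics, the correlation against the binary target, and a one-variable least-squares fit. It is not the workbook used in the project. The column values are small synthetic samples invented for illustration, not rows from the UCI dataset, and the helper names (describe, pearson, fit_line) are hypothetical.

```python
import math
import statistics

# Synthetic illustrative values (NOT rows from the UCI dataset).
# Names mirror the columns mentioned in the write-up.
oldpeak = [0.0, 0.2, 0.5, 1.0, 1.8, 2.4, 3.0, 3.6]
thalach = [172, 165, 160, 158, 140, 132, 125, 118]
num_dummy = [0, 0, 0, 0, 1, 1, 1, 1]  # binary target: 1 = heart disease


def describe(values):
    """Return basic summary statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return slope, intercept


summary = describe(oldpeak)
r_oldpeak = pearson(oldpeak, num_dummy)
r_thalach = pearson(thalach, num_dummy)
slope, intercept = fit_line(oldpeak, num_dummy)

print("oldpeak summary:", summary)
print("corr(oldpeak, target):", round(r_oldpeak, 2))
print("corr(thalach, target):", round(r_thalach, 2))
print("fitted line: target =", round(intercept, 3), "+", round(slope, 3), "* oldpeak")
```

A straight-line fit to a 0/1 target is only a rough guide to the direction of influence, because fitted values can fall outside the 0 to 1 range. Logistic regression is the usual alternative for a binary outcome.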

Visuals Created:
• Scatter plot: oldpeak vs. heart disease
• Scatter plot: thalach vs. heart disease
• Correlation heatmap (manually designed)
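
The numbers behind a correlation heatmap like the one listed above can be produced with a short script. The sketch below computes a full pairwise correlation matrix and prints it as a text grid. It assumes small synthetic columns (not the UCI data) and a hypothetical helper named correlation_matrix, and it does not reproduce the Excel heatmap itself.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Synthetic illustrative values (NOT rows from the UCI dataset).
columns = {
    "age": [41, 45, 50, 52, 58, 61, 64, 67],
    "thalach": [172, 165, 160, 158, 140, 132, 125, 118],
    "oldpeak": [0.0, 0.2, 0.5, 1.0, 1.8, 2.4, 3.0, 3.6],
    "num_dummy": [0, 0, 0, 0, 1, 1, 1, 1],
}

matrix = correlation_matrix(columns)

# Print the matrix as a fixed-width grid, one row per variable.
names = list(columns)
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for row in names:
    print(f"{row:>10}" + "".join(f"{matrix[row][col]:>10.2f}" for col in names))
```

Each cell of the grid can then be shaded by its value, for example with conditional formatting in a spreadsheet, to get the heatmap view.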

Key Insights:
• ST depression (oldpeak) was the strongest correlate of heart disease in this dataset (0.70), which marks it as a red flag that deserves diagnostic priority.
• Max heart rate (thalach) and age support risk assessment but are secondary factors.
• Not all commonly assumed indicators (like cholesterol) have predictive value in every dataset.
