Statistics Tutorials

Multiple Linear Regression Analysis with Categorical Predictors

Posted on March 26, 2020; updated April 23, 2020. By admin.

In our previous post, we described how to handle categorical predictors in a regression equation. If you missed that post, please read it first. In this post, we will run the Multiple Linear Regression Analysis on our dataset. If you don’t have the dataset yet, please download it before following along.

First let’s see some scatter plots to get an idea about the relationship between variables.

Given below is the scatterplot of charges vs age with the categorical variables “smoker” and “sex” as group variables.

[Figure: scatterplot of charges vs age with categorical variable groups]

Here you can see a slight positive relationship between age and insurance charges. In fact, if you run a correlation analysis on the above data, you will get a correlation coefficient of 0.297. We can also see that smokers, both male and female, have high insurance charges, while almost all non-smoking males and females have insurance charges of around $10,000 or less. Only one non-smoker observation is above that limit.
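If you want to check a correlation coefficient like this by hand, the calculation can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the small `age` and `charges` lists are made up for the example and are not taken from our dataset.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only; NOT the post's dataset.
age = [19, 23, 31, 38, 45, 52, 58, 64]
charges = [17000, 2500, 4200, 21500, 7600, 26000, 11300, 13900]

r = pearson_r(age, charges)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

With a single predictor, the unadjusted R-sq of the regression equals r squared, so a correlation of 0.297 corresponds to only about 9% of the variation in charges being explained by age alone.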

Now let’s generate the scatterplot of charges vs BMI with the categorical variables “smoker” and “sex” as group variables.

[Figure: scatterplot of charges vs bmi with categorical variable groups]

Here there is no clear relationship between BMI and charges; the data points are widely scattered. A correlation analysis gives a correlation coefficient of only 0.2.

The scatterplot below shows the relationship between age and BMI.

[Figure: scatterplot of bmi vs age]

We can clearly see that there is no meaningful relationship between the two variables; the data points are scattered across the whole plot. The correlation coefficient of 0.112 supports this.

Okay, now let’s jump into the Regression Analysis.

We first run a Simple Linear Regression of the dependent variable (charges) on each independent variable on its own. Then we add predictors step by step until we reach the full regression model.
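For readers who want to reproduce this kind of comparison outside a statistics package, here is a minimal sketch in Python with NumPy. It is not the procedure used to produce our results: the helper `fit_ols` and the toy data are hypothetical, and the smoker column is assumed to be dummy coded as 0/1, as described in the previous post.

```python
import numpy as np

# Hypothetical toy data for illustration only; NOT the post's dataset.
age = np.array([19, 23, 31, 38, 45, 52, 58, 64], dtype=float)
smoker = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=float)  # dummy: 1 = smoker
charges = np.array([17000, 2500, 4200, 21500, 7600, 26000, 11300, 13900],
                   dtype=float)

def fit_ols(predictors, y):
    """Least-squares fit of y on an intercept plus the given predictors.

    Returns the coefficients, the standard error of the regression (S)
    and the adjusted R-sq.
    """
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape                       # p includes the intercept
    sse = float(resid @ resid)           # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())
    s = np.sqrt(sse / (n - p))
    r2_adj = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    return coef, s, r2_adj

# Simple regression first, then the model with the dummy variable added.
coef_age, s_age, r2_age = fit_ols([age], charges)
coef_both, s_both, r2_both = fit_ols([age, smoker], charges)

print(f"age only:    S = {s_age:.1f}, R-sq (adj) = {r2_age:.2%}")
print(f"age, smoker: S = {s_both:.1f}, R-sq (adj) = {r2_both:.2%}")
```

In this toy example the coefficient on `smoker` is the estimated shift in charges for smokers at a given age, which is how a 0/1 dummy variable enters the regression equation.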

Since displaying all the regression outputs would take too much space, I have summarized the results in the following table.

Model            F       P-Value   S          R-sq (adj)   R-sq (pred)
age              2.61    0.118     11271.2    5.44%        0%
bmi              1.43    0.243     11503.5    1.50%        0%
bmi,age          1.86    0.176     11251.2    5.77%        0%
bmi,sex          1.58    0.225     11356.8    4.00%        0%
age,sex          1.33    0.282     11456.9    2.30%        0%
age,smoke        39.88   0         5963.69    73.53%       64.03%
bmi,smoke        20.93   0         7445.25    58.74%       50.98%
bmi,smoke,age    25.6    0         6078.94    72.49%       59.27%
bmi,age,sex      1.28    0.303     11421      2.91%        0%
full model       19.71   0         6048.29    72.71%       57.10%

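From the above table, you can see that the single-variable models are out of the question: neither is significant (P-values of 0.118 for age and 0.243 for bmi). In fact, every model without the smoke variable is non-significant and has an R-sq (pred) of 0%. The age, smoke model is significant and has the lowest standard error of the regression (S = 5963.69). It also has the highest R-sq (adj) and R-sq (pred) values, 73.53% and 64.03%, which are even better than those of the full regression model (72.71% and 57.10%). Therefore we can conclude that, among the models compared, the regression equation with the age and smoke variables is the one that fits the data best.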
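The same comparison can be done programmatically. The sketch below copies the summary values as read from the table in this post and picks the best model by each criterion. The 0.05 significance level is an assumption made for the example, since we did not state one above.

```python
# Summary values as read from the table in this post:
# (model, F, P-value, S, R-sq (adj) in %, R-sq (pred) in %)
results = [
    ("age",            2.61, 0.118, 11271.2,   5.44,  0.0),
    ("bmi",            1.43, 0.243, 11503.5,   1.50,  0.0),
    ("bmi,age",        1.86, 0.176, 11251.2,   5.77,  0.0),
    ("bmi,sex",        1.58, 0.225, 11356.8,   4.00,  0.0),
    ("age,sex",        1.33, 0.282, 11456.9,   2.30,  0.0),
    ("age,smoke",     39.88, 0.0,    5963.69, 73.53, 64.03),
    ("bmi,smoke",     20.93, 0.0,    7445.25, 58.74, 50.98),
    ("bmi,smoke,age", 25.6,  0.0,    6078.94, 72.49, 59.27),
    ("bmi,age,sex",    1.28, 0.303, 11421.0,   2.91,  0.0),
    ("full model",    19.71, 0.0,    6048.29, 72.71, 57.10),
]

ALPHA = 0.05  # assumed significance level; not stated in the post

# Keep only the models whose overall F-test is significant.
significant = [row for row in results if row[2] < ALPHA]

lowest_s = min(significant, key=lambda row: row[3])
best_adj = max(significant, key=lambda row: row[4])
best_pred = max(significant, key=lambda row: row[5])

print("Significant models:", [row[0] for row in significant])
print("Lowest S:          ", lowest_s[0])
print("Highest R-sq (adj):", best_adj[0])
print("Highest R-sq (pred):", best_pred[0])
```

All three criteria point to the age, smoke model, which matches the conclusion above.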