Creating Box Plots in R ggplot2

Introduction to Box Plots

Box plots (also known as box-and-whisker diagrams) give a good visual idea of how data is distributed and make outliers easy to spot. In this post, we will create attractive and informative box plots using the ggplot2 package for R. ggplot2 is an add-on package, so install it once with install.packages("ggplot2") before you start.

A box plot takes the following form:

[Figure: box plot illustration]

We have marked the structure of a box plot in the above illustration. You can clearly spot the outliers and the quartiles.
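To make that structure concrete, here is a minimal sketch that computes the pieces of a box plot by hand. The vector `x` is made up purely for illustration, and the sketch assumes the common 1.5 × IQR rule for flagging outliers, which is also the default that geom_boxplot() uses for its whiskers.

```r
# Hypothetical sample with one unusually large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))  # first quartile (lower edge of the box)
med <- median(x)                  # median (line inside the box)
q3  <- unname(quantile(x, 0.75))  # third quartile (upper edge of the box)
iqr <- q3 - q1                    # interquartile range (height of the box)

# Points further than 1.5 * IQR from the box are drawn as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

print(c(Q1 = q1, Median = med, Q3 = q3, IQR = iqr))
print(outliers)  # 30 is the only value outside the fences
```

The whiskers then run from the box to the most extreme values that still lie inside the two fences.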

Now let’s create box plots using the ggplot2 package.

Box Plot for a Quantitative Variable

Here we are using the “ChickWeight” dataset that comes with R (it is part of the built-in datasets package, so there is nothing to download).

A glimpse of the dataset is given below:

> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1

Let’s check the class of each column:

> sapply(ChickWeight,class)
$weight
[1] "numeric"

$Time
[1] "numeric"

$Chick
[1] "ordered" "factor" 

$Diet
[1] "factor"

It appears that we have two quantitative variables and two categorical (factor) variables.

Basic Box Plot

Keeping that in mind, let’s draw a box plot for the “weight” variable using ggplot2.

library(ggplot2)  # load the package before calling any ggplot2 function

ggplot(ChickWeight, aes(y = weight)) +
  geom_boxplot() +
  ggtitle("Box Plot of Weight")

The ‘geom_boxplot’ function draws the box plot and the ‘ggtitle’ function adds a title to it.

[Figure: box plot of weight using ggplot2]

Here you can see that the median is approximately 100 and you can spot some outliers as well.
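If you would like to check what the plot shows against actual numbers, a small sketch with base R functions follows. Note that boxplot.stats() is a base R helper, not part of ggplot2, so the values it flags may differ very slightly from the points ggplot2 draws; treat it as a rough cross-check.

```r
# ChickWeight ships with base R (datasets package)
w <- ChickWeight$weight

print(median(w))    # the line inside the box
print(quantile(w))  # minimum, quartiles, and maximum

# boxplot.stats() applies the 1.5 * IQR rule; $out holds the flagged points
out <- boxplot.stats(w)$out
print(length(out))  # how many points are flagged as outliers
print(range(out))   # where those outliers lie
```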

Showing Outliers

We can specify how the outliers are displayed as below:

ggplot(ChickWeight, aes(y=weight)) + 
  geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 2)

In the above code, I have used red to colour the outliers and shape number 8 as their symbol. R identifies each point shape by a number, and there are many to choose from (number 8 is a star). You can find more information about shapes in R from this link. The ‘outlier.size’ argument sets the size of the outlier symbol. As you can see, these arguments are fairly intuitive.
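To see which number stands for which symbol, you can draw the shapes yourself. The following is only a sketch: the data frame `shapes` and its grid layout are made up for illustration, and it assumes the standard shape codes 0 to 25.

```r
library(ggplot2)

# One row per shape code; 0 to 25 are the standard point shapes
shapes <- data.frame(
  code = 0:25,
  col  = 0:25 %% 7,   # column position in the grid
  row  = 0:25 %/% 7   # row position in the grid
)

p_shapes <- ggplot(shapes, aes(x = col, y = -row)) +
  geom_point(aes(shape = code), size = 5, fill = "red") +
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +  # use the numbers directly as shape codes
  theme_void() +
  ggtitle("Point shapes 0 to 25")

print(p_shapes)
```

Shapes 21 to 25 are the only ones that use the fill colour, which is why just those appear filled in red here.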

[Figure: box plot with outliers shown as red stars]

See how clear it is! You can easily spot the outliers, as they are marked with red stars.

Changing the fill color

If you want to change the fill color of the box plot, type the following code in R:

ggplot(ChickWeight, aes(y = weight)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 2,
               fill = '#00a86b', colour = 'black')

The call above contains two new arguments, ‘fill’ and ‘colour’. The ‘fill’ argument sets the colour inside the box. The ‘colour’ argument sets the outline colour of the box, which in this case is black.

[Figure: box plot with a changed fill colour]

Here we have used a hex colour code as the fill colour. You can use any colour you like as a hex code, or choose one of R’s built-in named colours. Type colors() in the R console (for example, the console in RStudio) to get the list of colour names available in R.
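As a quick sketch of the named-colour route, the snippet below lists some of the built-in names and then uses one in place of the hex code. The choice of "seagreen3" is arbitrary; any name returned by colors() should work the same way.

```r
library(ggplot2)

print(head(colors(), 10))                     # first few built-in colour names
print(grep("green", colors(), value = TRUE))  # every name containing "green"

# A named colour works anywhere a hex code does
p_named <- ggplot(ChickWeight, aes(y = weight)) +
  geom_boxplot(fill = "seagreen3", colour = "black")

print(p_named)
```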

Box Plot when Variables are Categorical

Often, your data set has categorical columns as well. ggplot2 can draw a separate box for each level of a categorical variable, which makes it easy to compare groups. The code is almost the same as for a single quantitative variable: you just map the categorical variable to x.

ggplot(ChickWeight, aes(x=Diet, y=weight)) +
  geom_boxplot() + ggtitle("Box Plot of Weight with Diet Type")

[Figure: box plot of weight for each diet type]

Here you can see how the weight is distributed according to the diet categories.
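The same comparison can be made numerically. This is a small sketch using base R’s tapply(), which is not part of ggplot2; it simply gives you the medians that the lines inside the boxes represent.

```r
# Median weight within each diet group (the line inside each box)
med_by_diet <- tapply(ChickWeight$weight, ChickWeight$Diet, median)
print(med_by_diet)

# Number of observations behind each box
print(table(ChickWeight$Diet))
```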

Box Plot with Different Fill Colors

We can also fill the boxes according to the categories of the ‘Diet’ variable.

ggplot(ChickWeight, aes(x=Diet, y=weight, fill=Diet)) +
  geom_boxplot() + ggtitle("Box Plot of Weight with Diet Type")

In the above code, I have added a new argument inside aes() in the ‘ggplot’ call. Mapping ‘fill’ to Diet gives the box for each diet category its own fill colour.

[Figure: box plot of weight by diet type, filled according to category]

WOW! Right? See how beautiful it is! The plot is nicely categorized according to a color scheme and everything is very appealing. You also get a legend telling you which color belongs to which category.
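Because the x axis already names each diet, the legend repeats the same information. If you prefer to drop it, one option is sketched below. This is a matter of taste rather than a requirement, and the object name `p_nolegend` is just an illustrative choice.

```r
library(ggplot2)

p_nolegend <- ggplot(ChickWeight, aes(x = Diet, y = weight, fill = Diet)) +
  geom_boxplot() +
  ggtitle("Box Plot of Weight with Diet Type") +
  theme(legend.position = "none")  # hide the legend, keep the fill colours

print(p_nolegend)
```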

Fill Box Plot with Color Brewer Palette

If you are not satisfied with the default colours assigned to the categories, you can use a ColorBrewer palette to define the color scheme. We use the scale_fill_brewer() function for this.


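ggplot(ChickWeight, aes(x=Diet, y=weight, fill=Diet)) +
  geom_boxplot() + ggtitle("Box Plot of Weight with Diet Type") +
  scale_fill_brewer(palette = 'YlGnBu')

Please refer to the ColorBrewer palettes from this link. Palette names are case-sensitive, so write 'YlGnBu' exactly; a misspelling such as 'YlGnBU' is not recognised and you will not get the palette you asked for.

[Figure: box plot of weight by diet type, filled using a ColorBrewer palette]

Here the boxes are shaded from pale yellow through green to blue.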
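If you want to look up palette names from inside R, the RColorBrewer package can list them. The sketch below assumes RColorBrewer is installed, which is normally the case because it is pulled in along with ggplot2; the variable name `diet_cols` is only illustrative.

```r
library(RColorBrewer)

# All available palette names (they are matched exactly, including case)
print(rownames(brewer.pal.info))

# The four colours that a four-level factor such as Diet would receive
diet_cols <- brewer.pal(n = 4, name = "YlGnBu")
print(diet_cols)

# Preview those four colours in the plotting window
display.brewer.pal(n = 4, name = "YlGnBu")
```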

Adding a theme to a Box Plot

In ggplot2, we can even apply a theme to our plots. Here’s how you add a theme to the box plot.

ggplot(ChickWeight, aes(x=Diet, y=weight, fill=Diet)) +
  geom_boxplot() + ggtitle("Box Plot of Weight with Diet Type") +
  scale_fill_brewer(palette = 'YlGnBu') + theme_light()

Here, the ‘theme_light()’ function does the trick.

[Figure: box plot of weight by diet type with the light theme]

There are several types of themes in ggplot2.

[Figure: theme types in ggplot2]

You can choose a theme by typing its name after the ‘theme_’ prefix.

For example, if you want the ‘minimal’ theme, type theme_minimal().
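A convenient way to compare themes is to store the plot in a variable once and then add a different theme each time. The sketch below shows the idea; the variable name `p` is arbitrary, and the three themes are just a sample of those that ship with ggplot2.

```r
library(ggplot2)

# Build the plot once and store it
p <- ggplot(ChickWeight, aes(x = Diet, y = weight, fill = Diet)) +
  geom_boxplot() +
  ggtitle("Box Plot of Weight with Diet Type")

# Add a different complete theme each time
print(p + theme_minimal())
print(p + theme_bw())
print(p + theme_classic())
```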

Display means in the Box Plot

Sometimes, it is convenient to show the mean of the distribution in the box plot. We use the function stat_summary for that.

ggplot(ChickWeight, aes(x=Diet, y=weight, fill=Diet)) +
  geom_boxplot() + ggtitle("Box Plot of Weight with Diet Type") + 
  stat_summary(fun=mean, geom="point", shape=17, colour='red', size=4)

[Figure: box plot of weight by diet type with the means displayed]

In the above box plot, the red triangles display the category means.
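To read off the values behind the triangles, you can compute the group means directly and set them next to the medians. This is a small base R sketch; the object names are illustrative only.

```r
# Mean and median weight for each diet group
mean_by_diet   <- tapply(ChickWeight$weight, ChickWeight$Diet, mean)
median_by_diet <- tapply(ChickWeight$weight, ChickWeight$Diet, median)

# One column per diet; compare the triangle (mean) with the box line (median)
print(round(rbind(mean = mean_by_diet, median = median_by_diet), 1))
```

A gap between the mean and the median of a group is a hint that the distribution in that group is skewed.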

Horizontal Box Plot

We can create a Horizontal Box Plot using ggplot2 with the coord_flip function.

ggplot(ChickWeight, aes(x=Diet, y=weight, fill=Diet)) +
  geom_boxplot() + coord_flip()

[Figure: horizontal box plot using ggplot2]
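As an alternative sketch, and assuming you have ggplot2 version 3.3.0 or later, you can get a horizontal box plot without coord_flip() by mapping the categorical variable to y and the quantitative variable to x.

```r
library(ggplot2)

# Swap the roles of x and y instead of flipping the coordinates
p_horizontal <- ggplot(ChickWeight, aes(x = weight, y = Diet, fill = Diet)) +
  geom_boxplot()

print(p_horizontal)
```

With this approach the axis labels already match what is drawn on each axis, which can be less confusing than a flipped plot when you add further layers or scales.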

Conclusion

Okay, now we have come to the end of this post. We have covered almost all the important aspects of creating box plots using ggplot2. Feel free to comment if you come across any difficulty.

Cheers!

