StatsNotebook

Boxplot

20 Sep, 2020 | 4 minutes read

Tags: DataViz, R code


Follow our Facebook page or our developer’s Twitter for more tutorials and future updates.

The tutorial is based on R and StatsNotebook, a graphical interface for R.

A boxplot is commonly used to compare the distribution of a numeric variable across several groups. StatsNotebook uses geom_boxplot() from ggplot2 to build boxplots.
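Each box summarises the data with a few statistics. According to the documentation for ggplot2's stat_boxplot(), the box spans the first to third quartile with a line at the median, the whiskers reach the most extreme values lying within 1.5 times the interquartile range (IQR) of the box, and any points beyond that are drawn individually as outliers. The base R sketch below reproduces these numbers for a single vector. The helper name boxplot_summary() and the example values are hypothetical. Quartiles use R's default quantile() method (type 7), which is what ggplot2 uses when no weights are supplied.

```r
# Hypothetical helper: the statistics geom_boxplot() draws for one box,
# following the rules documented for ggplot2's stat_boxplot() (coef = 1.5).
boxplot_summary <- function(y, coef = 1.5) {
  y <- y[!is.na(y)]  # drop missing values, as na.rm = TRUE does

  # Hinges (1st and 3rd quartiles) and median, using quantile() type 7
  q <- unname(quantile(y, c(0.25, 0.5, 0.75)))
  iqr <- q[3] - q[1]

  # Observations further than coef * IQR beyond the box are outliers
  is_outlier <- y < q[1] - coef * iqr | y > q[3] + coef * iqr

  # Whiskers stop at the most extreme non-outlying values
  whiskers <- range(c(q, y[!is_outlier]))

  list(
    lower_whisker = whiskers[1],
    lower_hinge   = q[1],
    median        = q[2],
    upper_hinge   = q[3],
    upper_whisker = whiskers[2],
    outliers      = y[is_outlier]
  )
}

# Hypothetical monthly drink counts (not taken from the alcohol dataset)
str(boxplot_summary(c(0, 2, 4, 4, 5, 6, 8, 10, 40)))
```

For these values the box runs from 4 to 8 with a median of 5, the whiskers stop at 0 and 10, and 40 is flagged as an outlier.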

We use the built-in alcohol dataset in this example. It can be loaded into StatsNotebook by following the data-loading instructions on the StatsNotebook website. The dataset contains simulated alcohol consumption data from 3,666 individuals.

The dataset can also be loaded directly in R with the following code:

library(tidyverse)
currentDataset <- read_csv("https://statsnotebook.io/blog/data_management/example_data/alcohol.csv")

We will use the following four variables from this dataset:

  1. alcohol - Number of standard drinks consumed in a month
  2. Remoteness - Capital city or regional area
  3. State - Seven states/territories in Australia: Queensland, New South Wales, Northern Territory, South Australia, Tasmania, Victoria and Western Australia
  4. Year - Used as a second grouping variable in the later examples

In this example, we will build

  1. Simple boxplot
  2. Boxplot by one grouping variable
  3. Boxplot by two grouping variables
  4. Multiple boxplots by facets

To build a simple boxplot for a single numeric variable (e.g. alcohol),

  1. Click DataViz at the top
  2. Click Numeric
  3. Select Boxplot/ Violin plot from the menu
  4. In the Boxplot/ Violin plot panel, select alcohol to Vertical Axis.
  5. Click Code and Run.
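# Simple boxplot of alcohol; na.rm = TRUE drops missing values without a warning
currentDataset %>%
  ggplot(aes(y = alcohol)) +
  geom_boxplot(alpha = 0.6, na.rm = TRUE) +
  # Part of the StatsNotebook template; no fill or colour is mapped here,
  # so these two scales have no visible effect
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set2") +
  theme(legend.position = "bottom")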


"Chan, G. and StatsNotebook Team (2020). StatsNotebook. (Version 0.1.0) [Computer Software]. Retrieved from https://www.statsnotebook.io"
"R Core Team (2020). The R Project for Statistical Computing. [Computer software]. Retrieved from https://r-project.org"
"Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org"

Simple boxplot
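To check the numbers behind a boxplot rather than reading them off the axis, ggplot2's layer_data() returns the data computed for a layer. For a boxplot layer this includes the columns ymin, lower, middle, upper and ymax, plus a list-column of outliers. The sketch below uses a small hypothetical data frame so it runs without downloading the alcohol dataset; replacing toy with currentDataset should give the statistics for the plot above.

```r
library(ggplot2)

# Hypothetical stand-in for currentDataset (values are made up)
toy <- data.frame(alcohol = c(0, 2, 4, 4, 5, 6, 8, 10, 40, NA))

p <- ggplot(toy, aes(y = alcohol)) +
  geom_boxplot(alpha = 0.6, na.rm = TRUE)

# Statistics computed by stat_boxplot() for the first (and only) layer
box_stats <- layer_data(p, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats$outliers[[1]]  # values drawn as individual points
```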

To build a boxplot for a single numeric variable (e.g. alcohol) by groups (e.g. State),

  1. Click DataViz at the top
  2. Click Numeric
  3. Select Boxplot/ Violin plot from the menu
  4. In the boxplot panel, select alcohol to Vertical axis and select State to Horizontal axis.
  5. Click Code and Run.
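# One box per State; drop_na(State) removes rows with a missing State,
# which would otherwise appear as an extra "NA" box
currentDataset %>%
  drop_na(State) %>%
  ggplot(aes(y = alcohol, x = State)) +
  geom_boxplot(alpha = 0.6, na.rm = TRUE) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set2") +
  theme(legend.position = "bottom")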


Boxplot by a grouping variable
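A grouped boxplot draws one box per level of the grouping variable, so the same numbers can also be tabulated with dplyr. The sketch below assumes the column names used in this tutorial (State and alcohol); the helper group_quartiles() and the toy data are hypothetical. Calling group_quartiles(currentDataset) should tabulate the boxes in the plot above. The n column is worth keeping because a boxplot does not show how many observations each box summarises.

```r
library(dplyr)

# Hypothetical helper: quartiles of alcohol for each State, i.e. the
# hinges and median drawn for each box in the grouped boxplot
group_quartiles <- function(data) {
  data %>%
    filter(!is.na(State), !is.na(alcohol)) %>%  # mirror drop_na(State) and na.rm = TRUE
    group_by(State) %>%
    summarise(
      q1 = unname(quantile(alcohol, 0.25)),
      median = median(alcohol),
      q3 = unname(quantile(alcohol, 0.75)),
      n = n(),
      .groups = "drop"
    )
}

# Hypothetical toy data; the row with a missing State is dropped
toy <- data.frame(
  State = c("Queensland", "Queensland", "Queensland",
            "Victoria", "Victoria", "Victoria", NA),
  alcohol = c(1, 2, 3, 10, 20, 30, 5)
)
group_quartiles(toy)
```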

To build a boxplot for a single numeric variable (e.g. alcohol) by two grouping variables (e.g. State and Year),

  1. Click DataViz at the top
  2. Click Numeric
  3. Select Boxplot/ Violin plot from the menu
  4. In the boxplot panel, select alcohol to Vertical axis, State to Horizontal axis and Year to Split by: Fill Color.
  5. Click Code and Run.
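# One box per State and Year combination, with Year shown by fill colour
currentDataset %>%
  drop_na(State, Year) %>%
  # Treat Year as categorical so each year gets its own box and a discrete
  # Set2 colour; this is harmless if Year is already a factor
  mutate(Year = factor(Year)) %>%
  ggplot(aes(y = alcohol, x = State, fill = Year)) +
  geom_boxplot(alpha = 0.6, na.rm = TRUE) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set2") +
  theme(legend.position = "bottom")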


Boxplot by two grouping variables
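With two grouping variables, each box is built from a single State-by-Year cell, and some cells may contain only a few observations. Since boxplots do not display group sizes, a quick count can help judge how much weight each box deserves. This is a minimal sketch assuming the State, Year and alcohol columns used above; the toy data, including its years, are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for currentDataset
toy <- data.frame(
  State = c("Queensland", "Queensland", "Queensland", "Victoria", "Victoria", NA),
  Year = c(2019, 2019, 2020, 2019, 2020, 2020),
  alcohol = c(4, 6, 8, 10, 12, 3)
)

# Number of observations behind each box, smallest cells first.
# The filter keeps the same rows as drop_na(State, Year) plus na.rm = TRUE.
cell_sizes <- toy %>%
  filter(!is.na(State), !is.na(Year), !is.na(alcohol)) %>%
  count(State, Year) %>%
  arrange(n)
cell_sizes
```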

To build multiple boxplots for a single numeric variable (e.g. alcohol) by groups (e.g. State and Year) in multiple facets defined by another variable (e.g. Remoteness),

  1. Click DataViz at the top
  2. Click Numeric
  3. Select Boxplot/ Violin plot from the menu
  4. In the boxplot panel, select alcohol to Vertical axis, State to Horizontal axis, Year to Split by: Fill Color and Remoteness to Facet.
  5. Click Code and Run.
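# Boxplots of alcohol by State and Year, with one panel per level of Remoteness
currentDataset %>%
  drop_na(State, Year, Remoteness) %>%
  mutate(Year = factor(Year)) %>%  # treat Year as categorical for the fill colour
  ggplot(aes(y = alcohol, x = State, fill = Year)) +
  geom_boxplot(alpha = 0.6, na.rm = TRUE) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set2") +
  facet_wrap(~ Remoteness) +
  theme_bw(base_family = "sans") +
  theme(legend.position = "bottom")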

Multiple boxplots by facets
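The four plots in this tutorial share one template and differ only in which variables are mapped to the horizontal axis, the fill colour and the facets. The sketch below wraps that template in a hypothetical helper, make_boxplot(), which is not part of StatsNotebook. It takes column names as strings and uses ggplot2's .data pronoun to look them up, and it converts grouping variables to factors so that a numeric column such as Year is treated as categorical.

```r
library(ggplot2)

# Hypothetical helper (not part of StatsNotebook): one boxplot template
# covering the simple, grouped, filled and faceted variants.
make_boxplot <- function(data, y, x = NULL, fill = NULL, facet = NULL) {
  group_vars <- c(x, fill, facet)  # NULL arguments drop out of c()

  if (length(group_vars) > 0) {
    # Drop rows with missing grouping values, like drop_na() in the tutorial
    data <- data[stats::complete.cases(data[group_vars]), , drop = FALSE]
    # Treat grouping variables as categorical (e.g. a numeric Year)
    for (v in group_vars) data[[v]] <- factor(data[[v]])
  }

  p <- ggplot(data, aes(y = .data[[y]]))
  if (!is.null(x)) p <- p + aes(x = .data[[x]])
  if (!is.null(fill)) p <- p + aes(fill = .data[[fill]])

  p <- p +
    geom_boxplot(alpha = 0.6, na.rm = TRUE) +
    scale_fill_brewer(palette = "Set2") +
    theme_bw(base_family = "sans") +
    theme(legend.position = "bottom")

  if (!is.null(facet)) p <- p + facet_wrap(vars(.data[[facet]]))
  p
}

# Hypothetical toy data standing in for currentDataset
toy <- data.frame(
  alcohol = c(1, 3, 5, 2, 4, 6, 7, 9, 11, 8, 10, 12),
  State = rep(c("Queensland", "Victoria"), each = 6),
  Year = rep(c(2019, 2020), times = 6),
  Remoteness = rep(c("Capital city", "Regional area"), each = 3, times = 2)
)

p <- make_boxplot(toy, "alcohol", x = "State", fill = "Year", facet = "Remoteness")
print(p)
```

Calling make_boxplot(currentDataset, "alcohol") and then adding x = "State", fill = "Year" and facet = "Remoteness" one at a time should roughly reproduce the four plots above, with theme_bw() applied throughout.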

Chan, G. and StatsNotebook Team (2020). StatsNotebook (Version 0.1.0) [Computer software]. Retrieved from https://www.statsnotebook.io
R Core Team (2020). The R Project for Statistical Computing. [Computer software]. Retrieved from https://r-project.org
Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org
