StatsNotebook

Reshaping data

27 Oct, 2020 | 3 minutes read

Tags: Data Management, R code


Follow our Facebook page or our developer’s Twitter for more tutorials and future updates.

The tutorial is based on R and StatsNotebook, a graphical interface for R.

This tutorial will provide step-by-step guide on reshaping the data from

  1. Wide to Long
  2. Long to Wide

We will use the built-in Human Development Index dataset in this example. This dataset can be loaded into StatsNotebook using instruction here. This is a data from the United Nations Development Programmes (UNDP).

This dataset contains the following variables

  1. Region - Region from which the Human Development Index is calculated
  2. HDI_1990, … , HDI_2018 - Human Development Index from 1990 to 2018
  3. GDP_1990, … , GDP_2018 - Gross Domestic Product per capita from 1990 to 2018

This dataset is in wide format because each row contains data from multiple years (e.g. from 1990 to 2018).

Wide data format

Many analyses require the data to be in long format (see below) such that each row represents data from one year.

Long data format

In StatsNotebook, the conversion from wide to long format can be done with a few clicks. However, the variable names need to be in the format “VariableName_Index”, such that the variable name is followed by an underscore and then an index (e.g. Measurement_1, Measurement_2, etc).

To convert the data from wide to long format,

  1. Click Data at the top
  2. Click Reshape
  3. Select Wide to Long in Operation
  4. Fill in the New Variable with the variable name for the index (e.g. Year)
  5. Select the target variables into Target variables (e.g. HDI_1990, …, HDI_2018 and GDP_1990, …, GDP_2018)
  6. Click Code and Run.

Wide to long instruction

currentDataset <- currentDataset %>% pivot_longer(c("HDI_1990", "HDI_1991", "HDI_1992", "HDI_1993", "HDI_1994", "HDI_1995", "HDI_1996", "HDI_1997", "HDI_1998", "HDI_1999", "HDI_2000", "HDI_2001", "HDI_2002", "HDI_2003", "HDI_2004", "HDI_2005", "HDI_2006", "HDI_2007", "HDI_2008", "HDI_2009", "HDI_2010", "HDI_2011", "HDI_2012", "HDI_2013", "HDI_2014", "HDI_2015", "HDI_2016", "HDI_2017", "HDI_2018", "GDP_1990", "GDP_1995", "GDP_2000", "GDP_2005", "GDP_2010", "GDP_2011", "GDP_2012", "GDP_2013", "GDP_2014", "GDP_2015", "GDP_2016", "GDP_2017", "GDP_2018"),
  names_to = c(".value", "Year"),
  names_pattern = "(.*)_(.*)")

"Chan, G. and StatsNotebook Team (2020). StatsNotebook. (Version 0.1.0) [Computer Software]. Retrieved from https://www.statsnotebook.io"
"R Core Team (2020). The R Project for Statistical Computing. [Computer software]. Retrieved from https://r-project.org"

We will use the long format Human Development Index dataset for this example. This dataset can be downloaded from here. This is a dataset from the United Nations Development Programmes and contains the following variables.

  1. Region - Region
  2. Year - Year
  3. HDI - Human Development Index
  4. GDP - Gross Domestic Product per capita

long data

To convert the data from long to wide format,

  1. Click Data at the top
  2. Click Reshape
  3. Select Long to Wide in Operation
  4. Select the index variable where the conversion will be based on (e.g. Year in this example)
  5. Select HDI and GDP to Target Variables.
  6. Click Code and Run.

long to wide instruction

currentDataset <- currentDataset %>% pivot_wider(names_from = Year,
  names_glue = "{.value}_{Year}",
  values_from = c("HDI", "GDP"))

"Chan, G. and StatsNotebook Team (2020). StatsNotebook. (Version 0.1.0) [Computer Software]. Retrieved from https://www.statsnotebook.io"
"R Core Team (2020). The R Project for Statistical Computing. [Computer software]. Retrieved from https://r-project.org"


The data will look like the following figure if the data is reshaped successfully.

Wide data format


Follow our Facebook page or our developer’s Twitter for more tutorials and future updates.