StatsNotebook

Compute variables

16 Oct, 2020 | 2 minutes read

Tags: Data management, R code


Follow our Facebook page or our developer’s Twitter for more tutorials and future updates.

The tutorial is based on R and StatsNotebook, a graphical interface for R.

StatsNotebook provides a simple menu for creating new variables. The two examples below demonstrate how to use it to compute new variables.

  1. Log transformation
  2. Creating a quadratic term

In these two examples, we will use the built-in Personality dataset. This dataset can be loaded into StatsNotebook using the instructions here, or it can be downloaded from here.

Violations of distributional assumptions are common. For example, linear regression requires the residuals to be normally distributed. In our linear regression example, we regress depression on the Big Five personality factors and sex. The normality plot (QQ plot) from the regression model indicates that the residuals are not normally distributed. Below is the normality plot from that example.

Normality plot

This is largely because depression is positively skewed (it has a long tail on the right-hand side).

Depression

One way to improve the model is to perform a log-transformation of the dependent variable, depression.

To create a log-transformed version of depression,

  1. Click Data at the top
  2. Click Compute
  3. In Target Variable, type in the name of the new variable (e.g. log_depression).
  4. Select Log (natural log) in the Function panel.
  5. Click on the variable we need to log-transform (e.g. Depression). This automatically completes the formula in the Formula panel.

Log transformation

  6. Click Code and Run. A new variable log_depression is now created.

The following code will be generated.

currentDataset$log_depression <- log(currentDataset$Depression)
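
The generated line can be tried outside StatsNotebook on any data frame. The sketch below uses a small made-up data frame in place of currentDataset (the values are invented for illustration and are not from the Personality dataset) and adds a quick check that may be useful, since the natural log is only finite for strictly positive values.

```r
# Toy stand-in for currentDataset; these values are invented for illustration.
currentDataset <- data.frame(Depression = c(1, 2, 2, 3, 5, 8, 13, 40))

# Same assignment that StatsNotebook generates.
currentDataset$log_depression <- log(currentDataset$Depression)

# log() returns -Inf for 0 and NaN (with a warning) for negative values,
# so it is worth checking the minimum of the original variable.
min(currentDataset$Depression)

# TRUE here would mean at least one transformed value is not finite.
any(!is.finite(currentDataset$log_depression))
```

If the original variable contains zeros, the transformed column will contain -Inf, which most modelling functions cannot use, so the variable would need to be handled differently before transforming.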

A histogram of this new variable shows that it is much less skewed.

Log transformed
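
A comparison like this can also be drawn directly in R. The following is a minimal sketch using base graphics and simulated, positively skewed data as a stand-in for the Depression variable; the simulated values and plot titles are assumptions made for illustration, and this is not the code StatsNotebook generates for its histograms.

```r
# Simulated right-skewed data standing in for Depression (illustrative only).
set.seed(1)
currentDataset <- data.frame(Depression = rlnorm(200, meanlog = 1, sdlog = 0.8))
currentDataset$log_depression <- log(currentDataset$Depression)

# Draw the two histograms side by side, then restore the plotting settings.
op <- par(mfrow = c(1, 2))
hist(currentDataset$Depression,
     main = "Depression", xlab = "Depression")
hist(currentDataset$log_depression,
     main = "log_depression", xlab = "log(Depression)")
par(op)
```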

The normality plot (QQ plot) from the linear regression using the log-transformed depression variable shows little evidence that the normality assumption is violated.

Normality
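
For readers who want to reproduce this kind of check in plain R, here is a rough sketch that refits a regression with the log-transformed outcome and draws a QQ plot of the residuals. The data are simulated, only two personality predictors are included for brevity, and the column names Extraversion and Sex are assumed; the real model uses all Big Five factors, and StatsNotebook's own regression code may look different.

```r
# Simulated stand-in data; column names other than Depression and
# Neuroticism are assumptions made for this illustration.
set.seed(2)
n <- 200
currentDataset <- data.frame(
  Neuroticism  = rnorm(n, mean = 3, sd = 0.7),
  Extraversion = rnorm(n, mean = 3, sd = 0.7),
  Sex          = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
currentDataset$Depression <-
  exp(0.5 + 0.4 * currentDataset$Neuroticism + rnorm(n, sd = 0.5))
currentDataset$log_depression <- log(currentDataset$Depression)

# Regress the log-transformed outcome on the predictors.
fit_log <- lm(log_depression ~ Neuroticism + Extraversion + Sex,
              data = currentDataset)

# QQ plot of the residuals; points close to the line suggest approximate normality.
qqnorm(resid(fit_log))
qqline(resid(fit_log))
```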

A quadratic term of an independent variable is often added to a regression model to test for a curvilinear relationship. Suppose that we want to test whether there is a curvilinear relationship between Depression and Neuroticism. We can create a quadratic term of Neuroticism and enter it into a linear regression model.

To create a quadratic term,

  1. Click Data at the top
  2. Click Compute
  3. In Target Variable, type the name of the new variable (e.g. Neuroticism_sq)
  4. Select Neuroticism from the Variable panel. The term currentDataset$Neuroticism will appear in the Formula panel
  5. Type ^2 at the end of the term

Quadratic term

  6. Click Code and Run. A new variable Neuroticism_sq will be created.

The following code will be generated.

currentDataset$Neuroticism_sq <- currentDataset$Neuroticism^2
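
Once the quadratic term exists, it can be entered into a regression alongside the original variable. The sketch below shows one way to do this in plain R on simulated data (the values and coefficients are invented for illustration, and this is not necessarily the code StatsNotebook generates for its regression menu).

```r
# Simulated stand-in data with a built-in curvilinear relationship.
set.seed(3)
n <- 200
currentDataset <- data.frame(Neuroticism = rnorm(n, mean = 3, sd = 0.7))
currentDataset$Depression <- 2 + 0.5 * currentDataset$Neuroticism +
  0.8 * currentDataset$Neuroticism^2 + rnorm(n)

# Same assignment that StatsNotebook generates.
currentDataset$Neuroticism_sq <- currentDataset$Neuroticism^2

# Enter both the linear and the quadratic term; the Neuroticism_sq row
# of the coefficient table is the test for curvature.
fit_quad <- lm(Depression ~ Neuroticism + Neuroticism_sq, data = currentDataset)
summary(fit_quad)$coefficients

# Equivalent model without creating a new column. I() is needed because
# ^ has a special meaning inside an R formula.
fit_inline <- lm(Depression ~ Neuroticism + I(Neuroticism^2),
                 data = currentDataset)
```

Both fits give the same coefficient estimates. Creating the column explicitly, as StatsNotebook does, keeps the new variable available for other analyses and plots.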
