• 2. Adding columns w mutate

Motivating scenario: you want to change or add a column in a tibble.

Learning goals: By the end of this sub-chapter you should be able to

  1. Add or change a column with the mutate() function in the dplyr package.
  2. Change between variable types in a column.
  3. Change values conditionally with the case_when() function.

Figure 1: An illustration of the mutate() function. The top table represents the original dataset, containing columns for the proportion of hybrids (prop_hyb) and the number of individuals assayed (n_assayed, a constant 8 in this example). The mutate() function is then applied as mutate(n_hyb = prop_hyb * n_assayed) to compute n_hyb, the total number of hybrid individuals (0, 1, and 2, respectively). The resulting dataset, shown in the bottom table, includes this newly created n_hyb column.
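
The calculation in Figure 1 can be reproduced on a small toy tibble. This is only a sketch: the name hyb_data is made up for illustration, and the prop_hyb values (0, 0.125, 0.25) are assumed because they give the hybrid counts of 0, 1, and 2 described in the figure.

```r
# Toy data mirroring Figure 1.
# hyb_data is a hypothetical name; the prop_hyb values are assumed, not taken from ril_data.
hyb_data <- tibble::tibble(
  prop_hyb  = c(0, 0.125, 0.25),   # proportion of hybrids
  n_assayed = c(8, 8, 8)           # number of individuals assayed
)

# mutate() adds n_hyb as a new column and leaves the existing columns untouched
hyb_with_counts <- hyb_data |>
  dplyr::mutate(n_hyb = prop_hyb * n_assayed)

hyb_with_counts
```

Because the result is not assigned back to hyb_data, the original tibble still has only two columns.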

Changing or adding variables with mutate()

Often we want to change the values in a column, or make a new column. For example, in our data we may hope to:

  • Convert growth_rate into a number, with the as.numeric() function.
  • Add the logical variable, visited, which is TRUE if a plant had more than zero pollinator visits, and FALSE otherwise.

The mutate() function in the dplyr package can solve this. You can overwrite data in an existing column or make a new column as follows:

ril_data |>
  dplyr::mutate(growth_rate = as.numeric(growth_rate),   # overwrite: make numeric
                visited = mean_visits > 0)               # new column: TRUE if any visits
Warning: There was 1 warning in `dplyr::mutate()`.
ℹ In argument: `growth_rate = as.numeric(growth_rate)`.
Caused by warning:
! NAs introduced by coercion
# A tibble: 593 × 18
   ril   location prop_hybrid mean_visits growth_rate petal_color petal_area_mm
   <chr> <chr>          <dbl>       <dbl>       <dbl> <chr>               <dbl>
 1 A1    GC             0           0           1.27  white                44.0
 2 A100  GC             0.125       0.188       1.45  pink                 55.8
 3 A102  GC             0.25        0.25       NA     pink                 51.7
 4 A104  GC             0           0           0.816 white                57.3
 5 A106  GC             0           0           0.728 white                68.6
 6 A107  GC             0.125       0           1.76  pink                 66.3
 7 A108  GC            NA          NA           1.58  <NA>                 51.5
 8 A109  GC             0           0           1.48  white                48.1
 9 A111  GC             0          NA           1.14  white                51.6
10 A112  GC             0.25        0           1     white                89.8
# ℹ 583 more rows
# ℹ 11 more variables: date_first_flw <dbl>, node_first_flw <dbl>,
#   petal_perim_mm <dbl>, asd_mm <dbl>, protandry <dbl>, stem_dia_mm <dbl>,
#   lwc <dbl>, crossDir <chr>, num_hybrid <dbl>, offspring_genotyped <dbl>,
#   visited <lgl>
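
One detail to keep in mind about the new visited column: a comparison with a missing value returns NA, not FALSE. So plants with a missing mean_visits (like A108 above) get NA for visited. Here is a minimal sketch on a three-row toy tibble; the name visits is hypothetical, and the values are copied from the first rows printed above.

```r
# Three rows taken from the output above (visits is a hypothetical name)
visits <- tibble::tibble(
  ril         = c("A1", "A100", "A108"),
  mean_visits = c(0, 0.188, NA)
)

# NA > 0 evaluates to NA, so missing data stays missing in the new column
visits_flagged <- visits |>
  dplyr::mutate(visited = mean_visits > 0)

visits_flagged$visited
```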

Warning… ! NAs introduced by coercion:

You can see that R gave us a warning. Warnings do not necessarily mean that something went wrong, but they do mean we should look and see what happened. In this case, when R tried to convert the character string "1.8O" (which ends in the letter O, not a zero) into a number, it did not know what to do and converted it to NA. In the next bit of code I convert it into "1.80" with the case_when() function.
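
If you are not sure which entries caused the warning, one way to find them is to compare the column before and after conversion. The sketch below uses a made-up character vector (growth_rate_chr is a hypothetical name, and only "1.8O" comes from our data); the same logic should apply to a column in a tibble.

```r
# Toy character vector: one typo ("1.8O") and one genuinely missing value
growth_rate_chr <- c("1.27", "1.45", "1.8O", NA, "0.816")

# Convert quietly; we inspect the result ourselves instead of reading the warning
converted <- suppressWarnings(as.numeric(growth_rate_chr))

# Entries that were present as text but turned into NA after conversion
problem_values <- growth_rate_chr[is.na(converted) & !is.na(growth_rate_chr)]

problem_values
```

Entries that were already NA are excluded, so only the values that R could not interpret as numbers are returned.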

Now that we have seen what mutate() does, we can assign the result to R’s memory. In doing so, I also converted "1.8O" into "1.80", so we have an observation there rather than missing data.

ril_data <- ril_data |>
  dplyr::mutate(growth_rate = dplyr::case_when(growth_rate == "1.8O" ~ "1.80",   # fix the typo (letter O)
                                               .default = growth_rate),          # keep all other values
                growth_rate = as.numeric(growth_rate),                           # then make numeric
                visited = mean_visits > 0)
# A tibble: 593 × 18
   ril   location prop_hybrid mean_visits growth_rate petal_color petal_area_mm
   <chr> <chr>          <dbl>       <dbl>       <dbl> <chr>               <dbl>
 1 A1    GC             0           0           1.27  white                44.0
 2 A100  GC             0.125       0.188       1.45  pink                 55.8
 3 A102  GC             0.25        0.25        1.8   pink                 51.7
 4 A104  GC             0           0           0.816 white                57.3
 5 A106  GC             0           0           0.728 white                68.6
 6 A107  GC             0.125       0           1.76  pink                 66.3
 7 A108  GC            NA          NA           1.58  <NA>                 51.5
 8 A109  GC             0           0           1.48  white                48.1
 9 A111  GC             0          NA           1.14  white                51.6
10 A112  GC             0.25        0           1     white                89.8
# ℹ 583 more rows
# ℹ 11 more variables: date_first_flw <dbl>, node_first_flw <dbl>,
#   petal_perim_mm <dbl>, asd_mm <dbl>, protandry <dbl>, stem_dia_mm <dbl>,
#   lwc <dbl>, crossDir <chr>, num_hybrid <dbl>, offspring_genotyped <dbl>,
#   visited <lgl>

When I first tried to change the character "1.8O" into the number 1.80, R kept saying: Error in dplyr::mutate()… Caused by error in case_when(): ! Can’t combine ..1 (right) and ..2 (right). Unlike warnings, which tell you to watch out, errors tell you that R cannot do what you’re asking of it. It turns out that I could not assign the number 1.80 to the vector held in growth_rate, because that vector was still a character vector and case_when() cannot blend characters and numbers in one column. So, as you can see, I replaced "1.8O" with the character string "1.80", and then I used as.numeric() to convert the whole vector to numeric.
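
To see the difference between the failing and the working version, here is a minimal sketch on a two-row toy tibble. The names toy, attempt, and fixed are hypothetical, and I wrap the failing call in tryCatch() only so that the example keeps running after the error.

```r
# Toy character column containing the typo
toy <- tibble::tibble(growth_rate = c("1.27", "1.8O"))

# Failing version: the replacement is a number (1.80), but the default is character.
# tryCatch() returns "error" instead of stopping the script.
attempt <- tryCatch(
  toy |>
    dplyr::mutate(growth_rate = dplyr::case_when(growth_rate == "1.8O" ~ 1.80,
                                                 .default = growth_rate)),
  error = function(e) "error"
)

# Working version: replace with a character string first, then convert the column
fixed <- toy |>
  dplyr::mutate(growth_rate = dplyr::case_when(growth_rate == "1.8O" ~ "1.80",
                                               .default = growth_rate),
                growth_rate = as.numeric(growth_rate))

fixed
```

The general rule is that every value case_when() can return must be of a compatible type, so it is usually easiest to fix the text first and convert afterwards.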