Motivating scenario: you want to pick out a few columns to work with from a larger tibble.
Learning goals: By the end of this sub-chapter you should be able to
Use use the dplyr function, select(), to limit our data to a few variables of interest.
Figure 1: Using the select() function to retain specific columns from a dataset. The top table contains three columns: prop_hyb (proportion of hybrids), n_assayed (number of individuals assayed), and n_hyb (the computed number of hybrids). The select(prop_hyb, n_hyb) function is applied, keeping only the prop_hyb and n_hyb columns. The bottom table displays the resulting dataset after column selection.
The dataset above is not tiny – seventeen columns accompany the 593 rows of data. To simplify our lives, let’s use the dplyr function, select(), to limit our data to a few variables of interest:
location: The plant’s location. The pollinator visitation experiment was limited to two locations (either SR or GC), while the hybrid seed formation study was replicated at four locations (SR, GC, LB or US). This should be a <chr> (character), and it is!
prop_hybrid: The proportion of seeds genotypes that turned mean number of pollinator visits recorded (per fifteen minute pollinator observation) for that genotypes at that site. This should be a number <dbl> (double), and it is.
petal_area_mm: The area of the petals (in mm). This should be a number <dbl> (double), and it is!
asd_mm: The distance between anther (the place where pollen comes from) and stigma (the place that pollen goes to) on a flower. The smaller this number, the easier it is for a plant to pollinated itself. This should be a number <dbl> (double), and it is.
growth_rate: The variable we should have just fixed now it should be a number.
visited: A logical variable indicating if the plant received any visits at ail.