1. Load packages and data

Motivating scenario: You want to start exploring your data in R, so you need to load both the data and the packages you will use.

Learning goals: By the end of this sub-chapter you should be able to

  1. Understand what an R package is
    • Install an R package with the install.packages() function (the first time you use it).
    • Load an R package with the library() function (in every R session in which you use the package).
  2. Use the read_csv() function in the readr package to load a file from the internet into R.

R packages

While R has many built-in functions, packages provide even more functions to extend R’s capabilities. Packages can offer alternative (often more efficient and user-friendly) approaches to tasks that can be done with base R functions, or they can enable entirely new functionality that is not included in base R at all. In fact, R packages are a major way that the latest statistical and computational methods in various fields are shared with practitioners.

Below I introduce the readr and dplyr packages. Because these packages are so useful for streamlining data import, manipulation, and cleaning, I use them in nearly every R project. I also introduce the conflicted package, which identifies functions that share a name across packages and lets us tell R which one we mean.

Install a package the first time you use it: When you first need a package, install it with the install.packages() function. Its argument is the name of the package (or a vector of package names) you want to install. So, to install the packages above, type:

# We do this the first time we need a package.
install.packages(c("readr", "dplyr", "conflicted"))
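
If you are unsure whether a package is already installed, one option is to check before installing. The lines below are a minimal sketch of that idea, not a required step. The names pkgs, installed, and missing_pkgs are made up for this illustration, and the install step assumes an internet connection and a configured CRAN mirror.

```r
# Packages used in this sub-chapter.
pkgs <- c("readr", "dplyr", "conflicted")

# requireNamespace() returns TRUE when a package is installed and its
# namespace can be loaded; it does not attach the package like library() does.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only the packages that were not found.
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Running this a second time should install nothing, because missing_pkgs will be empty.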

Load installed packages every time you open RStudio: You install a package only once, but you must load it with the library() function, as I demonstrate below, in every new R session.

# We do this every time we open R and want to use these packages.
library(conflicted)
library(readr)
library(dplyr)
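
To make the role of conflicted concrete, here is a small sketch of resolving one name clash. Both dplyr and the stats package (which R loads by default) define a function called filter(). With conflicted loaded, a bare filter() call stops with an error until you state a preference. This example assumes conflicted version 1.2.0 or later, which provides conflicts_prefer(), and it uses R's built-in mtcars dataset purely for illustration.

```r
library(conflicted)
library(dplyr)

# Declare that a bare filter() should mean dplyr's version, not stats'.
conflicts_prefer(dplyr::filter)

# filter() now unambiguously calls dplyr::filter():
# keep the rows of mtcars for cars with four cylinders.
four_cyl <- filter(mtcars, cyl == 4)
four_cyl
```

In older versions of conflicted, conflict_prefer("filter", "dplyr") expresses the same preference.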

Reading data into R

Rather than typing large datasets into R, we usually want to read in data that is already stored somewhere. For now, we will load data saved as a CSV file on the internet by calling read_csv(link), where link is the file's web address and read_csv() comes from the readr package. Later, we will revisit the challenge of importing data from other file types and locations into R.

Loading data: See Posit's recipe for importing data for more detail. Note also that read.csv() is a base R function similar to read_csv(), but it behaves a bit differently; for example, it reads data in as a data frame, not a tibble.
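
The difference between the two functions can be seen with a tiny example. This is a sketch under a few assumptions: the two-row CSV text is made up for illustration, the object names are hypothetical, and passing literal text wrapped in I() to read_csv() requires readr version 2.0 or later.

```r
library(readr)

# A tiny CSV written inline, so the example needs no file or link.
csv_text <- "ril,petal_color\nA1,white\nA100,pink\n"

# Base R: read.csv() takes literal text through its text argument.
base_version <- read.csv(text = csv_text)

# readr: I() marks the string as literal data rather than a file path.
readr_version <- read_csv(I(csv_text), show_col_types = FALSE)

class(base_version)                # a plain data frame
inherits(readr_version, "tbl_df")  # TRUE, because read_csv() returns a tibble
```

A tibble is still a data frame underneath, so most code that works on one works on the other; the most visible difference is how each prints.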

Below, I show an example of reading pollinator visitation data from a link on my GitHub. After loading a dataset, you can see its first ten rows, and as many columns as fit on your screen, by simply typing its name. Alternatively, the View() function opens up the full spreadsheet for you to peruse.

ril_link <- "https://raw.githubusercontent.com/ybrandvain/datasets/refs/heads/master/clarkia_rils.csv"
ril_data <- readr::read_csv(ril_link)
ril_data
# A tibble: 593 × 17
   ril   location prop_hybrid mean_visits growth_rate petal_color petal_area_mm
   <chr> <chr>          <dbl>       <dbl> <chr>       <chr>               <dbl>
 1 A1    GC             0           0     1.272       white                44.0
 2 A100  GC             0.125       0.188 1.448       pink                 55.8
 3 A102  GC             0.25        0.25  1.8O        pink                 51.7
 4 A104  GC             0           0     0.816       white                57.3
 5 A106  GC             0           0     0.728       white                68.6
 6 A107  GC             0.125       0     1.764       pink                 66.3
 7 A108  GC            NA          NA     1.584       <NA>                 51.5
 8 A109  GC             0           0     1.476       white                48.1
 9 A111  GC             0          NA     1.144       white                51.6
10 A112  GC             0.25        0     1           white                89.8
# ℹ 583 more rows
# ℹ 10 more variables: date_first_flw <dbl>, node_first_flw <dbl>,
#   petal_perim_mm <dbl>, asd_mm <dbl>, protandry <dbl>, stem_dia_mm <dbl>,
#   lwc <dbl>, crossDir <chr>, num_hybrid <dbl>, offspring_genotyped <dbl>
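
Beyond typing the dataset's name, a few other functions give quick views of the data. The sketch below uses a small stand-in, toy_data (a hypothetical name), built from five columns of the first three rows printed above, so it runs without an internet connection. The same calls should work on ril_data. Reading literal text with I() assumes readr version 2.0 or later.

```r
library(readr)
library(dplyr)

# A stand-in built from the first three rows of the printout above.
toy_text <- "ril,location,prop_hybrid,growth_rate,petal_color
A1,GC,0,1.272,white
A100,GC,0.125,1.448,pink
A102,GC,0.25,1.8O,pink
"
toy_data <- read_csv(I(toy_text), show_col_types = FALSE)

glimpse(toy_data)      # one line per column: its name, type, and first values
head(toy_data, n = 2)  # only the first two rows
# View(toy_data)       # opens a spreadsheet-style viewer (interactive use in RStudio)
```

In this stand-in, growth_rate is read as text (chr) rather than as a number because the entry 1.8O ends in the letter O, not a zero. That is apparently also why growth_rate shows up as chr in the full dataset.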

package::function() format: I read in the data by typing readr::read_csv(), which calls the read_csv() function from the readr package. Because readr is loaded, typing read_csv() alone gives the same result. The package::function() format comes in handy when two functions in different packages have the same name.
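
As an illustration of the package::function() format with two same-named functions, the sketch below calls both versions of filter() explicitly. It assumes dplyr is installed; stats ships with R. The built-in mtcars dataset and the object names four_cyl and smoothed are used only for this example.

```r
# No library() call is needed when the package name is spelled out.

# dplyr's filter() keeps the rows of a data frame that meet a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)

# stats' filter() does something quite different: here it computes a
# three-point moving average over a numeric vector.
smoothed <- stats::filter(1:5, rep(1/3, 3))

nrow(four_cyl)
smoothed
```

Spelling out the package makes it clear to both R and a human reader which function is meant, at the cost of a little extra typing.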