1. Load packages and data
Motivating scenario: You want to start taking a peek at your data in R, so you first need to load the packages you will use and the data itself.
Learning goals: By the end of this sub-chapter you should be able to:
- Understand what an R package is.
- Install an R package with the install.packages() function (the first time you use it).
- Load an R package with the library() function (in every R session in which you use the package).
- Use the read_csv() function in the readr package to load a file from the internet into R.
R packages
While R has many built-in functions, packages provide even more functions that extend R's capabilities. Packages can offer alternative (often more efficient and user-friendly) approaches to tasks that can already be done with base R functions, or they can enable entirely new functionality that base R does not include at all. In fact, R packages are a major way that the latest statistical and computational methods in various fields are shared with practitioners.
Below I introduce the readr and dplyr packages. Because these packages are so useful for streamlining data import, manipulation, and cleaning, I use them in nearly every R project. I also introduce the conflicted package, which identifies functions that share a name across packages and allows us to tell R which function we mean when more than one function has the same name.
Install a package the first time you use it: The first time you need a package, install it with the install.packages() function. Its argument is the name of the package (or a vector of package names) you want to install, in quotation marks. So, to install the packages above, type:
# We do this the first time we need a package.
install.packages(c("readr", "dplyr", "conflicted"))
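Because install.packages() only needs to run once per computer, you may prefer to install only the packages that are not already there. The sketch below defines a hypothetical helper, missing_packages() (it is not part of any package), that uses base R's requireNamespace() to report which of the requested packages are not installed yet.

```r
# Hypothetical helper (not from any package): return the names of the
# packages in `pkgs` that are not installed on this computer.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Which of the packages used in this chapter still need to be installed?
to_install <- missing_packages(c("readr", "dplyr", "conflicted"))
to_install
```

If to_install is not empty, running install.packages(to_install) (with an internet connection) installs just those packages and skips the ones you already have.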
Load installed packages every time you open RStudio: You only install a package once, but you must load it with the library() function in every new R session, as I demonstrate below.
# We do this every time we open R and want to use these packages.
library(conflicted)
library(readr)
library(dplyr)
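To see what conflicted does, consider filter(): both dplyr and the built-in stats package have a function with that name. With conflicted loaded, calling filter() on its own raises an error that asks you to choose, rather than silently using whichever package was loaded last. This sketch assumes conflicted version 1.2.0 or later, which provides conflicts_prefer(), and uses R's built-in mtcars data as a stand-in.

```r
library(conflicted)
library(dplyr)

# dplyr::filter() and stats::filter() share a name, so conflicted would stop
# a bare filter() call. Declare which one we mean for the rest of the session:
conflicts_prefer(dplyr::filter)

# filter() now unambiguously refers to dplyr::filter():
# keep the rows of mtcars for cars with eight cylinders.
eight_cyl <- filter(mtcars, cyl == 8)
```

The preference lasts for the current R session only, so it belongs right after your library() calls at the top of a script.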
Reading data into R
Rather than typing large datasets into R, we usually want to read in data that is already stored somewhere. For now, we will load data saved as a csv file on the internet with the read_csv() function from the readr package, using the structure read_csv(link). Later, we will revisit the challenge of importing data from other file types and locations into R.
Loading data: See Posit's recipe for importing data for more detail. Note also that read.csv() is a base R function similar to read_csv(), but it behaves a bit differently. For example, it returns a plain data frame rather than a tibble.
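The difference between read.csv() and read_csv() is easy to check on a tiny csv typed out as text, so no file or internet connection is needed. This sketch assumes readr version 2.0 or later, where literal csv text is wrapped in I(); the two rows are copied from the pollinator data shown below.

```r
library(readr)

# A tiny csv written out as a single string ("\n" marks a new line).
csv_text <- "ril,petal_color\nA1,white\nA100,pink"

# readr's read_csv() returns a tibble.
tidy_version <- read_csv(I(csv_text), show_col_types = FALSE)

# Base R's read.csv() returns a plain data frame.
base_version <- read.csv(text = csv_text)

class(tidy_version)
class(base_version)
```

Both objects hold the same two rows; they differ in their class, which affects how they print and behave.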
Below, I show an example of reading pollinator visitation data from a link on my GitHub. After loading a dataset, you can see its first ten rows, and as many columns as fit on your screen, by simply typing its name. Alternatively, the View() function opens the full spreadsheet for you to peruse.
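# Store the link to the csv file on GitHub.
ril_link <- "https://raw.githubusercontent.com/ybrandvain/datasets/refs/heads/master/clarkia_rils.csv"
# Read the csv into R as a tibble.
ril_data <- readr::read_csv(ril_link)
# Typing the name prints the first ten rows and the columns that fit.
ril_data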
# A tibble: 593 × 17
ril location prop_hybrid mean_visits growth_rate petal_color petal_area_mm
<chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 A1 GC 0 0 1.272 white 44.0
2 A100 GC 0.125 0.188 1.448 pink 55.8
3 A102 GC 0.25 0.25 1.8O pink 51.7
4 A104 GC 0 0 0.816 white 57.3
5 A106 GC 0 0 0.728 white 68.6
6 A107 GC 0.125 0 1.764 pink 66.3
7 A108 GC NA NA 1.584 <NA> 51.5
8 A109 GC 0 0 1.476 white 48.1
9 A111 GC 0 NA 1.144 white 51.6
10 A112 GC 0.25 0 1 white 89.8
# ℹ 583 more rows
# ℹ 10 more variables: date_first_flw <dbl>, node_first_flw <dbl>,
# petal_perim_mm <dbl>, asd_mm <dbl>, protandry <dbl>, stem_dia_mm <dbl>,
# lwc <dbl>, crossDir <chr>, num_hybrid <dbl>, offspring_genotyped <dbl>
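If the default printout hides rows or columns you want to see, you can ask for more. This sketch uses the tibble print() arguments n and width, and, as a stand-in so it runs without an internet connection, R's built-in mtcars data converted to a tibble; the same calls should work on ril_data.

```r
library(dplyr)

# Stand-in data: built-in mtcars as a tibble (row names are dropped).
cars_tbl <- as_tibble(mtcars)

cars_tbl                      # first ten rows, and the columns that fit
print(cars_tbl, n = 15)       # first fifteen rows instead of ten
print(cars_tbl, width = Inf)  # every column, even if the output wraps

# View() opens a spreadsheet-style viewer, so only call it in an interactive session.
if (interactive()) View(cars_tbl)
```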
package::function() format: I read in the data with the read_csv() function in the readr package by typing readr::read_csv(), but because we already loaded readr with library(readr), typing read_csv() gives the same result. The package::function() format comes in handy when two functions in different packages have the same name.
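Here is a small sketch of two same-named functions that do very different things: stats::filter() (built into R) applies a linear filter to a numeric vector, while dplyr::filter() keeps the rows of a data frame that meet a condition. The flowers data frame is a hypothetical three-row example whose values are copied from the first rows of ril_data above.

```r
# stats::filter(): a 3-point moving average of a numeric vector.
# The first and last values are NA because they lack a neighbor on one side.
moving_avg <- stats::filter(c(1, 2, 3, 4, 5), rep(1 / 3, 3))

# dplyr::filter(): keep only the rows of a data frame that meet a condition.
flowers <- data.frame(
  ril = c("A1", "A100", "A102"),
  petal_color = c("white", "pink", "pink")
)
pink_flowers <- dplyr::filter(flowers, petal_color == "pink")
```

Writing the package name before :: makes each call unambiguous, whichever packages happen to be loaded.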