1. Getting started summary
Links to: Summary, Chatbot tutor, Questions, Glossary, R functions, R packages, Additional resources.
Chapter summary
R is (much more than just) a simple calculator – it can keep track of variables, and has functions to make plots, summarize data, and build statistical models. R also has many packages that can extend its capabilities. Now that we are familiar with R, RStudio, vectors, functions, data types and packages, we are ready to build our R skills even further to work with data!
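Here is a minimal illustration of the first two ideas in this summary: R as a calculator, and R keeping track of variables. The variable names and values are example choices only.

```r
# R as a calculator: the usual order of operations applies
2 + 3 * 4            # multiplication happens first, giving 14
(2 + 3) * 4          # parentheses come first, giving 20

# R keeps track of variables: store a value with the assignment operator <-
leaf_length <- 5     # example value
leaf_width <- 3.2    # example value

# Stored variables can be reused in later calculations
leaf_length * leaf_width   # 16
```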
Chatbot tutor
Please interact with this custom chatbot (link here), which I made to help you with this chapter. I suggest at least ten back-and-forth exchanges to ramp up, and then stopping once you feel you have gotten what you need from it.
Practice Questions
This is a floating-point precision issue. In R (and most programming languages), some decimal values cannot be represented exactly in the binary format the computer uses under the hood. To see this, try (0.2 + 0.1) - 0.3:
(0.2 + 0.1) - 0.3
[1] 5.551115e-17
If you are worried about floating-point errors, use the all.equal() function instead of == (wrap it in isTRUE() when you need a strict TRUE or FALSE), or round both values to 10 decimal places before asking logical questions.
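The three approaches can be compared side by side. This sketch uses only base R. Note that all.equal() returns a character description of the difference, not FALSE, when values are not close. That is why isTRUE() is used when a plain TRUE or FALSE is needed.

```r
x <- 0.2 + 0.1

# == compares the exact binary values, which differ in the last few bits
x == 0.3                          # FALSE

# all.equal() allows a tiny tolerance (about 1.5e-8 by default)
all.equal(x, 0.3)                 # TRUE

# When values really differ, all.equal() returns a text message,
# so isTRUE() turns that message into a plain FALSE
isTRUE(all.equal(1, 1.1))         # FALSE

# Alternatively, round both sides before comparing
round(x, 10) == round(0.3, 10)    # TRUE
```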
For the following questions, consider the diabetes dataset available at: https://rb.gy/fan785
Q4) Which variable in the diabetes dataset is a character but should be a number?
Q5) True OR False: The numeric variable, bp.1d, is a double, but could be changed to an integer without changing any of our analyses.
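For Q4 and Q5, you can check and convert data types with base R. The vectors below are hypothetical stand-ins and are not columns from the diabetes data, so running this sketch does not answer the questions. It shows the functions you might apply to the real columns.

```r
# Hypothetical numbers that ended up stored as text
counts_text <- c("12", "7", "30")
class(counts_text)                # "character"

# as.numeric() converts text to numbers;
# entries that cannot be parsed become NA (with a warning)
counts_num <- as.numeric(counts_text)
class(counts_num)                 # "numeric"

# Numbers typed without an L suffix are stored as doubles, even when whole
readings <- c(59, 68, 92)
typeof(readings)                  # "double"

# If every value is a whole number, converting to integer loses nothing
all(readings == round(readings), na.rm = TRUE)   # TRUE
readings_int <- as.integer(readings)
typeof(readings_int)              # "integer"

# Caution: as.integer() drops decimals (truncates toward zero)
as.integer(c(2.7, -2.7))          # 2 -2
```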
Q6) Which categorical variable in the dataset is ordinal?
Q7) You collected five leaves of the wild grape (Vitis riparia) and measured their length and width. You have a table of the length and width of each leaf and a formula for grape leaf area (below).
The area of a grape leaf is:
\[\text{leaf area} = 0.851 \times \text{leaf length} \times \text{leaf width}\]
The data are below; each column is a leaf:
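|        | Leaf 1 | Leaf 2 | Leaf 3 | Leaf 4 | Leaf 5 |
|--------|--------|--------|--------|--------|--------|
| length | 5.0    | 6.1    | 5.8    | 4.9    | 6.0    |
| width  | 3.2    | 3.0    | 4.1    | 2.9    | 4.5    |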
The mean leaf area is
First, make vectors for length and width:
length <- c(5, 6.1, 5.8, 4.9, 6)
width <- c(3.2, 3, 4.1, 2.9, 4.5)
Then multiply these vectors by each other and by 0.851.
Finally, find the mean.
# Create length and width vectors
length <- c(5, 6.1, 5.8, 4.9, 6)
width <- c(3.2, 3, 4.1, 2.9, 4.5)
leaf_areas <- 0.851 * length * width # find the area of each leaf
mean(leaf_areas) # find the mean
[1] 16.89916
# or in one step:
(0.851 * length * width) |>
  mean()
[1] 16.89916
Glossary of Terms
R: A programming language designed for statistical computing and data analysis.
RStudio: An Integrated Development Environment (IDE) that makes using R more user-friendly.
Vector: An ordered sequence of values of the same data type in R.
Assignment Operator (<-): Used to store a value in a variable.
Logical Operator: A symbol used to compare values and return TRUE or FALSE (e.g., ==, !=, >, <).
Numeric Variable: A variable that represents numbers, either as whole numbers (integers) or decimals (doubles).
Character Variable: A variable that stores text (e.g., "Clarkia xantiana").
Package: A collection of R functions and data sets that extend R’s capabilities.
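Several glossary terms (vector, logical operator, numeric and character variables) can be seen together in a few lines of base R. The values below are examples only; the species names come from this chapter.

```r
# A vector is an ordered sequence of values of one type
species <- c("Clarkia xantiana", "Vitis riparia")  # character vector
counts <- c(4L, 10L)                                # integer vector (L suffix)
leaf_lengths <- c(5.0, 6.1)                         # double vector

typeof(species)        # "character"
typeof(counts)         # "integer"
typeof(leaf_lengths)   # "double"

# Logical operators compare values element by element and return TRUE or FALSE
counts > 5             # FALSE  TRUE
counts == 4            # TRUE  FALSE
counts != 4            # FALSE  TRUE

# c() cannot mix types, so it coerces everything to the most flexible one
typeof(c(1, "a"))      # "character"
```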
New R functions
c(): Combines values into a vector.
install.packages(): Installs an R package.
library(): Loads an installed R package for use.
log(): Computes the logarithm of a number (the natural log by default; set the base argument for another base).
mean(): Calculates the average (mean) of a numeric vector.
read_csv() (readr): Reads a CSV file into R as a data frame.
round(): Rounds a number to a specified number of decimal places.
sqrt(): Finds the square root of a number.
View(): Opens a data frame in a spreadsheet-style viewer.
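Several of the base R functions above can be tried directly. The input values in this sketch are arbitrary examples. install.packages(), read_csv(), and View() are left out because they need the internet, a file, or an interactive session.

```r
x <- c(1, 10, 100)       # c() combines values into a vector

log(x)                   # natural log (base e) by default
log(x, base = 10)        # 0 1 2
sqrt(16)                 # 4
round(3.14159, 2)        # 3.14
mean(x)                  # 37

# Functions can be nested: the inner call runs first
round(sqrt(2), 3)        # 1.414
```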
R Packages Introduced
base: The core R package that provides fundamental functions like c(), log(), sqrt(), and round().
readr: A tidyverse package for reading rectangular data files (e.g., read_csv()).
dplyr: A tidyverse package for data manipulation, including mutate(), glimpse(), and across().
conflicted: Helps resolve function name conflicts when multiple packages have functions with the same name.
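This sketch shows the install-then-load pattern with dplyr, assuming dplyr is already installed. It uses the built-in mtcars data frame instead of the chapter's data. Converting cyl and gear to integers is only an illustration of mutate() with across(); it is not a step from the chapter.

```r
# install.packages("dplyr")  # install once per computer (needs internet)
library(dplyr)               # load once per R session

# mtcars ships with R; glimpse() lists each column and its type
glimpse(mtcars)

# mutate() + across() apply one function to several columns at once.
# cyl and gear hold whole numbers stored as doubles, so store them as integers.
mtcars_int <- mtcars |>
  mutate(across(c(cyl, gear), as.integer))
glimpse(mtcars_int)

# package::function() names the package explicitly, avoiding clashes such as
# dplyr::filter() versus stats::filter() (the kind of clash conflicted flags)
six_cyl <- dplyr::filter(mtcars, cyl == 6)
nrow(six_cyl)                # 7
```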
Additional resources
These optional resources reinforce or go beyond what we have learned.
R Recipes:
Videos:
- Coding your Data Analysis for Success (From Stat454).
- Why use R? (Yaniv Talking).
- Accessing R and RStudio (Yaniv Talking).
- RStudio orientation (Yaniv Talking).
- R functions (Yaniv Talking).
- R packages (Yaniv Talking).
- Loading data into R (Yaniv Talking).
- Data types (Yaniv Talking). Uses compression data as an example.