Motivating scenario: you are getting up and running in R and want to get started with the two workhorses of working in R — functions (how R does stuff) and vectors (how R stores stuff).
Learning goals: By the end of this sub-chapter you should be able to
|> operator.
R comes with tons of built-in functions that do everything from basic math to advanced statistical modeling. So not only can we calculate the mean and variance in Clarkia xantiana petal lengths with the mean() and var() functions, respectively, but we can also test the null hypothesis that mean petal size in xantiana is equal to that of parviflora with the t.test() function. Functions are the foundation of how we do things in R: they save time and ensure consistency across your analyses.
Functions take arguments, which we put in parentheses. When typing sqrt(25), sqrt() is the function, 25 is the argument, and 5 is the output.
# Natural log of 1000
log(1000)
[1] 6.907755
Functions can take multiple arguments: If you don't specify them all, R will either tell you to provide them or assume default values. For example, the log function defaults to the natural log (base e), so log(1000) returns 6.908. If you want the logarithm with base 10, you need to specify it explicitly as log(1000, 10), which returns 3. The order of arguments matters: log(10, 1000) gives 0.3333333, while log(1000, 10) gives 3.
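To make the role of defaults and argument order concrete, here is a minimal sketch using only base R's log() and the numbers from the paragraph above. The variable names are introduced here purely for illustration.

```r
# Default base: with one argument, log() uses the natural log (base e)
natural_log <- log(1000)

# Two unnamed arguments are matched by position: first the value, then the base
log10_of_1000 <- log(1000, 10)   # log base 10 of 1000
log1000_of_10 <- log(10, 1000)   # log base 1000 of 10

natural_log
log10_of_1000
log1000_of_10
```

Swapping the two unnamed values silently changes the question being asked, which is why naming the base argument is usually the safer habit.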
# Log base 10 of 1000
log(1000, base = 10)
[1] 3
Naming your arguments is good practice. For example, typing log(1000, base = 10) makes it obvious what each value represents (improving code readability), and allows flexibility in argument order (e.g. log(base = 10, 1000) gives the same value as log(1000, base = 10)). Thus, using named arguments makes your code readable and robust.
When specifying arguments inside a function, always use = (e.g., log(1000, base = 10)). Do not use <-, which is for assigning values to variables. If you use <- inside a function call, R creates a variable with that name in your workspace and passes the value along by position rather than by name, which can lead to unexpected results.
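The difference between = and <- inside a call is easiest to see side by side. The sketch below uses only base R's log(); the result names (named_first, named_second, arrow_result) are made up for this illustration.

```r
# Named argument with =: order does not matter, and nothing extra is stored
named_first  <- log(base = 10, 1000)
named_second <- log(1000, base = 10)

# With <-, R first assigns 10 to a new variable called `base`,
# then passes that 10 by position as the first argument,
# so 1000 ends up being used as the base
arrow_result <- log(base <- 10, 1000)

named_first    # 3
named_second   # 3
arrow_result   # about 0.333, i.e. log base 1000 of 10
base           # 10, a variable we never meant to create
```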
The pipe, |>, provides a clean way to pass the output of one function into another. For example, we can find the square root of the \(\text{log}_{10}\) of 1000, rounded to two decimal places, as follows:
log(1000, base = 10) |>
  sqrt() |>
  round(digits = 2)
[1] 1.73
Notice that we did not explicitly provide an argument to sqrt(): it simply used the output of log(1000, base = 10). Similarly, the round() function then rounded the square root of 3 to two decimal places.
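One way to see what the pipe is doing is to compare it with the equivalent nested call. This is a small sketch, assuming R 4.1 or later (the version that introduced the native |> pipe); the names nested and piped are illustrative only.

```r
# Nested calls: read from the inside out
nested <- round(sqrt(log(1000, base = 10)), digits = 2)

# Piped calls: read from top to bottom (native pipe, R >= 4.1)
piped <- log(1000, base = 10) |>
  sqrt() |>
  round(digits = 2)

# Both spellings describe the same computation
identical(nested, piped)
```

The piped version lists the steps in the order they happen, which tends to be easier to follow as chains get longer.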
If we observed one Clarkia plant with one flower, a second with two flowers, a third with three flowers, and a fourth with two flowers, we could find the mean number of flowers as (1 + 2 + 3 + 2)/4 = 2, but this would be tedious and error-prone. It would be easier to store these values in an ordered sequence of values (called a vector) and then use the mean() function.
Vectors are the primary way that data is stored in R; even more complex data structures are often built from vectors. We create vectors with the combine function, c(), which takes arguments that are the values in the vector.
# A vector of flower numbers
# 1st plant has one flower
# 2nd plant has two flowers
# 3rd plant has three flowers
# 4th plant has two flowers
num_flowers <- c(1, 2, 3, 2) # Create a vector for number of flowers per plant
mean(num_flowers) # finding the mean flower number
[1] 2
# If each flower produces four petals
num_petals <- 4 * num_flowers
num_petals
[1] 4 8 12 8
# If we wanted the log_2 of petal number
log(num_petals, base = 2) |>
  round(digits = 3)
[1] 2.000 3.000 3.585 3.000
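The examples above rely on R applying arithmetic and functions to every element of a vector at once. As a rough sketch of that idea, the code below recomputes the mean "by hand" with sum() and length(), two base R functions not mentioned in the text, and compares it with mean(). The name mean_by_hand is made up for illustration.

```r
num_flowers <- c(1, 2, 3, 2)   # flower counts from the example above

# Arithmetic is applied element by element
num_petals <- 4 * num_flowers

# The mean "by hand": the total divided by the number of observations
mean_by_hand <- sum(num_flowers) / length(num_flowers)

num_petals
mean_by_hand
mean(num_flowers)
```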
Variable assignment can be optional: In the code, I assigned observations to the vector, num_flowers, and then found the mean. But we could have skipped variable assignment: mean(c(1, 2, 3, 2)) also returns 2.
There are two good reasons not to skip variable assignment:
Variable assignment makes code easier to understand. If I revisited my code in a few weeks, I would know what the mean of this vector meant.
Variable assignment allows us to easily reuse the information. For example, below I can easily find mean petal number.
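The two reasons above can be sketched in a few lines. The example assumes, as earlier in this section, that each flower has four petals; mean_flowers and mean_petals are illustrative names.

```r
# Without assignment: this works, but the values cannot be reused
mean(c(1, 2, 3, 2))

# With assignment: one named vector feeds several calculations
num_flowers  <- c(1, 2, 3, 2)
mean_flowers <- mean(num_flowers)
mean_petals  <- mean(4 * num_flowers)   # assumes four petals per flower

mean_flowers
mean_petals
```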