Summarizing data

The major goals of statistics are to (1) summarize data, (2) estimate uncertainty, (3) test hypotheses, (4) infer cause, and (5) build models. Now that we can work in R, we are ready to begin our journey through these goals. In this section, we focus on the first goal: summarizing data.


It is somewhat weird to start by summarizing data without also describing uncertainty because, in the real world, data summaries should always be accompanied by some estimate of uncertainty. However, biting off both of these challenges at once is too much. So as we move forward in summarizing data, remember that this is just the beginning, and that a summary without an estimate of uncertainty is inadequate on its own.

Even though we aren’t tackling uncertainty yet, summarizing data on its own is already incredibly useful. Understanding and interpreting summaries helps us find patterns, spot errors, and build towards deeper statistical analysis.

Why summarize data?

Summarizing data serves several purposes:

Figure 1: A pretty scene of Clarkia’s home showing the world we get to summarize.

In this section, we’ll not only learn how to compute summaries but also how to think about them in a meaningful way. That means:

What’s Ahead?

Now we dive into descriptive statistics with R. While we will keep practicing and elaborating on what we have learned all term, the next six chapters, listed below, will get you started:

  • Data summaries – This chapter introduces univariate descriptions of data, including central tendency, variability, and shape. We also spend more time making and interpreting standard plots of univariate data.

  • Associations – We are often interested not just in one-dimensional summaries but also in how variables relate to one another. In this chapter, we introduce standard summaries for two-dimensional data.

  • Linear models – While we save intensive modeling for later, a model is, of course, a description of data. Here, we introduce the idea of a simple linear model. We spend much of the later part of this book revisiting this idea in more detail.

  • Ordination – We will not be making you holy. But we will help you deal with multivariate data, understand common approaches to reduce such complex data to a reasonable size, and recognize what to watch out for during this practice.

  • Better plots – As emphasized above, data visualization is the most powerful data summary. In this chapter, we carefully consider what makes a visualization effective—and what makes one misleading.

  • Better plots in R – After considering what makes a good plot, we will strengthen our ability to use ggplot to make great plots in R.