4 Reproducible Science
This section includes background on:
Motivating scenario: You are conducting a scientific research project and want to make sure that your project is reliable and repeatable.
Learning goals: By the end of this chapter you should be able to:
- Make a data sheet.
- Label samples.
- Describe best practices for collecting, storing, and maintaining data.
- Organize data in folders.
- Load data into R using an R project.
- Write code that allows readers to understand, replicate, and extend your work.
Making research reproducible has several benefits:
- Researchers can build trust with the public and other scientists by making the entire scientific process transparent.
- Scientists can build upon their own work and that of others without having to dig through old notes or rely on the memory of a busy colleague to reconstruct past analyses.
- Others can assess how sensitive a result is to specific assumptions and decisions.
In my roles as a biostatistics professor and Data Editor at The American Naturalist, I have found that the greatest beneficiary of reproducible research is often the lead author themselves. In this chapter, we will work through the process of creating reproducible research—from collecting data in the field to writing and sharing R scripts that document your analyses.
The best time to make your research reproducible is while planning your project. The second best time is now.
4.1 Making science reproducible
This chapter walks you through the key steps for making your science reproducible—from field notes to final scripts. You will learn how to:
- Appropriately collect and store data, including: Making rules for data collection, Making a spreadsheet for data entry, Making data (field) sheets, A checklist for data collection, Making a README and/or data dictionary, and Long-term public data storage.
- Develop reproducible computational strategies, including: Making an R project, Loading data into R, Writing and saving R scripts (with comments), Cleaning and tidying data, and finally a reproducible code checklist (modified from The American Naturalist).
We then summarize the chapter and provide practice questions, a glossary, a review of the R functions and packages introduced, and additional resources.