Reproducible reports with RMarkdown

Shilaan Alzahawi @ Stanford Data Science Institute
Artwork by @allison_horst

Do your data sci like it's going to need an alibi

Slides at bit.ly/shilaan-rmd
Artwork by @allison_horst

Outline

Icebreakers

🧊🔨 Week 2 of the DSSG!

What?

📝 Dynamic and Reproducible Scientific Reports

Why?

✅ Benefits

How?

🛠 Tutorial

Icebreakers 🧊🔨

Choose one of the following questions to share with your breakout partner:

🌍️ What is the craziest small-world experience you've had?

🦹🏾️ What would your superpower be and why?

😳 What's your guilty pleasure?

😭 What movie/show/scene/song always makes you cry?

🙈 What was your most embarrassing childhood fashion trend?

After sharing your answers with each other,

... you'll come back and tell us what your partner told you!

The typical workflow

When writing a scientific report, the typical workflow is to ...

  1. Do your analyses (e.g., in R or Python)

  2. Copy-paste or otherwise save your graphs and results

  3. Open a program (e.g., Microsoft Word) to communicate the results

  4. Manually format your results and citations

Discussion questions

What are common challenges when working in this fashion?
What kind of problems could arise?

This workflow is inefficient. It makes the outputs difficult to update and maintain (when the data change, steps 1 to 3 have to be redone), and jumping between incompatible computing environments adds overhead.

  • Inefficient
  • Updating and maintaining your report is difficult
  • When the underlying data change or you run different analyses, you have to repeat all steps in your workflow
  • You have to jump between different computing/software environments
  • Error-prone (transcription errors, typos)
  • Lacks transparency for people who may want to reproduce your work

Typical workflow challenges

  • Time-consuming

  • Error-prone (e.g., rounding or transcription errors)

  • Lacks transparency; difficult to reproduce (by others and yourself!)

  • Difficult to maintain and update (endless rewriting and reformatting...)

  • Overhead costs of different computing/software environments

  • Anything else...?

An alternative workflow: What?

  • Fuse your code and writing

  • Directly embed results in your report

  • Automatically reflect analytic changes in your documentation

  • Update all your results, figures, and tables automatically

  • Automatic formatting (including citations!)

An alternative workflow: Why?

Less...

⬇️ Error-prone

⬇️ Time-consuming

More...

⬆️ Dynamic

⬆️ Reproducible

⬆️ Transparent

Our weapon of choice: RMarkdown

  • RMarkdown is an authoring framework for data science, designed for reproducibility

  • The same document holds the code and the narrative surrounding the data

  • Results are automatically generated from the code

  • You can use a single R Markdown file to
    ✓ save and execute code, and
    ✓ generate high quality reports that can be shared with an audience

Artwork by @allison_horst:
Get your code, text, and outputs in the same (reproducible) place

Introduction to RMarkdown

  • Create dynamic analysis documents that combine code, output (incl. figures and tables), and writing

  • Can be used to
    ✓ Reproduce your analyses
    ✓ Collaborate and share code with others
    ✓ Communicate your results with others

  • Output formats include HTML, PDF, Word and...
    🤩 Slide shows
    🤩 Websites (shilaan.rbind.io)
    🤩 Blogs
    🤩 Books
    🤩 Dashboards
    🤩 Manuscripts
    🤩 Interactive documents

Bonus question How did I create my slides...?

Sneak peek: the power of RMarkdown

Discussion question

Are there good reasons for not using RMarkdown?

Getting started with RMarkdown

  • Install R
  • Install RStudio
  • Install the rmarkdown package
  • Install LaTeX (e.g., TinyTeX) if you want PDF output

install.packages("rmarkdown")
install.packages("tinytex")  # needed for generating PDF output
tinytex::install_tinytex()   # install the TinyTeX LaTeX distribution

Opening a new R Markdown document

  • Create a new R Markdown document from the menu
    File -> New File -> R Markdown

Notebook interface

  • Allows for direct interaction with R (execute code and display results inline)
  • Makes it easy to test and iterate
  • Produces a reproducible document with publication-quality output

Three types of content

  • YAML meta-data / frontmatter (between --- and ---)
  • Text with Markdown formatting
  • R code
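
A minimal sketch of how the three types of content sit together in one file. The snippet writes a small .Rmd file from R so that the example is self-contained; the file name, the title, and the chunk label are made up for illustration.

```r
# Write a tiny R Markdown file that contains all three types of content.
# The title and the chunk label "orange-summary" are illustrative only.
rmd_lines <- c(
  "---",                                  # 1. YAML metadata starts
  "title: My R Markdown Report",
  "output: html_document",
  "---",                                  #    YAML metadata ends
  "",
  "## Orange trees",                      # 2. text with Markdown formatting
  "",
  "The table below summarises the **Orange** data set.",
  "",
  "```{r orange-summary}",                # 3. R code chunk starts
  "summary(Orange)",
  "```"                                   #    R code chunk ends
)

rmd_file <- file.path(tempdir(), "three-parts.Rmd")
writeLines(rmd_lines, rmd_file)

# Show the file as it would appear in the editor
cat(readLines(rmd_file), sep = "\n")
```

Opening the resulting file in RStudio shows the same three blocks that the bullets describe.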

Metadata

YAML metadata

The YAML header contains basic metadata and rendering instructions

---
title: My R Markdown Report
author: Shilaan Alzahawi
output: pdf_document
date: "2021-07-10"
---

The date will be dynamically updated every time we knit the report,
with the help of the following line of code (more on in-line code later):
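
The line of code itself did not survive in these notes. A common way to get this behavior, offered here as an assumption rather than as the exact line used in the slides, is to put an in-line call to Sys.Date() in the date field of the YAML header. The R side of that idiom looks like this:

```r
# Assumption: the YAML header contains an in-line R expression such as
#   date: "`r Sys.Date()`"
# so the date is recomputed on every knit.
knit_date <- Sys.Date()

# ISO format, the same style as the "2021-07-10" shown in the header above
iso_date <- format(knit_date, "%Y-%m-%d")

# A more readable alternative for a title page
long_date <- format(knit_date, "%B %d, %Y")

print(iso_date)
print(long_date)
```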

Preview an RMarkdown

Rendering a document

Click the Knit button, or use the keyboard shortcut:
✓ Windows/Linux: Control + Shift + K
✓ OS X: Command + Shift + K
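
Knitting can also be started from the R console instead of the button or shortcut. A minimal sketch, assuming the rmarkdown package and Pandoc are installed; the file name report.Rmd and the helper knit_report() are hypothetical.

```r
# Hypothetical helper: render an .Rmd file and return the path of the output,
# or NULL (invisibly) when the file or the required tools are missing.
knit_report <- function(input = "report.Rmd") {
  if (!file.exists(input)) {
    message("No such file: ", input)
    return(invisible(NULL))
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE) ||
      !rmarkdown::pandoc_available()) {
    message("rmarkdown and Pandoc are needed to knit ", input)
    return(invisible(NULL))
  }
  # render() uses the output format named in the YAML header by default
  rmarkdown::render(input, quiet = TRUE)
}

result <- knit_report("report.Rmd")
```

The guard clauses are there only so the sketch fails gently; in day-to-day use a bare call to rmarkdown::render() is usually enough.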

Artwork by @allison_horst:
Become an RMarkdown knitting wizard

Output formats

What's happening behind the scenes?

A diagram illustrating how an R Markdown document is converted to the final output document

☞ knitr executes the code within the .Rmd file and writes the code and its results to an .md file;
☞ Pandoc converts the .md file to the output format specified in the metadata
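
The two arrows can be reproduced by hand. A rough sketch, assuming knitr, rmarkdown, and Pandoc are available; rmarkdown::render() does more than these two calls, so this illustrates the idea rather than replacing it. The file names are made up.

```r
# A throwaway .Rmd file so the sketch is self-contained
rmd_file <- file.path(tempdir(), "pipeline-demo.Rmd")
writeLines(c(
  "---",
  "title: Pipeline demo",
  "---",
  "",
  "```{r}",
  "mean(Orange$age)",
  "```"
), rmd_file)

md_file   <- file.path(tempdir(), "pipeline-demo.md")
html_file <- file.path(tempdir(), "pipeline-demo.html")

if (requireNamespace("knitr", quietly = TRUE)) {
  # Step 1: run the code chunks and write code + results to Markdown
  knitr::knit(rmd_file, output = md_file, quiet = TRUE)
}

if (file.exists(md_file) &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Step 2: let Pandoc convert the Markdown file to the final format
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
}
```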

What's happening behind the scenes?

Knitting an RMarkdown file...

  1. Starts a new R session
    ✓ No packages or objects loaded

  2. Sets your working directory to the location of the RMarkdown file

  3. Executes all code chunks from top to bottom

⚠️ Make sure to load all R packages you use!
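
Because knitting starts from an empty session in the file's own folder, the first chunk usually loads the packages and later chunks use paths relative to the .Rmd file. A small sketch; the data/orange.csv path is a hypothetical example, and base packages stand in for whatever your report needs.

```r
# First chunk: load every package the report relies on.
# (Base packages are used here so the sketch runs anywhere.)
library(stats)
library(utils)

# Knitting starts in the folder of the .Rmd file, so relative paths
# work without setwd(). "data/orange.csv" is a hypothetical file.
data_path <- file.path("data", "orange.csv")

# Fall back to the built-in Orange data when the file is absent
orange <- if (file.exists(data_path)) read.csv(data_path) else Orange

nrow(orange)
```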

Code

Two types of code in RMarkdown

  1. A code chunk, which opens with three backticks followed by {r} and closes with three backticks
  2. An inline code expression, which opens with one backtick followed by r and closes with one backtick

Code chunks

"Code chunks are the beating heart of our R Markdown." Xie, Dervieux, Riederer 2021

summary(Orange)
##  Tree       age         circumference
##  3:7   Min.   : 118.0   Min.   : 30.0
##  1:7   1st Qu.: 484.0   1st Qu.: 65.5
##  5:7   Median :1004.0   Median :115.0
##  2:7   Mean   : 922.1   Mean   :115.9
##  4:7   3rd Qu.:1372.0   3rd Qu.:161.5
##        Max.   :1582.0   Max.   :214.0

Inserting a code chunk

✓ Windows/Linux: Control + Alt + I
✓ OS X: Command + Option + I
✓ Enclosing code with three backticks and {r}

Inserting code chunks

Chunk anatomy

Naming your code chunks

It's recommended to name your chunks. This allows you to quickly navigate code, automatically name figures, and troubleshoot errors.

Chunk options

Control a chunk's behavior by passing additional, comma-separated arguments

echo = TRUE: show code and output (default)

echo = FALSE: show output only (hide code)

include = FALSE: run the code, but show neither code nor output

eval = FALSE: show code (do not run; no output)

warning = FALSE: hide warnings

error = FALSE: stop knitting when a chunk throws an error (error = TRUE prints the error and continues)

message = FALSE: hide messages (e.g., package startup messages)

summary(Orange)

Bonus question: What chunk option did I set here?
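
Options that should apply to every chunk can be set once instead of being repeated in each chunk header. A sketch of a typical setup chunk, assuming the knitr package is installed; the particular values are only an example.

```r
# Typically placed in the first chunk of the document (often with
# include = FALSE so the chunk itself stays out of the report).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo    = FALSE,  # hide code, keep output
    warning = FALSE,  # hide warnings
    message = FALSE   # hide messages
  )

  # Inspect the defaults that later chunks will inherit
  print(knitr::opts_chunk$get(c("echo", "warning", "message")))
}
```

Individual chunks can still override these defaults in their own header.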

Credit for all GIFs goes to Shannon Pileggi

Chunk execution

Run the current line or selection with Ctrl + Enter (Windows/Linux) or Command + Enter (OS X), or press the Run button

In-line code

To insert in-line code, wrap your code in single backticks and start it with the letter r. R Markdown will always

  • display the results of inline code, but not the code
  • apply relevant text formatting to the results

R Markdown document
Knitted HTML document
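
A sketch of the kind of values that in-line code typically pulls into a sentence, using the built-in Orange data from the earlier slides. The sentence template is an illustration, not the example shown in the slides.

```r
# Values computed once in a chunk ...
n_obs    <- nrow(Orange)
mean_age <- round(mean(Orange$age), 1)

# ... can then be referenced in the text with in-line code, e.g.
#   The data contain `r n_obs` measurements; mean age is `r mean_age` days.
# After knitting, only the results appear in the sentence:
sentence <- sprintf(
  "The data contain %d measurements; mean age is %.1f days.",
  n_obs, mean_age
)
print(sentence)
```

If the data change, the numbers in the sentence change with them on the next knit.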

Text

Markdown formatting basics

For more formatting options, see the R Markdown Reference guide

Harnessing the power of meta-data, code, and text

R Markdown document
Knitted academic manuscript

Tables

Table example (APA)
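
The table slides were images, so the exact code is not reproduced here. As a stand-in, knitr::kable() is one common way to turn a data frame into a formatted table; the slides may well have used a different package for the APA styling.

```r
# Mean circumference per tree, as a small data frame
tree_means <- aggregate(circumference ~ Tree,
                        data = as.data.frame(Orange), FUN = mean)
names(tree_means) <- c("Tree", "Mean circumference (mm)")

if (requireNamespace("knitr", quietly = TRUE)) {
  # kable() returns Markdown that is rendered as a table when knitting
  print(knitr::kable(tree_means, digits = 1,
                     caption = "Mean trunk circumference by tree"))
} else {
  print(tree_means)
}
```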

Statistical output example (APA)
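
The statistical-output slides were also images. A base-R sketch of the underlying idea: build the reported string from the test object, so the numbers cannot drift away from the analysis. The helper name apa_cor() and the exact wording are assumptions, and dedicated packages can do this formatting for you.

```r
# Hypothetical helper: format a correlation test in an APA-like style
apa_cor <- function(test) {
  p_text <- if (test$p.value < .001) {
    "p < .001"
  } else {
    # drop the leading zero, e.g. "0.032" becomes ".032"
    paste0("p = ", sub("^0", "", sprintf("%.3f", test$p.value)))
  }
  # drop the leading zero of the correlation, keeping a minus sign if any
  r_text <- sub("^(-?)0", "\\1", sprintf("%.2f", test$estimate))
  sprintf("r(%d) = %s, %s",
          as.integer(test$parameter),  # degrees of freedom
          r_text,
          p_text)
}

fit <- cor.test(Orange$age, Orange$circumference)
print(apa_cor(fit))
```

In a report, the call to apa_cor(fit) would sit in in-line code so the sentence updates on every knit.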

Tips and tricks

📦 Load all R packages in the first code chunk

⚠️ Do not include install.packages() or setwd()

RStudio checks your spelling!

Help > Cheatsheets > R Markdown Cheat Sheet

💨 Help > Markdown Quick Reference

Resources

Thank you!

Slides created with the R package xaringan.
