background-image: url("data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/welcome_to_rstats_twitter.png") background-position: 50% 0% background-size: 60% class: bottom ## Reproducible reports with RMarkdown [**Shilaan Alzahawi**](http://shilaan.rbind.io) @ Stanford Data Science Institute Artwork by [@allison_horst](https://github.com/allisonhorst/stats-illustrations) --- ### Do your data sci like it's going to need an alibi <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/reproducibility_court.png" width="100%" style="display: block; margin: auto;" /> Slides at [bit.ly/shilaan-rmd](https://bit.ly/shilaan-rmd) Artwork by [@allison_horst](https://github.com/allisonhorst/stats-illustrations) --- # Outline -- **Icebreakers** 🧊🔨 Week 2 of the DSSG! -- **What?** 📝 Dynamic and Reproducible Scientific Reports -- **Why?** ✅ Benefits -- **How?** 🛠 Tutorial --- background-image: url("data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/make-your-own-stats-cartoons/ex_3.png") background-position: 100% 90% background-size: 38% # Icebreakers 🧊🔨 -- Choose one of the following questions to share with your breakout partner: -- 🌍️ What is the craziest *small-world* experience you've had? -- 🦹🏾️ What would your superpower be and why? -- 😳 What's your guilty pleasure? -- 😭 What movie/show/scene/song always makes you cry? -- 🙈 What was your most embarrassing childhood fashion trend? -- After sharing your answers with each other, ... you'll come back and tell us what your partner told you! --- # The typical workflow When writing a scientific report, the typical workflow is to ... -- 1. Do your analyses (e.g., in `R` or `Python`) -- 2. Copy-paste or otherwise save your graphs and results -- 3. Open a program (e.g., `Microsoft Word`) to communicate the results -- 4. Manually format your results and citations -- ### Discussion questions -- What are common challenges when working in this fashion? What kind of problems could arise? ??? This is inefficient; it makes updating and maintaining the outputs difficult (when the data changes, steps 1 to 3 will have to be done again) and there is an overhead involved in jumping between incompatible computing environments. - Inefficient - Updating and maintaining your report is difficult - When the underlying data changes or you run different analyses, you have to repeat all steps in your workflow - You have to jump between different computing/software environments - Error prone (transcription errors, typos) - Lacks transparency for people who may want to reproduce your work --- class:center, middle <iframe width="1120" height="630" src="https://www.youtube.com/embed/s3JldKoA0zw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- <img src="data:image/png;base64,#http://swcarpentry.github.io/git-novice/fig/phd101212s.png" width="60%" style="display: block; margin: auto;" /> --- # Typical workflow challenges -- - Time-consuming -- - Error-prone (e.g., rounding or transcription errors) -- - Lacks transparency; difficult to reproduce (by others **and** yourself!) -- - Difficult to maintain and update (endless rewriting and reformatting...) -- - Overhead costs of different computing/software environments -- - **Anything else...?** --- background-image: url("data:image/png;base64,#https://upload.wikimedia.org/wikipedia/en/f/ff/SuccessKid.jpg") background-position: 50% 92% background-size: 45% ## An alternative workflow: What? -- - Fuse your code and writing -- - Directly embed results in your report -- - Automatically reflect analytic changes in your documentation -- - Update all your results, figures, and tables automatically -- - Automatic formatting (including citations!) --- background-image: url("data:image/png;base64,#https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/data_cowboy.png") background-position: 90% 40% background-size: 50% ## An alternative workflow: Why? -- Less... -- ⬇️ Error-prone -- ⬇️ Time-consuming -- More... -- ⬆️ Dynamic -- ⬆️ Reproducible -- ⬆️ Transparent --- background-image: url("data:image/png;base64,#https://bookdown.org/yihui/rmarkdown/images/hex-rmarkdown.png") background-position: 50% 90% background-size: 20% ## Our weapon of choice: RMarkdown -- - RMarkdown is an **authoring framework for data science**, designed for reproducibility -- - The same document holds the code and the narrative surrounding the data -- - Results are automatically generated from the code -- - You can use a single R Markdown file to ✓ save and execute code, and ✓ generate high quality reports that can be shared with an audience --- <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/rmarkdown_rockstar.png" width="80%" style="display: block; margin: auto;" /> Artwork by [@allison_horst](https://github.com/allisonhorst/stats-illustrations): **Get your code, text, and outputs in the same (reproducible) place** --- ## Introduction to RMarkdown -- - Create dynamic analysis documents that combine code, output (incl. figures and tables), and writing -- - Can be used to ✓ Reproduce your analyses ✓ Collaborate and share code with others ✓ Communicate your results with others -- - Output formats include HTML, PDF, Word and... 🤩 Slide shows 🤩 Websites ([shilaan.rbind.io](http://shilaan.rbind.io)) 🤩 Blogs 🤩 Books 🤩 Dashboards 🤩 Manuscripts 🤩 Interactive documents -- **Bonus question** How did I create my slides...? --- background-image: url("data:image/png;base64,#images/manuscript.png") background-position: 50% 80% background-size: contain ## Sneak peek: the power of RMarkdown --- background-image: url("data:image/png;base64,#images/cites.png") background-position: 50% 70% background-size: contain ## Sneak peek: the power of RMarkdown --- background-image: url("data:image/png;base64,#images/refs.png") background-position: 50% 50% background-size: contain ## Sneak peek: the power of RMarkdown --- ## Discussion question #### Are there good reasons for **not** using **RMarkdown**? -- .pull-left[ <img src="data:image/png;base64,#images/not-rmd.png" width="100%" /> ] -- .pull-right[ <img src="data:image/png;base64,#images/not-rmd2.png" width="100%" /> ] --- class: inverse, center, middle # Getting Started with RMarkdown --- # Getting started with RMarkdown - Install [`R`](https://cran.r-project.org/mirrors.html) - Install [`RStudio`](https://www.rstudio.com/products/rstudio/download/) - Install the `RMarkdown` package - Install `\(\LaTeX\)` (e.g., `TinyTex`) ```r install.packages("rmarkdown") install.packages('tinytex') # for generating PDF output tinytex::install_tinytex() # install TinyTeX ``` <img src="data:image/png;base64,#https://shilaan.rbind.io/post/building-your-website-using-r-blogdown/excited.jpg" width="70%" style="display: block; margin: auto;" /> --- ## Opening a new R Markdown - Create a new R Markdown document from the menu `File -> New File -> R Markdown` <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/new-rmarkdown.gif" width="90%" style="display: block; margin: auto;" /> --- ## Notebook interface - Allows for direct interaction with R (execute code and display results inline) - Makes it easy to test and iterate - Produces a reproducible document with publication-quality output <img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/07a00dd9669405f3cba06ef333db180295466252/7b153/lesson-images/how-2-chunk.png" width="90%" style="display: block; margin: auto;" /> --- ## Three types of content - YAML meta-data / frontmatter (between `---` and `---`) - Text with Markdown formatting - R code <img src="data:image/png;base64,#images/rmarkdown.png" width="95%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Metadata --- background-image: url("data:image/png;base64,#https://cwextensions.com/images/logo-someta.png") background-position: 92% 7% background-size: 10% # YAML metadata The YAML header contains basic metadata and rendering instructions ```yaml --- title: My R Markdown Report author: Shilaan Alzahawi output: pdf_document date: "2021-07-10" --- ``` -- The date will be **dynamically updated** every time we knit the report, with the help of the following line of code (more on **in-line code** later): -- <img src="data:image/png;base64,#images/date.png" width="1037" style="display: block; margin: auto auto auto 0;" /> --- # Preview an RMarkdown <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/preview.gif" width="100%" style="display: block; margin: auto;" /> --- # Rendering a document ✓ ![knit](data:image/png;base64,#images/knit.png) ✓ Windows/Linux: `Control + Shift + K` ✓ OS X: `Command + Shift + K` <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/knitting.gif" width="85%" style="display: block; margin: auto;" /> --- <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/rmarkdown_wizards.png" width="100%" style="display: block; margin: auto;" /> Artwork by [@allison_horst](https://github.com/allisonhorst/stats-illustrations): **Become an RMarkdown knitting wizard** --- ## Output formats ![](data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/ece57b678854545e6602a23daede51ad72da2170/21cca/lesson-images/outputs-1-word.png) --- ## Output formats ![](data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/ebcf2beeb67cc21693d73b8708d5af0fa9769f57/de9a9/lesson-images/outputs-2-pdf.png) --- ## What's happening behind the scenes? ![A diagram illustrating how an R Markdown document is converted to the final output document](data:image/png;base64,#https://bookdown.org/yihui/rmarkdown-cookbook/images/workflow.png) ☞ The code within the `.Rmd` file is executed and converted into an `.md` file; ☞ The `.md` file is converted to the output format specified in the metadata --- ## What's happening behind the scenes? Knitting an `RMarkdown` file... -- 1. Starts a new R session ✓ No packages or objects loaded -- 2. Sets your working directory to the location of the `RMarkdown` file -- 3. Executes all code chunks from top to bottom -- ### ⚠️ **Make sure to load all R packages you use!** <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/make-your-own-stats-cartoons/ex_4.png" width="50%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Code --- ## Two types of code in RMarkdown 1. A code chunk, surrounded by three backticks and `{r}` 2. An inline code expression, surrounded by one backtick and `r` <img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/4c3760f9341ec07761c95fb5f03e033fa73d206d/057ff/lesson-images/inline-1-heat.png" width="95%" style="display: block; margin: auto auto auto 0;" /> --- ## Code chunks -- "*Code chunks are the beating heart of our R Markdown.*" [Xie, Dervieux, Riederer 2021](https://bookdown.org/yihui/rmarkdown-cookbook/rmarkdown-anatomy.html) -- ```r summary(Orange) ``` ``` ## Tree age circumference ## 3:7 Min. : 118.0 Min. : 30.0 ## 1:7 1st Qu.: 484.0 1st Qu.: 65.5 ## 5:7 Median :1004.0 Median :115.0 ## 2:7 Mean : 922.1 Mean :115.9 ## 4:7 3rd Qu.:1372.0 3rd Qu.:161.5 ## Max. :1582.0 Max. :214.0 ``` -- ### Inserting a code chunk -- ✓ Windows/Linux: `Control + Alt + I` -- ✓ OS X: `Command + Option + I` -- ✓ Enclosing code with three backticks and `{r}` -- ✓ ![](data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/b8b19518e688e3ca1390e0a1588916f04908d33f/8a4dc/images/notebook-insert-chunk.png) --- ## Inserting code chunks <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/insert-rchunk.gif" width="95%" style="display: block; margin: auto;" /> --- ## Chunk anatomy <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/chunk-anatomy-2.gif" width="80%" style="display: block; margin: auto;" /> --- ## Naming your code chunks It's recommended to name your chunks. This allows you to quickly navigate code, automatically name figures, and troubleshoot errors. <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/chunk-names.gif" width="80%" style="display: block; margin: auto;" /> --- ## Chunk options Control a chunk's behavior by passing additional, comma-separated arguments -- ✓ `echo = TRUE` show code and output (*default*) -- ✓ `echo = FALSE` show output only (hide code) -- ✓ `include = FALSE` do not show output (run code) -- ✓ `eval = FALSE` show code (do not run; no output) -- ✓ `warning = FALSE` removes warning messages -- ✓ `error = FALSE` removes error messages -- ✓ `message = FALSE` removes all messages -- ```r summary(Orange) ``` -- **Bonus question:** What chunk option did I set here? --- ## Chunk options <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/chunk-options.gif" width="100%" style="display: block; margin: auto;" /> Credit for all GIFs goes to [Shannon Pileggi](https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/#rmd) --- ## Chunk execution `Ctrl + Enter` or `Command + Enter` or press ![](data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/img/run-chunk.PNG) <img src="data:image/png;base64,#https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/gifs/run-chunk.gif" width="100%" style="display: block; margin: auto;" /> --- ## In-line code To insert in-line code, wrap your code in a single backtick. RMarkdown will always - display the results of inline code, but not the code - apply relevant text formatting to the results -- **R Markdown document** <img src="data:image/png;base64,#images/inline.png" width="1867" /> -- **Knitted HTML document** <img src="data:image/png;base64,#images/inline-knitted.png" width="1667" /> --- class: inverse, center, middle # Text --- # Markdown formatting basics ![](data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/59f29676ef5e4d74685e14f801bbc10c2dbd3cef/c0688/lesson-images/markdown-1-markup.png) --- ![](data:image/png;base64,#images/syntax-becomes.png) For more formatting options, see the [R Markdown Reference guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf?_ga=2.157796986.1542626288.1625161001-1806201684.1624641897) --- --- ### Harnessing the power of meta-data, code, and text -- **R Markdown document** <img src="data:image/png;base64,#images/harness.png" width="2392" /> -- **Knitted academic manuscript** <img src="data:image/png;base64,#images/harness-knit.png" width="1727" /> --- ## Tables ![](data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/09467251a219c3c6b2dae2bf1367e5736a9ef78c/feeea/lesson-images/tables-1-kable.png)<!-- --> --- ## Table example (APA) -- <img src="data:image/png;base64,#images/table.png" width="2249" /> -- <img src="data:image/png;base64,#images/table-knit.png" width="50%" style="display: block; margin: auto;" /> --- background-image: url("data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/master/make-your-own-stats-cartoons/ex_1.png") background-position: 50% 90% background-size: 50% ## Statistical output example (APA) -- <img src="data:image/png;base64,#images/statistics.png" width="2296" /> -- <img src="data:image/png;base64,#images/statistics-knit.png" width="50%" /> --- ## Tips and tricks -- 📦 Load all R packages in the first code chunk -- ⚠️ Do not include `install.packages()` or `setwd()` -- ![](data:image/png;base64,#images/spell.png) RMarkdown checks your spelling! -- ⛑ `File > Help > Cheatsheets > R Markdown Cheat Sheet` -- 💨 `File > Help > Markdown Quick Reference` -- ### Resources - [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) - [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/) --- class: right <img src="data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/blob/master/rstats-artwork/r_first_then.png?raw=true" width="65%" style="display: block; margin: auto;" /> Artwork by [@allison_horst](https://github.com/allisonhorst/stats-illustrations) --- class: center, middle # Thank you! Slides created with the R package [**xaringan**](https://github.com/yihui/xaringan).