Meet the toolkit



Environmental Data Analysis and Visualization

Course toolkit

Doing data science

Programming:

  • R
  • RStudio
  • tidyverse
  • Quarto

Version control and collaboration:

  • Git
  • GitHub

Learning goals

By the end of the course, you will be able to…

  • gain insight from data, reproducibly (with literate programming and version control) and collaboratively, using modern programming tools and techniques

Reproducible data analysis

Reproducibility checklist

Question:

What does it mean for a data analysis to be “reproducible”?

Near-term goals:

  • Are the tables and figures reproducible from the code and data?
  • Does the code actually do what you think it does?
  • In addition to what was done, is it clear why it was done?

Long-term goals:

  • Can the code be used for other data?
  • Can you extend the code to do other things?
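
One way to work toward the long-term goals is to wrap an analysis step in a function that takes the data as an argument. The following is only a sketch: summarise_column is a hypothetical helper (not part of any package), and mtcars and airquality are datasets that ship with R.

```r
# Hypothetical helper: summarise one numeric column of any data frame.
summarise_column <- function(data, column) {
  values <- data[[column]]
  c(
    n = sum(!is.na(values)),            # number of non-missing values
    mean = mean(values, na.rm = TRUE),  # average, ignoring missing values
    sd = sd(values, na.rm = TRUE)       # spread, ignoring missing values
  )
}

# The same code works on different data.
mpg_summary <- summarise_column(mtcars, "mpg")
ozone_summary <- summarise_column(airquality, "Ozone")
```

Because the data frame and column name are arguments, extending the analysis to a new dataset means adding one more call rather than copying code.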

Toolkit for reproducibility

  • Scriptability ➡️ R

  • Literate programming (code, narrative, output in one place) ➡️ Quarto

  • Version control ➡️ Git / GitHub

R and RStudio

  • R: open-source statistical programming language
  • Environment for statistical computing and graphics
  • Easily extensible with packages
  • The engine that runs code

  • RStudio: convenient interface for R called an IDE (integrated development environment)
  • Not a requirement for programming with R, but very commonly used
  • The car that houses the engine

R packages

  • Packages: fundamental units of reproducible R code. They include R functions, the documentation that describes how to use them, and sample data

  • As of January 2025, there are almost 22,000 R packages available on CRAN (the Comprehensive R Archive Network)

  • We’re going to work with a small (but important) subset of these

Tour: R and RStudio

A short list (for now) of R essentials: functions


Functions are (most often) verbs, followed by what they will be applied to in parentheses:

do_this(to_this)
do_that(to_this, to_that, with_those)
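
As a concrete illustration of the pattern, here is a small sketch using two base R functions, mean and round. The temperatures vector is made-up example data.

```r
# One argument: apply mean() to a numeric vector.
temperatures <- c(12.5, 14.1, 9.8)
average_temperature <- mean(temperatures)

# Several arguments: round the result, keeping one decimal place.
rounded_average <- round(average_temperature, digits = 1)
```

Arguments can be matched by position (the first one here) or by name (digits = 1).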

A short list (for now) of R essentials: packages


Packages are installed once with the install.packages function and then loaded with the library function once per session:

install.packages("package_name")
library(package_name)
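
A minimal sketch of the same two steps, using the tools package as a stand-in because it ships with R and needs no installation. The file name report.qmd is a made-up example.

```r
# install.packages("tools")  # not needed here: tools ships with R

# Load the package for this session.
library(tools)

# Functions from the package are now available by name.
extension <- file_ext("report.qmd")
```

For a package from CRAN, the install.packages line would be run once, while the library line goes at the top of every script or document that uses the package.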

R essentials (continued)


Columns (variables) in data frames are accessed with $:

dataframe$var_name
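
For example, with a small made-up data frame (the names field_sites, site, and temp_c are illustrative only):

```r
# A small data frame with two columns (variables).
field_sites <- data.frame(
  site = c("A", "B", "C"),
  temp_c = c(12.5, 14.1, 9.8)
)

# Pull out one column as a vector, then pass it to a function.
site_temperatures <- field_sites$temp_c
warmest <- max(field_sites$temp_c)
```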

R essentials (continued)


Function documentation can be accessed with ?

?mean
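
The ? shortcut is shorthand for the help function. A short sketch of two related ways to inspect a function from code, using mean as the example:

```r
# Equivalent to typing ?mean in the console; printing the result
# opens the documentation page.
help_page <- help("mean")

# List the names of a function's arguments without opening the docs.
mean_arguments <- names(formals(mean))
```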

tidyverse

tidyverse.org

  • The tidyverse is an opinionated collection of R packages designed for data science
  • All packages share an underlying philosophy and a common grammar

Quarto

https://quarto.org/

  • Quarto and the various packages that support it enable R users to write their code and prose in reproducible computational documents
  • We will generally refer to Quarto documents (with the .qmd extension), e.g., “Do this in your Quarto document”, and rarely discuss loading the quarto package

  • Fully reproducible reports: each time you render, the analysis is run from the beginning
  • Simple markdown syntax for text
  • Code goes in chunks, delimited by three backticks; narrative goes outside of chunks
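
To make the layout concrete, the sketch below writes a minimal Quarto document to a temporary file from R. The title, the file name example.qmd, and the chunk contents are assumptions chosen for illustration; in practice you would create and edit the .qmd file in RStudio.

```r
# Three backticks open and close a code chunk.
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",                                      # YAML header
  "title: \"Example report\"",
  "format: html",
  "---",
  "",
  "Narrative text goes outside of chunks.",   # markdown prose
  "",
  paste0(fence, "{r}"),                       # chunk opens
  "x <- 2",
  "x * 3",
  fence                                       # chunk closes
)

# Write the document to a temporary location.
qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)
```

Opening the resulting file shows the three parts side by side: a header, narrative text, and one R chunk.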

Tour: Quarto

Environments

Note

The environment of your Quarto document is separate from the console!

  • Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!

  • When you render a document, Quarto “forgets” what you already have saved in your environment and re-runs your entire document from scratch. Everything your code needs in order to run must be in your Quarto document for it to render correctly.

First, run the following in the console:

x <- 2
x * 3

Looks good?

Then, add the following in an R chunk in a blank Quarto document and try to render it:

x * 3

What happens? Why the error?

You didn’t define x (with x <- 2) in your Quarto document!

Rendering re-runs everything from scratch, so R couldn’t find an object called x in the document’s environment.
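
The same behaviour can be imitated inside a single R session. This is only a sketch of the idea, not how Quarto works internally: render_env is a hypothetical stand-in for the clean environment a render starts from.

```r
# Defined in the current session, like typing it in the console.
x <- 2

# A fresh environment that can see base R functions but not session objects.
render_env <- new.env(parent = baseenv())

# Evaluating x * 3 there fails; capture the error message instead of stopping.
without_x <- tryCatch(
  eval(quote(x * 3), envir = render_env),
  error = function(e) conditionMessage(e)
)

# Once x is defined inside that environment, the same code succeeds.
assign("x", 2, envir = render_env)
with_x <- eval(quote(x * 3), envir = render_env)
```

Here without_x holds an error message about x, while with_x holds the computed value. The fix in a real document is the same: put x <- 2 in a chunk before the chunk that uses x.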

Quarto help

How will we use Quarto?

  • Every assignment / report / project / etc. is a Quarto document

  • You’ll always have a template Quarto document to start with

  • The amount of scaffolding in the template will decrease over the semester

What’s with all the hexes?

Mitchell O’Hara-Wild, useR! 2018 feature wall

Course toolkit

Doing data science

Programming:

  • R
  • RStudio
  • tidyverse
  • Quarto

Version control and collaboration:

  • Git
  • GitHub

Git and GitHub

  • Git is a version control system – like the “Track Changes” feature in Microsoft Word, on steroids
  • It’s not the only version control system, but it’s a very popular one

  • GitHub is the home for your Git-based projects on the internet – like Dropbox but much, much better

  • We will use GitHub as a platform for web hosting and collaboration (and as our course management system)

Versioning

with human-readable messages

Why do we need version control?

How will we use Git and GitHub?

Git and GitHub tips

  • There are a lot of Git commands. 99% of the time you will use Git to add, commit, push, and pull.

  • If you search online for Git help, you might come across methods that use the command line/terminal. Skip those and move on to the next resource unless you feel comfortable trying them out.

  • There is a great resource for working with Git and R: happygitwithr.com. Some of its content is beyond the scope of this course, but it’s a good place to look for help.

In lab…

Work with R, RStudio, Git, and GitHub together

First, create a GitHub account

  • Go to github.com and create an account
  • Verify your GitHub email
  • Adjust your GitHub settings for a more pleasant GitHub experience
    • Settings > Emails > Uncheck “Keep my email address private”
    • Settings > Emails > Update name and photo
  • Add your GitHub username to this spreadsheet
