Environmental Data Analysis and Visualization
Programming:
Version control and collaboration:
By the end of the course, you will be able to…
gain insight from data, reproducibly (with literate programming and version control) and collaboratively, using modern programming tools and techniques
Question:
What does it mean for a data analysis to be “reproducible”?
Near-term goals:
Long-term goals:
Scriptability ➡️ R
Literate programming (code, narrative, output in one place) ➡️ R Markdown
Version control ➡️ Git / GitHub
Packages: fundamental units of reproducible R code. They include R functions, the documentation that describes how to use them, and sample data.
As of January 2025, there are almost 22,000 R packages available on CRAN (the Comprehensive R Archive Network).
We’re going to work with a small (but important) subset of these
Functions are (most often) verbs, followed by what they will be applied to in parentheses:
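A minimal sketch of the verb-then-parentheses pattern, using base R functions. The vector name and its values are made up for illustration and are not course data.

```r
# Illustrative values only; not data from the course.
temperatures_c <- c(12.5, 14.1, 9.8)

# verb(what it is applied to)
mean(temperatures_c)

# Functions can be nested, and extra arguments go inside the same parentheses.
round(mean(temperatures_c), digits = 1)
```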
Packages are installed with the install.packages function and loaded with the library function, once per session:
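A small sketch of the install-then-load pattern. The package name in the commented-out install line is only an illustrative assumption, not necessarily one used in this course. The tools package is loaded here because it ships with R, so the example runs without an internet connection.

```r
# Install a package from CRAN (needs an internet connection), so it is commented out here.
# "dplyr" is an illustrative package name only.
# install.packages("dplyr")

# Load a package once per R session. "tools" ships with R.
library(tools)

# After loading, the package's functions can be called by name.
file_ext("report.qmd")
```

Installing again is only needed when you want a newer version of the package. Loading has to be repeated in every new session.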
Columns (variables) in data frames are accessed with $:
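A minimal sketch of column access with $, assuming a small made-up data frame. The object name and column names are hypothetical.

```r
# A small made-up data frame; the column names are hypothetical.
stations <- data.frame(
  site = c("A", "B", "C"),
  rainfall_mm = c(10.2, 0.0, 4.6)
)

# data_frame$column returns that column as a vector.
stations$rainfall_mm

# The result can be passed straight to another function.
max(stations$rainfall_mm)
```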
Function documentation can be accessed with ?
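A short sketch of looking up help, using mean only as an example function. The shorthand is left as a comment because it is meant to be typed in the console, where it opens the help page.

```r
# Shorthand: type this in the console to open the help page for mean().
# ?mean

# The long form looks up the same page. Storing the result in an object
# keeps this example from opening a help viewer when run as a script.
mean_help <- help("mean")
```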
.qmd extension), e.g. “Do this in your quarto document” and rarely discuss loading the quarto package
Note
The environment of your Quarto document is separate from the console!
Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!
When you render a document, Quarto “forgets” what you already have saved in your environment and re-runs your entire document from scratch. Everything you need to run your code needs to be in your Quarto document in order for it to render correctly.
You didn’t assign x <- 2 as an object in your Quarto document!
Rendering reruns everything from scratch, so it couldn’t find an object called x in the environment.
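The situation above can be roughly simulated in base R. This is a sketch using plain environments, not Quarto itself: console_env stands in for the console workspace and render_env for the fresh environment a render starts with. Both names are hypothetical.

```r
# Stand-in for the console: x was assigned here interactively.
console_env <- new.env()
assign("x", 2, envir = console_env)

# Stand-in for rendering: a clean environment that knows nothing about the console's x.
render_env <- new.env(parent = baseenv())

# Using x during the "render" fails, because x was never assigned in the document.
result <- tryCatch(
  eval(quote(x * 3), envir = render_env),
  error = function(e) conditionMessage(e)
)
result

# The fix: put the assignment in the document itself, so it runs during rendering.
fixed <- eval(quote({
  x <- 2
  x * 3
}), envir = render_env)
fixed
```

The same idea applies to loaded packages and imported data: anything the document uses has to be created or loaded inside the document.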
Every assignment / report / project / etc. is a Quarto document
You’ll always have a template Quarto document to start with
The amount of scaffolding in the template will decrease over the semester
Mitchell O’Hara-Wild, useR! 2018 feature wall
Version control and collaboration:
GitHub is the home for your Git-based projects on the internet, like Dropbox but much, much better
We will use GitHub as a platform for web hosting and collaboration (and as our course management system)
There are a lot of Git commands. 99% of the time you will use Git to add, commit, push, and pull.
If you search online for Git help you might come across methods using the command line/terminal. Skip those and move on to the next resource unless you feel comfortable trying them out.
There is a great resource for working with Git and R: happygitwithr.com. Some of the content in there is beyond the scope of this course, but it’s a good place to look for help.
In lab…
Work with R, RStudio, Git, and GitHub together!