Functions: An introduction



Environmental Data Analysis and Visualization

Intro to functions

Recall that you use functions all the time


The code that comes before parentheses is the name of a function.

ggplot(penguins...) +
  ...

ggplot() is a function.

mean(x)

mean() is a function.

  • When you load a package, it loads all of the functions associated with that package.

  • However, sometimes there isn’t already an existing function to do what we want.

  • When that’s the case, we can write our own custom function.

Why are custom functions useful?

We want to rescale the values in each column so that they range from 0 to 1.

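library(tidyverse)  # provides tibble() and mutate()

# four columns of random draws from a standard normal distribution
df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5),
)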

df
# A tibble: 5 × 4
       a      b      c      d
   <dbl>  <dbl>  <dbl>  <dbl>
1  1.04  -0.955  1.74  -1.46 
2 -1.06   0.818  0.499  0.336
3 -2.01  -1.04  -0.697 -0.610
4  0.323  0.924  0.243 -1.08 
5  0.774  0.407 -0.590 -0.737

Here is one way to do the rescaling by hand, one column at a time. Can you spot the mistake?

df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / 
    (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  b = (b - min(a, na.rm = TRUE)) / 
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / 
    (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / 
    (max(d, na.rm = TRUE) - min(d, na.rm = TRUE)),
)
# A tibble: 5 × 4
      a      b      c     d
  <dbl>  <dbl>  <dbl> <dbl>
1 1     -0.485 1      0    
2 0.313  0.416 0.492  1    
3 0     -0.530 0      0.474
4 0.766  0.470 0.386  0.212
5 0.914  0.207 0.0439 0.403

Why are functions useful?

  • In this example, we are copying and pasting basically the same code over and over.

  • The only thing we are changing is the column name.

Why is this a bad idea?

Writing a function: First determine which parts of your code are constant and which parts change.


What are we changing each time we run our code to rescale each of the columns?

(a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
(b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))
(c - min(c, na.rm = TRUE)) / (max(c, na.rm = TRUE) - min(c, na.rm = TRUE))
(d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))

The only thing that changes from line to line is the vector being rescaled (a, b, c, or d). The rest of the expression stays constant.

To turn your code into a function you need three things:

  1. A name
  2. The arguments
  3. The body

name <- function(arguments) {
  body
}

To turn your code into a function you need three things:

  1. A name. Here we’ll use rescale01 because this function rescales the values in a vector to lie between 0 and 1.
  2. The arguments. The arguments are the inputs that change each time you call the function. Our analysis above shows that only the vector being rescaled changes, so we need just one argument. We’ll call it x because this is the conventional name for a numeric vector.
  3. The body. The body is the code that’s repeated across all the calls, with the argument name in place of the column name.

rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
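
To see what the body computes, it can help to run the same arithmetic one step at a time. The sketch below is only an illustration: the intermediate names lo, hi, shifted, and scaled are hypothetical and are not part of rescale01().

```r
# Same arithmetic as the body of rescale01(), split into steps.
x <- c(-10, 0, 10)

lo <- min(x, na.rm = TRUE)       # smallest value: -10
hi <- max(x, na.rm = TRUE)       # largest value: 10

shifted <- x - lo                # 0 10 20: the minimum now sits at 0
scaled  <- shifted / (hi - lo)   # divide by the range (20) so the maximum becomes 1

scaled
```

Subtracting the minimum moves the smallest value to 0, and dividing by the range stretches or shrinks the rest so that the largest value lands on 1.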

Test it out


rescale01(c(-10, 0, 10))
[1] 0.0 0.5 1.0


rescale01(c(1, 2, 3, NA, 5))
[1] 0.00 0.25 0.50   NA 1.00

Looks good: both results run from 0 to 1, and the NA stays NA.
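
Instead of checking the printed output by eye, the same tests could be written as assertions. This is a minimal sketch using base R's stopifnot() and all.equal(). The variable out is a hypothetical name, and the function definition is repeated only so the block runs on its own.

```r
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

out <- rescale01(c(1, 2, 3, NA, 5))

# stopifnot() is silent when every condition is TRUE and raises an error otherwise.
# all.equal() compares numbers with a small tolerance and requires NAs to match up.
stopifnot(
  isTRUE(all.equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1))),
  isTRUE(all.equal(out, c(0, 0.25, 0.5, NA, 1))),
  min(out, na.rm = TRUE) == 0,
  max(out, na.rm = TRUE) == 1
)
```

If the function is edited later, rerunning these lines gives a quick signal that it still behaves as expected.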

Apply to original df

df |> mutate(
  a = rescale01(a),
  b = rescale01(b),
  c = rescale01(c),
  d = rescale01(d),
)
# A tibble: 5 × 4
      a      b      c     d
  <dbl>  <dbl>  <dbl> <dbl>
1 1     0.0447 1      0    
2 0.313 0.946  0.492  1    
3 0     0      0      0.474
4 0.766 1      0.386  0.212
5 0.914 0.737  0.0439 0.403

Now each input column is named in one place instead of four, so the code is shorter, we are less likely to make a mistake, and any mistake is easier to spot.
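
Once the logic lives in a function, the remaining repetition (one line per column) can also be removed by applying the function to every column at once. The sketch below goes a step beyond the slides and uses only base R so that it runs without any packages. The seed, the plain data.frame, and the name rescaled are assumptions made for illustration.

```r
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

# lapply() calls rescale01() on each column in turn;
# assigning into rescaled[] keeps the data frame structure and column names.
rescaled <- df
rescaled[] <- lapply(df, rescale01)

# Each column should now have a minimum of 0 and a maximum of 1.
sapply(rescaled, range)
```

With dplyr loaded, df |> mutate(across(a:d, rescale01)) expresses the same idea in the style used in the rest of these slides.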