Functions: An introduction

Environmental Data Analysis and Visualization

Intro to functions

Recall that you use functions all the time

The code that comes before parentheses is the name of a function.

ggplot(penguins...) +
  ...

ggplot() is a function.

mean(x)

mean() is a function.
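
A quick way to convince yourself of this is to ask R directly. The sketch below uses only base R; the vector `x` is a made-up example, not data from the course.

```r
# A small made-up numeric vector, just for illustration
x <- c(2, 4, 6)

# With parentheses, R calls the function and returns a result
result <- mean(x)
result

# Without parentheses, the name refers to the function object itself
is.function(mean)
class(mean)
```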

When you load a package, it loads all of the functions associated with that package.
However, sometimes there isn’t already an existing function to do what we want.
When that’s the case, we can write our own custom function.

Why are custom functions useful?

Suppose we want to rescale the values in each column so that they range from 0 to 1.

df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5),
)

df

# A tibble: 5 × 4
       a      b      c      d
   <dbl>  <dbl>  <dbl>  <dbl>
1  1.04  -0.955  1.74  -1.46 
2 -1.06   0.818  0.499  0.336
3 -2.01  -1.04  -0.697 -0.610
4  0.323  0.924  0.243 -1.08 
5  0.774  0.407 -0.590 -0.737

One way to rescale each column to a range from 0 to 1 is to copy and paste the same calculation for every column. Can you spot the mistake?

df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / 
    (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  b = (b - min(a, na.rm = TRUE)) / 
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / 
    (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / 
    (max(d, na.rm = TRUE) - min(d, na.rm = TRUE)),
)

# A tibble: 5 × 4
      a      b      c     d
  <dbl>  <dbl>  <dbl> <dbl>
1 1     -0.485 1      0    
2 0.313  0.416 0.492  1    
3 0     -0.530 0      0.474
4 0.766  0.470 0.386  0.212
5 0.914  0.207 0.0439 0.403

In this example, we are copying and pasting basically the same code over and over.
The only thing we are changing is the column name.

Why is this a bad idea?
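
One reason is that a copy-and-paste slip runs without any error or warning. The base R sketch below reproduces the slip from the `b` line above on two made-up vectors (the seed is arbitrary and the names `b_buggy` and `b_fixed` are only for illustration), then compares the resulting ranges.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
a <- rnorm(5)
b <- rnorm(5)

# Copy-and-paste version with the slip: subtracts min(a) instead of min(b)
b_buggy <- (b - min(a, na.rm = TRUE)) /
  (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))

# Intended version: every term refers to b
b_fixed <- (b - min(b, na.rm = TRUE)) /
  (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))

# The buggy result is shifted, so it does not run from 0 to 1
range(b_buggy)

# The intended result runs from 0 to 1
range(b_fixed)
```

Both versions run silently, so the only clue is that the rescaled `b` column does not have 0 as its minimum and 1 as its maximum.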

Writing a function: First determine which parts of your code are constant and which parts change.

What are we changing each time we run our code to rescale each of the columns?

(a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
(b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))
(c - min(c, na.rm = TRUE)) / (max(c, na.rm = TRUE) - min(c, na.rm = TRUE))
(d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))

To turn your code into a function you need three things:

A name
The arguments
The body

name <- function(arguments) {
  body
}

A name. Here we’ll use rescale01 because this function rescales the values in a vector to lie between 0 and 1.

rescale01 <- function() {
  
}

The arguments. The arguments are things that change each time you use the function. Our analysis above tells us that we have just one argument. We’ll call it x because this is the conventional name for a numeric vector.

rescale01 <- function(x) {
  
}

The body. The body is the code that’s repeated across all the calls.

rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
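
Once the code lives in one place, it is also easy to tidy up. The body above calls min() twice and max() once. A possible variant, sketched here under the hypothetical name `rescale01_range`, computes both values once with base R's range() and should give the same results.

```r
# The version from the slides
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Hypothetical variant: range() returns c(minimum, maximum) in one call
rescale01_range <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Both versions should agree on the same input
rescale01(c(1, 2, 3, NA, 5))
rescale01_range(c(1, 2, 3, NA, 5))
```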

Test it out

rescale01(c(-10, 0, 10))

[1] 0.0 0.5 1.0

rescale01(c(1, 2, 3, NA, 5))

[1] 0.00 0.25 0.50   NA 1.00

Looks good
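
It can also help to try an input where the function has no sensible answer. The sketch below is not part of the original tests. It passes a constant vector, where the maximum equals the minimum, so the function divides 0 by 0.

```r
# The function from the slides
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Constant input: numerator and denominator are both 0, and 0 / 0 is NaN in R
constant_result <- rescale01(c(5, 5, 5))
constant_result

# is.nan() confirms that every element is NaN
all(is.nan(constant_result))
```

Whether NaN is acceptable here depends on the analysis. The point is only that testing a few unusual inputs shows how the function behaves before it is applied to real data.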

Apply to original `df`

df |> mutate(
  a = rescale01(a),
  b = rescale01(b),
  c = rescale01(c),
  d = rescale01(d),
)

# A tibble: 5 × 4
      a      b      c     d
  <dbl>  <dbl>  <dbl> <dbl>
1 1     0.0447 1      0    
2 0.313 0.946  0.492  1    
3 0     0      0      0.474
4 0.766 1      0.386  0.212
5 0.914 0.737  0.0439 0.403

Now we only need to change the input column in one place, so the code is more streamlined, we are less likely to make a mistake, and we are more likely to spot a mistake.
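
Once the calculation is a function, the remaining repetition across columns can be removed too. The sketch below is one option, assuming dplyr 1.0 or later is installed: across() applies the same function to a set of columns. The seed is arbitrary, so the values will differ from the tables above, and the name `df_rescaled` is only for illustration.

```r
library(dplyr)
library(tibble)

# The function from the slides
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)

# Apply rescale01() to columns a through d in a single step
df_rescaled <- df |> mutate(across(a:d, rescale01))
df_rescaled
```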
