Data types



Environmental Data Analysis and Visualization

Why should you care about data types?

Example: Cat lovers

A survey asked respondents their name and number of cats. The instructions said to enter the number of cats as a numerical value.

library(tidyverse)

cat_lovers <- read_csv("data/cat-lovers.csv")
# A tibble: 60 × 3
   name           number_of_cats handedness
   <chr>          <chr>          <chr>     
 1 Bernice Warren 0              left      
 2 Woodrow Stone  0              left      
 3 Willie Bass    1              left      
 4 Tyrone Estrada 3              left      
 5 Alex Daniels   3              left      
 6 Jane Bates     2              left      
 7 Latoya Simpson 1              left      
 8 Darin Woods    1              left      
 9 Agnes Cobb     0              left      
10 Tabitha Grant  0              left      
# ℹ 50 more rows

Oh why won’t you work?!

cat_lovers |> 
  summarise(mean_cats = mean(number_of_cats))
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `mean_cats = mean(number_of_cats)`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
# A tibble: 1 × 1
  mean_cats
      <dbl>
1        NA

?mean

Take a breath and look at your data

What is the “type” of the number_of_cats variable?

glimpse(cat_lovers)
Rows: 60
Columns: 3
$ name           <chr> "Bernice Warren", "Woodrow Stone", "Willie Bass", "Tyro…
$ number_of_cats <chr> "0", "0", "1", "3", "3", "2", "1", "1", "0", "0", "0", …
$ handedness     <chr> "left", "left", "left", "left", "left", "left", "left",…

Let’s take another look
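
One way to take that look is to list the entries that fail to parse as numbers. The sketch below uses a few made-up responses (not the actual survey file) and base R only; the names `responses`, `parsed`, and `bad_entries` are hypothetical.

```r
# Made-up survey responses for illustration; not the real cat-lovers data
responses <- c("0", "1", "three", "2", "1 or 2")

# as.numeric() returns NA (with a warning) for strings it cannot parse,
# so we silence the warning and use the NAs to locate the problem entries
parsed <- suppressWarnings(as.numeric(responses))
bad_entries <- responses[is.na(parsed)]
bad_entries
```

The same idea should carry over to a data frame column: convert, then inspect the rows where the conversion produced NA.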

Sometimes you might need to babysit your respondents

cat_lovers |> 
  mutate(number_of_cats = case_when(
    name == "Ginger Clark" ~ 2,
    name == "Doug Bass"    ~ 3,
    TRUE                   ~ as.numeric(number_of_cats))) |> 
  summarise(mean_cats = mean(number_of_cats))
# A tibble: 1 × 1
  mean_cats
      <dbl>
1     0.833

You always need to respect data types

cat_lovers |> 
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats),
    number_of_cats = as.numeric(number_of_cats)) |> 
  summarise(mean_cats = mean(number_of_cats))
# A tibble: 1 × 1
  mean_cats
      <dbl>
1     0.833

Now that we know what we’re doing, assign the output to an object

cat_lovers <- cat_lovers |> 
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats),
    number_of_cats = as.numeric(number_of_cats))

Moral of the story

  • If your data does not behave the way you expect, type coercion when the data was read in might be the reason.

  • Go in and investigate your data, apply the fix, save your data, live happily ever after.
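
The whole cycle (read, investigate, fix, convert) can be sketched on a tiny made-up file. This is only an illustration under stated assumptions: it uses base R's `read.csv()` on inline text instead of `read_csv()` on the survey file, and the two rows are invented.

```r
# Hypothetical two-row file; a single non-numeric entry is enough to
# make the whole column be read in as character
csv_text <- "name,number_of_cats\nA,1\nB,three\n"
survey <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Investigate: the column is character, not numeric
type_when_read <- typeof(survey$number_of_cats)
type_when_read

# Fix the offending entry while it is still character, then convert
survey$number_of_cats[survey$number_of_cats == "three"] <- "3"
survey$number_of_cats <- as.numeric(survey$number_of_cats)
mean(survey$number_of_cats)
```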

Data types

Data types in R

  • logical
  • double
  • integer
  • character
  • there are more, but we won’t be focusing on those

Logical & character

logical - boolean values TRUE and FALSE

typeof(TRUE)
[1] "logical"

character - character strings

typeof("hello")
[1] "character"

Double & integer

double - floating point numerical values (default numerical type)

typeof(1.335)
[1] "double"
typeof(7)
[1] "double"

integer - integer numerical values (indicated with an L)

typeof(7L)
[1] "integer"
typeof(1:3)
[1] "integer"
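
The distinction matters mostly when types are compared or mixed. A small sketch of how integers and doubles interact in arithmetic (the variable names are only for illustration):

```r
# Arithmetic on two integers stays integer, except for ordinary division
t_int_sum   <- typeof(5L + 1L)    # "integer"
t_mixed_sum <- typeof(5L + 1)     # "double": the integer is promoted
t_divide    <- typeof(7L / 2L)    # "double": `/` always returns a double
t_int_div   <- typeof(7L %/% 2L)  # "integer": integer division keeps the type

# Same value, different type
same_value <- 7 == 7L             # TRUE: compared by value
same_object <- identical(7, 7L)   # FALSE: identical() also compares type
```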

Concatenation

Vectors can be constructed using the c() function.

c(1, 2, 3)
[1] 1 2 3
c("Hello", "World!")
[1] "Hello"  "World!"
c(c("hi", "hello"), c("bye", "jello"))
[1] "hi"    "hello" "bye"   "jello"
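
Because c() always returns a flat vector, it can also be used to append to an existing vector. A brief sketch:

```r
x <- c(1, 2, 3)
x <- c(x, 4)        # append by concatenating
length(x)           # 4

# Nested c() calls are flattened into a single vector, not a vector of vectors
nested <- c(c("hi", "hello"), c("bye", "jello"))
length(nested)      # 4
```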

Converting between types - numeric to character

x <- 1:3
x
[1] 1 2 3
typeof(x)
[1] "integer"
y <- as.character(x)
y
[1] "1" "2" "3"
typeof(y)
[1] "character"

Converting between types - logical to double

x <- c(TRUE, FALSE)
x
[1]  TRUE FALSE
typeof(x)
[1] "logical"
y <- as.numeric(x)
y
[1] 1 0
typeof(y)
[1] "double"
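
Not every conversion succeeds or preserves the value. A few cases worth trying, shown as a sketch with illustrative variable names:

```r
# Character to numeric: entries that do not look like numbers become NA
# (R also emits a "NAs introduced by coercion" warning, silenced here)
z <- suppressWarnings(as.numeric(c("1", "2", "three")))
z                    # 1 2 NA

# Double to integer: the fractional part is dropped, not rounded
w <- as.integer(3.9)
w                    # 3

# Numeric to logical: zero is FALSE, everything else is TRUE
v <- as.logical(0:2)
v                    # FALSE TRUE TRUE

# Character to logical: only strings such as "TRUE" or "false" are recognised
u <- as.logical(c("TRUE", "false", "yes"))
u                    # TRUE FALSE NA
```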

Be aware of what happens when you combine different data types in the same vector

R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that’s not always a great thing!

c(1, "Hello")
[1] "1"     "Hello"
c(FALSE, 3L)
[1] 0 3
c(1.2, 3L)
[1] 1.2 3.0
c(2L, "two")
[1] "2"   "two"
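
The results above follow a fixed ordering: when types are mixed in c(), everything is converted to the most flexible type present, with logical < integer < double < character. A short sketch that adds one type at a time:

```r
t1 <- typeof(c(TRUE, 1L))             # "integer"
t2 <- typeof(c(TRUE, 1L, 1.5))        # "double"
t3 <- typeof(c(TRUE, 1L, 1.5, "a"))   # "character"

# Once a character is present, even logicals are stored as strings
mixed <- c(TRUE, "a")
mixed                                 # "TRUE" "a"
```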

Explicit vs. implicit coercion

Let’s give formal names to what we’ve seen so far:

  • Explicit coercion is when you call a function like as.logical(), as.numeric(), as.integer(), as.double(), or as.character()

  • Implicit coercion happens when you use a vector in a context that expects a certain type of vector (like combining multiple data types in one vector)
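
Implicit coercion also shows up outside c(), for example in arithmetic and comparisons. The sketch below contrasts a few implicit cases with an explicit one; the variable names are illustrative only.

```r
# Implicit: logicals are treated as 0/1 in arithmetic
x <- c(3, 8, 1, 6)
n_above <- sum(x > 2)      # 3: counts the TRUE values
p_above <- mean(x > 2)     # 0.75: proportion of TRUE values

# Implicit: in a comparison, 1 is converted to "1"
cmp <- 1 == "1"            # TRUE

# Arithmetic does not coerce character strings; this is an error,
# captured here so the script keeps running
msg <- tryCatch("1" + 1, error = function(e) conditionMessage(e))
msg

# Explicit: the conversion is visible in the code
explicit_sum <- as.numeric("1") + 1   # 2
```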

Example

Suppose we want to know the type of c(1, "a").

First, I’d look at:

typeof(1)
[1] "double"
typeof("a")
[1] "character"

and make a guess based on these. Then finally I’d check:

typeof(c(1, "a"))
[1] "character"

Special values

  • NA: Not available
  • NaN: Not a number
  • Inf: Positive infinity
  • -Inf: Negative infinity

Why might we end up with NaN or Inf?

pi / 0
[1] Inf
0 / 0
[1] NaN
1/0 - 1/0
[1] NaN
1/0 + 1/0
[1] Inf
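
Base R provides test functions for each of these values. A quick sketch applying them to one vector (the name `vals` is illustrative):

```r
vals <- c(1, NA, NaN, Inf, -Inf)

na_flags  <- is.na(vals)        # FALSE  TRUE  TRUE FALSE FALSE (NaN counts as NA)
nan_flags <- is.nan(vals)       # FALSE FALSE  TRUE FALSE FALSE (NA is not NaN)
fin_flags <- is.finite(vals)    #  TRUE FALSE FALSE FALSE FALSE
inf_flags <- is.infinite(vals)  # FALSE FALSE FALSE  TRUE  TRUE

# Another common source of -Inf
log_zero <- log(0)              # -Inf
```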

NAs are special ❄️s

Many functions return NA if the data contains NAs. Usually, they include an optional argument, such as na.rm = TRUE, to remove NAs before computing. Otherwise, you can remove them yourself, for example with the drop_na() function from tidyr, which drops the rows of a data frame that contain NAs.

x <- c(1, 2, 3, 4, NA)
mean(x)
[1] NA
mean(x, na.rm = TRUE)
[1] 2.5
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00    1.75    2.50    2.50    3.25    4.00       1 
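
NA also behaves unusually in comparisons, which is why is.na() exists. A small sketch, reusing the vector from above:

```r
x <- c(1, 2, 3, 4, NA)

# Comparing with NA gives NA, so `x == NA` cannot be used to find missing values
na_equals_na <- NA == NA          # NA, not TRUE
all_unknown  <- x == NA           # NA NA NA NA NA

# is.na() is the reliable test
n_missing <- sum(is.na(x))        # 1
complete  <- x[!is.na(x)]         # 1 2 3 4

# NA has a type too: logical by default, coerced to match the vector it is in
t_na <- typeof(NA)                # "logical"
t_x  <- typeof(x)                 # "double"
```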

AE-07

  • AE 07 - Data types and classes > open type-coercion.qmd.

  • What is the type of the given vectors? First, guess. Then, try it out in R. If your guess was correct, great! If not, discuss why they have that type.