My R Take on Advent of Code - Day 4

02 Jan 2019

tidyverse

After some wonderful Christmas and New Year’s distractions, now it’s time to continue with my Advent of Code challenges in R (before the summer comes…).

To avoid waffling, the 4th puzzle offers a record of guards’ shifts with various activities plus the time they started and time. We need to gather two things from this dataset:

Which guard sleeps most (minutes) and
What minute does that guard spend asleep the most?

Then, we have to multiply the guard number by the most common minute he/she falls asleep to get the final solution. Let’s get down to work, then!

First, let’s have a look at the data:

library(tidyverse)

raw_input <- read.delim('day4-raw-input.txt', header = F)
head(raw_input)

##                                           V1
## 1                [1518-09-28 00:56] wakes up
## 2            [1518-10-15 00:05] falls asleep
## 3                [1518-02-15 00:58] wakes up
## 4                [1518-08-26 00:51] wakes up
## 5                [1518-03-23 00:32] wakes up
## 6 [1518-05-04 23:56] Guard #523 begins shift

This data needs some serious cleaning! Let’s brush up on our regex knowledge a bit and seperate timestamps from guard activities:

# clean the input
clean_input <- raw_input %>%
  rename(value = V1) %>% 
  mutate(timestamp = lubridate::ymd_hm(str_extract(value,                                              '[:digit:]+-[:digit:]+-[:digit:]+..[:digit:]+:[:digit:]+')),
         action = str_extract(value, '[:alpha:]+..[:alpha:]+'),
         guard_num = ifelse(str_detect(value, '#'),
                            str_extract(value, '#[:digit:]+'), NA),
         date = lubridate::date(timestamp),
         minute = lubridate::minute(timestamp) # minute that the activity started 
  ) %>%
  arrange(timestamp) %>% # sort in chronological order
  fill(guard_num) # fill in missing guard numbers 

head(clean_input)

##                                         value           timestamp
## 1  [1518-01-28 00:00] Guard #151 begins shift 1518-01-28 00:00:00
## 2             [1518-01-28 00:40] falls asleep 1518-01-28 00:40:00
## 3                 [1518-01-28 00:49] wakes up 1518-01-28 00:49:00
## 4             [1518-01-28 00:57] falls asleep 1518-01-28 00:57:00
## 5                 [1518-01-28 00:58] wakes up 1518-01-28 00:58:00
## 6 [1518-01-29 00:04] Guard #2017 begins shift 1518-01-29 00:04:00
##         action guard_num       date minute
## 1        Guard      #151 1518-01-28      0
## 2 falls asleep      #151 1518-01-28     40
## 3     wakes up      #151 1518-01-28     49
## 4 falls asleep      #151 1518-01-28     57
## 5     wakes up      #151 1518-01-28     58
## 6        Guard     #2017 1518-01-29      4

Now that we have a clean dataset, we can determine who sleeps most:

# who sleeps most 
clean_input %>% 
  filter(action != 'Guard') %>% # we don't need this anymore 
  group_by(guard_num, date) %>% 
  #calculate time asleep
  mutate(time_asleep = ifelse(action == 'wakes up', minute - lag(minute), NA ) 
  ) %>% 
  group_by(guard_num) %>% 
  na.omit() %>% 
  summarise(total_asleep = sum(time_asleep)) %>%  # sum it
  arrange(desc(total_asleep)) %>% # sort it
  slice(1) # pick the guard that sleeps most

## # A tibble: 1 x 2
##   guard_num total_asleep
##   <chr>            <int>
## 1 #409               544

There you go, shame on guard number #409! Now, how can we see what is the most common time for him (her?) to fall asleep? This will require some data re-arranging:

# what's the most common time to sleep for guard #409

guard_data <- clean_input %>% 
  filter(action != 'Guard') %>% # we don;t need it anymore!
  filter(guard_num == '#409') %>% # pick the winner
  arrange(timestamp) %>% 
  spread(action, minute) %>% # prep the data for sequences
  rename(falls_asleep = `falls asleep`,
         wakes_up = `wakes up`) %>% 
  mutate(falls_asleep = ifelse(!is.na(wakes_up), lag(falls_asleep), falls_asleep )) %>% 
  na.omit()

head(guard_data)

##                          value           timestamp guard_num       date
## 2  [1518-02-25 00:40] wakes up 1518-02-25 00:40:00      #409 1518-02-25
## 4  [1518-04-02 00:24] wakes up 1518-04-02 00:24:00      #409 1518-04-02
## 6  [1518-04-02 00:52] wakes up 1518-04-02 00:52:00      #409 1518-04-02
## 8  [1518-04-12 00:59] wakes up 1518-04-12 00:59:00      #409 1518-04-12
## 10 [1518-04-15 00:14] wakes up 1518-04-15 00:14:00      #409 1518-04-15
## 12 [1518-04-15 00:50] wakes up 1518-04-15 00:50:00      #409 1518-04-15
##    falls_asleep wakes_up
## 2             3       40
## 4             1       24
## 6            49       52
## 8            57       59
## 10            4       14
## 12           25       50

My idea is to create sequences of minutes between the minute the guard falls asleep and wakes up using seq(). Check out this simple example:

seq(3, 40)

##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [24] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Easy - peasy! We can apply it to the our data using map2():

# apply the funtion to the #409 guard's data
map2(guard_data$falls_asleep,
          guard_data$wakes_up, seq) %>% 
  unlist() %>%  # turn a list into a vector
  table() %>%  # get a frequency table
  sort() # sort it in ascending order. There are three potential answers! the middle one is correct (?!?!?!)

## .
##  1  2 59  3 58  4  5  6 15 56 57  7  8  9 10 11 12 13 14 16 17 18 19 20 21 
##  1  3  3  5  5  6  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  8  8  8 
## 22 23 24 25 26 55 27 47 28 29 33 34 46 54 30 31 32 35 45 48 43 44 49 53 36 
##  8  8  8  8  9  9 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 14 
## 37 38 39 40 41 42 50 51 52 
## 14 14 14 14 14 14 15 15 15

Now, that’s interesting. The top number is minute and the bottom number is the number of times the guard slept during this time. So, when you look at the last (most common) minutes in the vector you’ll notice that there are THREE (not one!) different times of the same highest frequency and they are minute 50, 51 and 52. I’m not sure if this means there’s a flaw in my solution, but after trying all three numbers it’s clear that the middle one (51) is correct:

## final solution
## guard number multiplied by the most "commonly slept"" minute number

409*51

## [1] 20859