-
Notifications
You must be signed in to change notification settings - Fork 0
/
05_exercises_AME.Rmd
352 lines (260 loc) · 15.3 KB
/
05_exercises_AME.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
---
title: 'Weekly Exercises #5'
author: "Ana Espeleta"
output:
html_document:
keep_md: TRUE
toc: TRUE
toc_float: TRUE
df_print: paged
code_download: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error=TRUE, message=FALSE, warning=FALSE)
```
```{r libraries}
library(tidyverse) # for data cleaning and plotting
library(gardenR) # for Lisa's garden data
library(lubridate) # for date manipulation
library(openintro) # for the abbr2state() function
library(palmerpenguins)# for Palmer penguin data
library(maps) # for map data
library(ggmap) # for mapping points on maps
library(gplots) # for col2hex() function
library(RColorBrewer) # for color palettes
library(sf) # for working with spatial data
library(leaflet) # for highly customizable mapping
library(ggthemes) # for more themes (including theme_map())
library(plotly) # for the ggplotly() - basic interactivity
library(gganimate) # for adding animation layers to ggplots
library(transformr) # for "tweening" (gganimate)
library(gifski) # need the library for creating gifs but don't need to load each time
library(shiny) # for creating interactive apps
theme_set(theme_minimal())
library(babynames)
```
```{r data}
# SNCF Train data
small_trains <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv")
# Lisa's garden data
data("garden_harvest")
# Lisa's Mallorca cycling data
mallorca_bike_day7 <- read_csv("https://www.dropbox.com/s/zc6jan4ltmjtvy0/mallorca_bike_day7.csv?dl=1") %>%
select(1:4, speed)
# Heather Lendway's Ironman 70.3 Pan Am championships Panama data
panama_swim <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_swim_20160131.csv")
panama_bike <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_bike_20160131.csv")
panama_run <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_run_20160131.csv")
#COVID-19 data from the New York Times
covid19 <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
freedom <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-22/freedom.csv')
```
## Put your homework on GitHub!
Go [here](https://github.com/llendway/github_for_collaboration/blob/master/github_for_collaboration.md) or to previous homework to remind yourself how to get set up.
Once your repository is created, you should always open your **project** rather than just opening an .Rmd file. You can do that by either clicking on the .Rproj file in your repository folder on your computer. Or, by going to the upper right hand corner in R Studio and clicking the arrow next to where it says Project: (None). You should see your project come up in that list if you've used it recently. You could also go to File --> Open Project and navigate to your .Rproj file.
## Instructions
* Put your name at the top of the document.
* **For ALL graphs, you should include appropriate labels and alt text.**
* Feel free to change the default theme, which I currently have set to `theme_minimal()`.
* Use good coding practice. Read the short sections on good code with [pipes](https://style.tidyverse.org/pipes.html) and [ggplot2](https://style.tidyverse.org/ggplot2.html). **This is part of your grade!**
* **NEW!!** With animated graphs, add `eval=FALSE` to the code chunk that creates the animation and saves it using `anim_save()`. Add another code chunk to reread the gif back into the file. See the [tutorial](https://animation-and-interactivity-in-r.netlify.app/) for help.
* When you are finished with ALL the exercises, uncomment the options at the top so your document looks nicer. Don't do it before then, or else you might miss some important warnings and messages.
## Warm-up exercises from tutorial
1. Choose 2 graphs you have created for ANY assignment in this class and add interactivity using the `ggplotly()` function.
```{r, eval=FALSE, fig.alt= "plot showing the relationship between the number of civil liberties and political rights between varying levels of developed countries."}
freedom_graph <-
freedom %>%
filter(Region_Code == 2) %>%
group_by(country,is_ldc) %>%
summarise(civil_count = sum(CL), political_count = sum(PR)) %>%
ggplot(aes(x = civil_count, y = political_count, color = is_ldc))+
geom_point()+
labs(x = "Civil Liberties", y ="Political Rights", title = "Civil Liberties and Political Rights in Relation to Development Status in Africa", color = "Most/Least Developed Countries(0-1)")
#theme_gray()+
#transition_states(is_ldc)
ggplotly(freedom_graph)
```
```{r, eval=FALSE, fig.alt="Interactive plot showing the progression of Covid19 cases in the states of Minnesota, Wisconsin, Iowa, North Dakota, and South Dakota."}
covid_interactive <-
covid19 %>%
filter(state %in% c("Minnesota", "Wisconsin", "Iowa", "North Dakota", "South Dakota")) %>%
group_by(state, date) %>%
summarise(case_count = sum(cases)) %>%
ggplot(aes(x= date, y = case_count, color = state))+
geom_line()+
labs(title = "Number of Covid-19 Cases by Date in Each State",
subtitle = "Date: {frame_along}",
x = "",
y = "",
color = "State")
#transition_reveal(date)
ggplotly(covid_interactive)
```
2. Use animation to tell an interesting story with the `small_trains` dataset that contains data from the SNCF (National Society of French Railways). These are Tidy Tuesday data! Read more about it [here](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-26).
```{r, eval = FALSE, fig.alt="Animated scatter plot of total amount of trips over month and year."}
small_trains_gif <-
small_trains %>%
mutate(year1 = as.character(year)) %>%
group_by(year1, month) %>%
summarize(tot_trips_station = sum(total_num_trips)) %>%
ggplot(aes(x = month, y = tot_trips_station, color = year1))+
geom_jitter()+
labs(title = "Total Amount of Bike Trips Over Month and Year", x = "", y= "")+
transition_states(month)
animate(small_trains_gif)
anim_save("small_trains_gif")
```
```{r}
knitr::include_graphics("small_trains_gif")
```
## Garden data
3. In this exercise, you will create a stacked area plot that reveals itself over time (see the `geom_area()` examples [here](https://ggplot2.tidyverse.org/reference/position_stack.html)). You will look at cumulative harvest of tomato varieties over time. You should do the following:
* From the `garden_harvest` data, filter the data to the tomatoes and find the *daily* harvest in pounds for each variety.
* Then, for each variety, find the cumulative harvest in pounds.
* Use the data you just made to create a static cumulative harvest area plot, with the areas filled with different colors for each variety and arranged (HINT: `fct_reorder()`) from most to least harvested weights (most on the bottom).
* Add animation to reveal the plot over date.
I have started the code for you below. The `complete()` function creates a row for all unique `date`/`variety` combinations. If a variety is not harvested on one of the harvest dates in the dataset, it is filled with a value of 0.
```{r, eval=FALSE, fig.alt = "Animated area plot of daily tomato variety cumulative harvest."}
garden_gif <-
garden_harvest %>%
filter(vegetable == "tomatoes") %>%
group_by(date, variety) %>%
summarize(daily_harvest_lb = sum(weight)*0.00220462) %>%
ungroup() %>%
complete(variety,
date,
fill = list(daily_harvest_lb = 0)) %>%
group_by(variety) %>%
mutate(cum_harvest = cumsum(daily_harvest_lb),
variety = fct_reorder(variety, cum_harvest, sum, .desc = TRUE)) %>%
ggplot(aes(x = date, y = cum_harvest, fill = variety))+
geom_area()+
labs(title = "Cumulative Tomato Variety Harvest per Day ", x="", y= "")+
geom_text(aes(label = variety), position = "stack", check_overlap = TRUE)+
transition_reveal(date)
animate(garden_gif)
anim_save("garden_gif")
```
```{r}
knitr::include_graphics("garden_gif")
```
## Maps, animation, and movement!
4. Map Lisa's `mallorca_bike_day7` bike ride using animation!
Requirements:
* Plot on a map using `ggmap`.
* Show "current" location with a red point.
* Show path up until the current point.
* Color the path according to elevation.
* Show the time in the subtitle.
```{r, eval=FALSE, fig.alt = "Map that shows the bike path taken by Lisa in Mallorca, with different coloring for elevation."}
mallorca_map <- get_stamenmap(
bbox = c(left = 2.28, bottom = 39.41, right = 3.03, top = 39.8),
maptype = "terrain",
zoom = 11
)
mallorca_gif <-
ggmap(mallorca_map) +
geom_path(data = mallorca_bike_day7,
aes(x = lon, y = lat, color = ele),
size = .5) +
scale_color_viridis_c(option = "magma") +
theme_map() +
geom_point(data = mallorca_bike_day7,
aes(x = lon,
y = lat),
color = "red",
size = 2)+
labs(title = "Mallorca Bike Trip", x="Longitude", y= "Latitude")+
theme(legend.background = element_blank())+
transition_reveal(time)
animate(mallorca_gif)
anim_save("mallorca_gif")
```
```{r}
knitr::include_graphics("mallorca_gif")
```
* CHALLENGE: use the `ggimage` package and `geom_image` to add a bike image instead of a red point. You can use [this](https://raw.githubusercontent.com/llendway/animation_and_interactivity/master/bike.png) image. See [here](https://goodekat.github.io/presentations/2019-isugg-gganimate-spooky/slides.html#35) for an example.
* Add something of your own! And comment on if you prefer this to the static map and why or why not.
> I prefer this map to the static map because it allows you to see how the cumulative harvest increases as the days pass.
5. In this exercise, you get to meet Lisa's sister, Heather! She is a proud Mac grad, currently works as a Data Scientist where she uses R everyday, and for a few years (while still holding a full-time job) she was a pro triathlete. You are going to map one of her races. The data from each discipline of the Ironman 70.3 Pan Am championships, Panama is in a separate file - `panama_swim`, `panama_bike`, and `panama_run`. Create a similar map to the one you created with my cycling data. You will need to make some small changes:
1. combine the files putting them in swim, bike, run order (HINT: `bind_rows()`),
2. make the leading dot a different color depending on the event (for an extra challenge, make it a different image using `geom_image()!),
3. CHALLENGE (optional): color by speed, which you will need to compute on your own from the data. You can read Heather's race report [here](https://heatherlendway.com/2016/02/10/ironman-70-3-pan-american-championships-panama-race-report/). She is also in the Macalester Athletics [Hall of Fame](https://athletics.macalester.edu/honors/hall-of-fame/heather-lendway/184) and still has records at the pool.
```{r, eval=FALSE, fig.alt = "Animated map showing the path of Heather's Ironman race in Panama."}
panama_iron<-
bind_rows(panama_swim, panama_bike, panama_run)
panama_map <- get_stamenmap(
bbox = c(left = -79.7791, bottom = 8.7964, right = -79.2499, top = 9.1663),
maptype = "terrain",
zoom = 12)
ggmap(panama_map) +
geom_point(data = panama_iron,
aes(x = lon, y = lat, color = event),
size = 5) +
geom_path(data = panama_iron,
aes(x = lon, y = lat),
size = 0.5) +
labs(title = "Heather's Ironman",
subtitle = "Timestamp: {frame_along}",
x = "",
y = "",
color = "Event") +
theme_map() +
transition_reveal(time)
anim_save("triathlon")
```
```{r}
knitr::include_graphics("triathlon")
```
## COVID-19 data
6. In this exercise you will animate a map of the US, showing how cumulative COVID-19 cases per 10,000 residents has changed over time. This is similar to exercises 11 & 12 from the previous exercises, with the added animation! So, in the end, you should have something like the static map you made there, but animated over all the days. The code below gives the population estimates for each state and loads the `states_map` data. Here is a list of details you should include in the plot:
* Put date in the subtitle.
* Because there are so many dates, you are going to only do the animation for the the 15th of each month. So, filter only to those dates - there are some lubridate functions that can help you do this.
* Use the `animate()` function to make the animation 200 frames instead of the default 100 and to pause for 10 frames on the end frame.
* Use `group = date` in `aes()`.
* Comment on what you see.
```{r, eval=FALSE, fig.alt = "Map of covid19 cases per 10000 people animated to show the progression through time."}
census_pop_est_2018 <- read_csv("https://www.dropbox.com/s/6txwv3b4ng7pepe/us_census_2018_state_pop_est.csv?dl=1") %>%
separate(state, into = c("dot","state"), extra = "merge") %>%
select(-dot) %>%
mutate(state = str_to_lower(state))
states_map <- map_data("state")
covid_new <- covid19 %>%
arrange(desc(date)) %>%
group_by(state) %>%
mutate(numberrow = 1:n()) %>%
mutate(state = str_to_lower(`state`))
covid_pop <- covid_new %>%
left_join(census_pop_est_2018,
by = c("state" = "state")) %>%
mutate(per_10000 = (cases/est_pop_2018)*10000)
covid_final <- covid_pop %>%
filter(day(date) == 15) %>%
ggplot() +
geom_map(map = states_map,
aes(map_id = state,
fill = per_10000,
group = date)) +
labs(title = "Recent COVID-19 Cases per 10,000 People",
subtitle = "Date: {closest_state}",
fill = "Cases per 10,000",
caption = "Graph by Ana Espeleta, data from covid19, map from map_data") +
expand_limits(x = states_map$long, y = states_map$lat) +
scale_fill_viridis_c(option = "viridis", direction = -1) +
theme_map() +
theme(legend.background = element_blank()) +
transition_states(date, transition_length = 0)
animate(covid_final, nframes = 200, end_pause = 10)
anim_save("covid_final")
```
```{r}
knitr::include_graphics("covid_final")
```
>The map shows that the Midwest region of the U.S. has darker colors than the rest of the country, meaning there is more density of Covid cases in that region.
## Your first `shiny` app (for next week!)
7. This app will also use the COVID data. Make sure you load that data and all the libraries you need in the `app.R` file you create. You should create a new project for the app, separate from the homework project. Below, you will post a link to the app that you publish on shinyapps.io. You will create an app to compare states' daily number of COVID cases per 100,000 over time. The x-axis will be date. You will have an input box where the user can choose which states to compare (`selectInput()`), a slider where the user can choose the date range, and a submit button to click once the user has chosen all states they're interested in comparing. The graph should display a different line for each state, with labels either on the graph or in a legend. Color can be used if needed.
Put the link to your app here:
[text](https://anaespeleta115.shinyapps.io/covidApp/)
## GitHub link
8. Below, provide a link to your GitHub repo with this set of Weekly Exercises.
**DID YOU REMEMBER TO UNCOMMENT THE OPTIONS AT THE TOP?**