A few weeks ago the thought of live streaming my coding – mistakes and all – to the world would have sent me screaming in the other direction. But in a surprisingly short amount of time it’s become something that I find myself running toward.
The core of my educational philosophy is that learning is relational, that we learn better when we learn together, and that the quality of learning is affected by the relationships among learners as well as between the learner(s) and educator(s). But I also recognize that a lot of my beliefs around learning being a community endeavor have been “for thee but not for me” – because of my ego.
Ego is a tough thing to battle, because if I make coding mistakes on my own and no one can see them, I get to preserve my ego. No one knows that I made mistakes, and no one can judge the mistakes that I’ve made.
But the flip side of that is that I don’t always catch my mistakes, and I often spend a lot of time figuring out how to fix my mistakes. And the cost? Being vulnerable and opening up my ego to taking hits.
There were soooo many mistakes! SO MANY! (Watching the Fast and Furious 9 trailer was not one of them though.)
I’m working on a “Today I learned” blog post summary, but here’s a quick list:
- not being able to get a time series graph to work
- liberally using `drop_na()` in all the wrong ways
- allowing duplicate data through into my final plot
- completely botching string filters
- not knowing the difference between backticks (``) and double quotes ("") when filtering
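That last one is easy to show on a tiny example. This is just a sketch on made-up data (the `fish type` column is hypothetical and not part of the fishing dataset): backticks refer to a column name, while double quotes create a string.

```r
library(dplyr)

# Toy data; the column name `fish type` is hypothetical and contains a space
catch <- tibble(
  `fish type` = c("Trout", "Perch", "Trout"),
  values = c(10, 20, 30)
)

# Backticks name the column; double quotes hold the text to match
trout <- catch %>% filter(`fish type` == "Trout")

# Quoting the column name compares two strings, "fish type" == "Trout",
# which is FALSE, so no rows are kept
nothing <- catch %>% filter("fish type" == "Trout")

nrow(trout)   # 2
nrow(nothing) # 0
```

Backticks are only required when a column name is not a valid R name (spaces, leading digits, and so on), which is probably why the difference is so easy to miss.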
BUT! Because I was learning out loud, on the fly, with a community, many of those mistakes were caught and handled immediately. And I learned so much in such a short span of time – more than I would have if I sat down and tried to do it solo.
Here’s a video walkthrough of the code used to generate the initial stream plot:
And the code:
    library(tidyverse)
    library(ggstream)
    library(wesanderson)

    # read in the TidyTuesday fishing data
    fishing <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-08/fishing.csv')

    # total up the fish per year and lake, then draw the stream plot
    fishing %>%
      drop_na() %>%
      group_by(year, lake) %>%
      summarise(total_fish = sum(values)) %>%
      ggplot(aes(x = year, y = total_fish, fill = lake)) +
      geom_stream() +
      scale_fill_manual(values = wes_palette("Darjeeling2")) +
      theme_minimal()

    # saving the plot
    ggsave("streamplot.png", device = "png")
There are definitely some issues with this plot!
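For starters, the placement of `drop_na()` is going to eliminate data that we actually want to keep. Called with no arguments before any columns are selected, it drops every row that has an `NA` in any column, including columns the plot never uses.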
To see a great implementation of this, check out Eugen Buehler’s tweet:
> That is a really nice looking plot! I did some geom_point plots and saw that there were a lot of gaps in the data. I thought that selecting the relevant columns (lake, year, values) before drop_na might fix the problem. This was the result. pic.twitter.com/3MMjjUaBtc
>
> Eugen Buehler 🏳️🌈🏳️⚧️ (@EugenBuehler), June 8, 2021
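Here’s a minimal sketch of that idea on toy data. The rows below are made up for illustration; only the column names `year`, `lake`, and `values` come from the code above, and `comments` stands in for any column that is mostly empty.

```r
library(dplyr)
library(tidyr)

# Toy data: `comments` is mostly NA, `values` is missing only once
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  lake = c("Erie", "Huron", "Erie", "Huron"),
  comments = c(NA, "estimated", NA, NA),
  values = c(5, 7, NA, 9)
)

# drop_na() with no arguments drops a row if ANY column is NA
dropped_too_much <- toy %>% drop_na()

# Selecting the plotting columns first means only a missing
# year, lake, or values removes a row
kept_more <- toy %>% select(year, lake, values) %>% drop_na()

# Alternative: keep every column but only check `values`
also_kept <- toy %>% drop_na(values)

nrow(dropped_too_much) # 1
nrow(kept_more)        # 3
nrow(also_kept)        # 3
```

Either approach keeps the rows where the count itself is present, which is what the plot needs.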
And as pointed out by Christophe Nicault later in the thread, we need a `filter()` step before the sum, otherwise the totals already present in the data get counted a second time:
> Nice plot ! Also the data need to be filtered as well before the sum, as there is already a total by country for each lake/year/specie, otherwise it's counted twice.
>
> Christophe Nicault (@cnicault), June 8, 2021
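To see why the filter matters, here’s a small sketch on toy data. The region labels (`"Region A"`, `"Region B"`, `"Total"`) are made up for illustration, so check the real values in the `region` column before adapting this.

```r
library(dplyr)

# Toy data: the region labels are hypothetical, not the real ones
toy <- tibble(
  year = c(2000, 2000, 2000),
  lake = c("Erie", "Erie", "Erie"),
  region = c("Region A", "Region B", "Total"),
  values = c(4, 6, 10)
)

# Summing every row counts the fish twice: 4 + 6 + 10
double_counted <- toy %>%
  group_by(year, lake) %>%
  summarise(total_fish = sum(values), .groups = "drop")

# Keeping only the pre-computed total rows avoids that
filtered <- toy %>%
  filter(region == "Total") %>%
  group_by(year, lake) %>%
  summarise(total_fish = sum(values), .groups = "drop")

double_counted$total_fish # 20
filtered$total_fish       # 10
```

Filtering the total rows out and summing the individual regions instead would work just as well here; the point is to pick one or the other before calling `sum()`.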
As always, #TidyTuesday Unfiltered is intended to get you started with a plot, but is never meant to be the final plot! I hope you’re able to take this code and resources and run with them to create something fantastic, and please tag me on Twitter when you do – I’d love to see what you create!