Stream makes a streamplot

Be careful with how you use `drop_na()`, kids!

By Jesse Mostipak in walkthrough reflection

June 8, 2021

Live streaming your coding mistakes to the world

A few weeks ago the thought of live streaming my coding – mistakes and all – to the world would have sent me screaming in the other direction. But in a surprisingly short amount of time it’s become something that I find myself running toward.

The core of my educational philosophy is that learning is relational, that we learn better when we learn together, and that the quality of learning is affected by the relationships among learners as well as between the learner(s) and educator(s). But I also recognize that a lot of my beliefs around learning being a community endeavor have been “for thee but not for me” – because of my ego.

Ego is a tough thing to battle, because if I make coding mistakes on my own and no one can see them, I get to preserve my ego. No one knows that I made mistakes, and no one can judge the mistakes that I’ve made.

But the flip side is that, working alone, I don’t always catch my mistakes, and I often spend a lot of time figuring out how to fix the ones I do catch. And the cost of getting help? Being vulnerable and opening up my ego to taking hits.

Mistakes I made last night

There were soooo many! SO MANY! (Watching the Fast and Furious 9 trailer was not one of them though.)

I’m working on a “Today I learned” blog post summary, but here’s a quick list:

  • not being able to get a time series graph to work
  • liberally using drop_na() in all the wrong ways
  • allowing duplicate data through into my final plot
  • completely botching string filters
  • not knowing the difference between backticks (``) and double quotes ("") when filtering

BUT! Because I was learning out loud, on the fly, with a community, many of those mistakes were caught and handled immediately. And I learned so much in such a short span of time – more than I would have if I sat down and tried to do it solo.
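The backticks-versus-quotes mix-up is the easiest of these to show in a few lines. This is a minimal sketch on made-up data (the `toy` tibble is hypothetical, not the real fishing data): inside `filter()`, backticks or a bare name point at a column, while double quotes create a plain string.

```r
library(dplyr)

# Toy data standing in for the real dataset (values are made up).
toy <- tibble::tibble(
  lake   = c("Erie", "Erie", "Huron"),
  values = c(10, 20, 5)
)

# Backticks (or a bare name) refer to the column called `lake`.
by_column <- toy %>% filter(`lake` == "Erie")

# Double quotes make a string literal, so this compares the text "lake"
# with the text "Erie". That is FALSE, so every row is dropped.
by_string <- toy %>% filter("lake" == "Erie")

nrow(by_column)  # 2
nrow(by_string)  # 0
```

The second version doesn't throw an error, which is what makes it so easy to miss: you just quietly get an empty result.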

Without further ado

Here’s a video walkthrough of the code used to generate the initial stream plot:

And the code:

setup

library(tidyverse)
library(ggstream)
library(wesanderson)

import

fishing <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-08/fishing.csv')

graph

fishing %>%
  # drop_na() with no arguments removes rows with an NA in *any* column
  drop_na() %>%
  # one total per year and lake
  group_by(year, lake) %>%
  summarise(total_fish = sum(values)) %>%
  ggplot(aes(x = year, y = total_fish, fill = lake)) +
  geom_stream() +
  scale_fill_manual(values = wes_palette("Darjeeling2")) +
  theme_minimal()

saving the plot

ggsave("streamplot.png", device = "png")

Next steps

There are definitely some issues with this plot! For starters, the placement of drop_na() is going to eliminate data that we actually want to keep.
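To make the `drop_na()` problem concrete, here is a small sketch on made-up data. The `toy` tibble and its mostly-empty `comments` column are assumptions for illustration, standing in for any column in the real data that is often blank.

```r
library(dplyr)
library(tidyr)

# Made-up rows; `comments` is mostly NA, as free-text columns often are.
toy <- tibble::tibble(
  year     = c(2000, 2000, 2001, 2001),
  lake     = c("Erie", "Huron", "Erie", "Huron"),
  values   = c(10, NA, 20, 5),
  comments = c(NA, NA, "estimated", NA)
)

# drop_na() with no arguments drops a row if ANY column is NA.
dropped_everywhere <- toy %>% drop_na()

# drop_na(values) only drops rows where `values` itself is missing.
dropped_values_only <- toy %>% drop_na(values)

nrow(dropped_everywhere)   # 1
nrow(dropped_values_only)  # 3
```

Naming the column keeps the rows that have a perfectly good `values` entry but a blank somewhere else. Another option would be to skip `drop_na()` entirely and use `sum(values, na.rm = TRUE)` inside `summarise()`.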

To see a great implementation of this, check out Eugen Buehler’s tweet:

And as pointed out by Christoph Nicault later in the thread, we need a filter() step before the sum, otherwise we’ll end up with duplicate data:
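As a rough sketch of why the `filter()` step matters, here is the idea on made-up data. The region labels below are hypothetical and not copied from fishing.csv; the point is only that a "total" row sitting next to the rows it summarises gets counted twice when you sum.

```r
library(dplyr)

# Made-up rows: the "Total" row repeats what the two state rows already say.
# The region labels are hypothetical, not taken from the real dataset.
toy <- tibble::tibble(
  year   = c(2000, 2000, 2000),
  lake   = c("Erie", "Erie", "Erie"),
  region = c("State A", "State B", "Total"),
  values = c(10, 20, 30)
)

# Summing everything counts each fish twice.
double_counted <- toy %>%
  group_by(year, lake) %>%
  summarise(total_fish = sum(values), .groups = "drop")

# Filtering down to one level of detail first avoids that.
counted_once <- toy %>%
  filter(region != "Total") %>%
  group_by(year, lake) %>%
  summarise(total_fish = sum(values), .groups = "drop")

double_counted$total_fish  # 60
counted_once$total_fish    # 30
```

Which rows to keep in the real data depends on how its regions are labelled, so it's worth running something like `count(fishing, region)` before choosing the filter.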

As always, #TidyTuesday Unfiltered is intended to get you started with a plot, but is never meant to be the final plot! I hope you’re able to take this code and resources and run with them to create something fantastic, and please tag me on Twitter when you do – I’d love to see what you create!
