R4DS (v1 & v2): A Retrospective


As all amazing opportunities in my life are wont to do, it started with a tweet:

…which resulted in me spending last Friday talking data science education with some of the great folks at JHU, largely centered on work happening with the Chromebook Data Science project.

It’s rare that I find myself dealing with imposter syndrome, but I did spend Thursday night eating all of my feelings of doubt and insecurity. It turns out that it only takes a couple bags of Skittles to snap your focus from “this has all got to be some sort of elaborate prank” to “that was too many bags of Skittles.”

As part of the invitation, Jeff had asked if I’d be able to give a presentation on my experiences with the earlier versions of the R for Data Science Online Learning Community. Being the kind of person who loves both public speaking and sharing narratives constructed from lived experiences, obviously I said yes. What follows is a written version of the talk.

The R4DS Online Learning Community (R4DS) has gone through several iterations, the first two of which I’ll discuss today. Currently, the R4DS Online Learning Community can be described as a Slack group composed of learners and mentors who gather to provide support and guidance in learning R in a beginner-friendly, safe, and supportive environment.

While I am no longer affiliated with R4DS, the split was amicable, of my own choosing, and done because I felt that my energy and contributions to the R community could best be made elsewhere. I am very proud of the work that continues to happen within the community, and thrilled that several individuals were willing to step up and steer the ship when it became time for me to step down.

Making the career transition to becoming a data scientist has changed my life by putting me in an otherwise unattainable income bracket, opening up a suite of opportunities (kids like me don’t often find themselves on Ivy League campuses unless we’re trespassing), and completely altering my career trajectory.

To make a very long story short, I grew up poor in an economically depressed area. I’ve been on public assistance more than once in my life, I know how to balance a checkbook to the penny in order to avoid overdraft fees, and I can turn $20 into a grocery budget for an entire month.

I put all of my eggs in the education basket a long time ago, and did so using the “education at any cost” model.

Translation: I have a metric fuckton of student loan debt.

When you’re the first person in your family to go to college and you don’t really have a lot of people in your life to help you navigate the unspoken rules of getting a college education, it’s pretty easy to buy into the fallacy that the more you spend on your education, the more you’ll earn when you’ve completed your education.

It’s a sobering reality to go through almost a decade of higher education only to realize your income may never be enough to cover the costs of your education. Unless, of course, you end up making a career transition that gives you access to an income bracket that would have been inaccessible otherwise. The kicker of the whole thing is that I largely taught myself data science - so the investment of less than $100 has had a disproportionately large effect on my life. Of course I want others to have that opportunity.

Tina Fey has this great quote about going “over, under, around, and through,” which is also how I’d describe my data scientist career path.

My first data science job was at PVP Live, a now defunct esports start-up. But going from unemployed to my first data science job wasn’t straightforward.

I was living in Seattle, unemployed, and playing at least 12 hours of World of Warcraft every day. Suffice it to say, if you’re playing that much WoW as a 30-something adult, you’re probably going through some things - and I was. But I started to realize that maybe this was my opportunity to completely change my career, and maybe that career involved video games.

I had heard about programming, and I knew video games needed to be programmed, so I enrolled in the “Introduction to Programming” course at the local community college. I failed out of the course twice.

Around the same time I had started a Twitter account, and was following various gaming companies when PVP Live put out a call for community managers. Figuring this could be my in, I immediately applied and was onboarded as a volunteer in charge of blogging about WoW and Hearthstone while also managing the company’s Twitter account.

At some point it came up that I was good at math and numbers, and I was asked to interview for a position involving math and numbers (the position didn’t actually exist at the time I interviewed). After initial phone interviews I was given a performance task, which was essentially “here’s the League of Legends API. Show us what you can come up with in a week.”

Things I didn’t know at the time of this performance task:

  • what an API was
  • what League of Legends was
  • what the hell “show us what you can come up with” meant

Rather than ask questions of my potential future employers, I brushed off my Googling skills, downloaded a billion Excel spreadsheets, and eventually pulled together a pitch deck that was part “here’s what I’ve learned about League of Legends” and part “please for the love of all that is holy give me a job.”

I was hired and asked to choose my job title. Clearly this kind of opportunity doesn’t come up often, so I said I was a “data scientist” and was hired part-time remote at what seemed like an absolutely luxurious salary (it wasn’t). To me this was the start - I had the job title, I knew I had enough basic data science skills to be successful, and I was confident that I could learn as I went along.

Within six months I was asked to join the company on-site in a full-time capacity, and on day one I was promoted to management. It turns out that when you’re in management you do a lot more meetings than data science, and so while I was sporadically working in R, I was hacking things together as needed.

Life happens, start-ups shut down, and sometimes you find yourself in Texas as an unemployed data scientist. But hey, it turns out that being able to work with data is a highly employable skill, and within weeks I was working at the Girl Scouts of Northeast Texas. My job title was “Outcomes Analyst,” but being able to recognize data science jobs in non-data scientist job titles is 90% of getting your first data science job.

I eventually hit a ceiling at the Girl Scouts, and about a year ago took a job with Teaching Trust*, where I’m currently employed. Within a month on the job I realized that I was going to have to get serious about learning data science in a more formal capacity.

To help keep myself accountable for improving both my R and data science skills, I started writing some pretty mediocre blog posts about using R on your PC and sharing links on Twitter. Along the way, someone recommended the R for Data Science text by Hadley and Garrett, and I had made a mental note to read the book posthaste when Hadley retweeted my semi-snarky comment about my R learning path.

Recognizing opportunities when they present themselves is important. This tweet is easily the most popular I’ve ever been on Twitter, and I had no idea if I would ever have this level of audience again. So even though I didn’t quite know what I was doing, I lobbed the idea of an “R for Data Science” online learning community out into the world.

The general idea was that this would be an online book club that used Slack to bring together learners and mentors to work through the R4DS text together, following a curriculum map that covered approximately two chapters every week for four months.

I genuinely expected to have around 25 people sign up - maybe 50, if I was lucky. But interest in the group soon grew beyond all expectations and over 600 people had signed up within two weeks. (Fun fact: if you’re starting a brand new Slack group, you can only invite 100 members on the unpaid plan. The solution to this is not to create a new Gmail account and send the link to everyone.)

What this meant for me was that I had to move from being a learner who was there to read the book with like-minded individuals to being a community manager. At the time I figured I could still learn while managing the community, but in reality I was spending 20+ hours a week answering emails and Twitter DMs and creating resources for the community.

I taught high school in NYC. I have a master’s in education. I know a fair bit about best practices in education. And yet I threw all of that experience out the window when it came to managing the R4DS online learning community.

One of the biggest things I quickly realized is that we can forget about the gap between Excel and R - the gap between using your computer and using R/RStudio on your computer is very, very real. Two of the most common reasons people gave for dropping out of the community in the first few weeks were that Slack was overwhelming and/or they didn’t understand how to use RStudio.

There was also a misalignment of expectations. While I had thought I was clear that the group was an online book club where we were each responsible for our own learning, many showed up asking to be taught, didn’t know there was a book, and/or wanted to have strict learner-mentor assignments.

Mentors almost always gave answers, rather than guiding learners to answers, and at times mentors were not committed to using tidyverse solutions.

None of these are bad, or wrong, or offered as a condemnation of how individuals showed up to the group. Rather these are things that - had I taken the time and care in sharing my vision for the group - could have been prevented. That being said, I wouldn’t have known these would be things that needed to be addressed unless I had jumped into the deep end.

Community management is not easy. At the end of the first round I was burned out and hadn’t read the book, but still proud of the fact that I went for something before I was entirely ready. I also felt that one of the big successes of the group was the mentor-learner relationship. Education is relational, and having the opportunity to build and create connections and networks can be a great method for facilitating learning.

As far as I know only one person finished the entire curriculum. The biggest drop was in the first few weeks, related to the issues mentioned above. Those who were still active in the Slack group shared that the pace was faster than they had anticipated, and that once they fell behind, “catching up” felt too difficult.

Taking all of this into consideration, along with an evaluation of my own capacity, we entered into Round 2.

My biggest concern headed into Round 2 was my own ability to keep providing the level of care necessary to have a functioning group. I was facing increased responsibilities at work and didn’t have as much time to dedicate to the R4DS endeavor, so my thought was to open the group up to rolling admissions, commit to writing monthly challenges, and encourage group members to drive their own initiatives.

We tried a lot of things - office hours, video tutorials, committing to commenting/asking questions, creating a logo, and building a website and Twitter profile. All of these were (and still are) community-driven efforts, although the one you’ve most likely heard of is #TidyTuesday.

While I loved everything that was happening in the group, it was transitioning away from an online book club focused on mastering content within a specific text, and evolving into more of a beginner-friendly place to get help with R.

I’ve spent a lot of time thinking about how I would do things differently if I had the capacity for a new iteration of the group. And while I haven’t thought through all of the details, I’d put diversity, equity, and inclusion as a driving principle of the group, so that it functions as a means for underrepresented populations to move into data science related careers.

Scalability and sustainability are important factors to consider, and I’d keep the group small, using cohorts of fewer than 100 people. Within each cohort there would be learning pods of one to two mentors and four to six learners, formed based on time zone, baseline skill proficiency, and timeline to complete the text.

How we train the people delivering data science education is important, and I’d like to see mentors undergo an application process, along with training in educational methods, ongoing assessment of effectiveness, and coaching to help mentors develop and grow their skills as data science educators.

The curriculum would absolutely need to be built out beyond “read these chapters and do the associated homework problems” to be more project-based, and leverage spiralized learning methods to increase understanding. In addition to data science education, the curriculum would also cover computer/technological fluency skills as well as metacognitive strategies for learning how to learn.

It’s a lot, yes, and I don’t know that I will have the capacity to build something like this in the near future. But we’re at a critical point in data science education, and I don’t know how much longer projects like this can wait.

You can find me on Twitter

* The views expressed on this Website/blog/network are mine alone and do not necessarily represent the views of my employer.