#SoDS18: Wrangling my Heart Out


I am absolutely in love with the Summer of Data Science 2018 (SoDS18), created by Data Science Renee! What better way to start crossing some projects off of my list than making a public declaration of intent, and sharing that process with literally anyone who will listen?

To the point

My goal is to write a series of functions in R that make importing and wrangling STAAR data a more manageable task than my current process. Oh, you want to know what I currently do? Mostly download a lot of Excel files and curse my lack of regex knowledge.
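To make that a little more concrete, here is a rough sketch of the kind of helper I have in mind: one regular expression that pulls the grade, subject, and year out of a file name. The file names, the naming pattern, and the function name `parse_staar_filename` are all made up for illustration; the real STAAR downloads may well be named differently.

```r
# Hypothetical file names; the real STAAR downloads may be named differently.
files <- c("staar_grade3_math_2017.xlsx",
           "staar_grade5_reading_2018.xlsx",
           "notes.txt")

# Hypothetical helper: extract grade, subject, and year from file names that
# follow the assumed pattern staar_grade<N>_<subject>_<YYYY>.xlsx.
parse_staar_filename <- function(path) {
  pattern <- "^staar_grade([0-9]+)_([a-z]+)_([0-9]{4})\\.xlsx$"
  # For each name: the full match plus three capture groups, or character(0).
  m <- regmatches(basename(path), regexec(pattern, basename(path)))
  matched <- lengths(m) == 4
  if (!any(matched)) {
    # Nothing followed the pattern, so return an empty table.
    return(data.frame(file = character(0), grade = integer(0),
                      subject = character(0), year = integer(0),
                      stringsAsFactors = FALSE))
  }
  # Stack the matches into a character matrix, one row per matching file.
  parts <- do.call(rbind, m[matched])
  data.frame(
    file = path[matched],
    grade = as.integer(parts[, 2]),
    subject = parts[, 3],
    year = as.integer(parts[, 4]),
    stringsAsFactors = FALSE
  )
}

file_info <- parse_staar_filename(files)
print(file_info)
```

Files that don’t fit the pattern (like `notes.txt` here) are simply dropped. From there, each remaining path could be handed to something like `readxl::read_excel()` to do the actual import.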

I can hear you now - why don’t you also collect all of these functions into a package and create a publicly accessible Shiny dashboard while you’re at it? And friends, yes, that is a great long-term goal. But I need to be realistic about how much I can actually get done in the next few months.

But how? With what?

My primary resource will be the fantastic Advanced R by Hadley Wickham, but I know I’ll also be incorporating other texts, blog posts, and tutorials.

I anticipate spending approximately three to five hours a week reading, coding, and working through tutorials and toy problems. I also plan to write a weekly learning summary to share what I’ve learned and what progress I’ve made, and to make my learning process as transparent, accessible, and replicable as possible. Seeing as this post has existed as a draft for the last two weeks, we might be in for some rough sledding.

Supplemental learning vs. the rabbit hole

One thing I hear often from data science learners is that they sit down to work through a tutorial, and three hours later they’ve opened 42 tabs, read nothing, and have yet to write a line of code. I feel you, fam.

I go down the rabbit hole when I’m either bored or faced with something that feels difficult. Going down the rabbit hole is a great way to procrastinate by doing something that inherently feels productive - but really isn’t. My approach for dealing with this is to employ as much self-awareness as possible by asking myself the following questions whenever I find myself working outside of the Advanced R text:

  1. How am I feeling? Am I bored or struggling with the content?
  2. Is the information I’m about to pursue absolutely necessary for a foundational understanding of the concept I’m working on?
  3. What do I need to know, and what’s the most effective means of getting that information?

Mitigating my biggest challenge: time

Making time to participate in the Summer of Data Science 2018 is going to be a challenge for me. There have been a lot of changes in both my personal and professional life - all amazing and wonderful things - but the resulting trajectory puts R firmly in the “hobby” category of my life right now, and in the hobby category R has some fierce competition.

Let’s go!

All that being said, my intention is to do the best that I can with the time that I have, with an emphasis on making my path easy to follow for anyone who may be interested.