Want to work with Data? Don’t wait.

Data is everywhere, and it seems you can’t go more than 48 hours without hearing how data scientists are going to rule the world–if only we can train enough of them.

The thing is, you don’t need to wait until after you get that Master’s (or PhD) in Data Science, or even until you complete that online course that cost you several thousand dollars. You can start working with data now and, depending on your interests, skills, and commitment to learning, be job-ready in six months to a year.

With this in mind, I’d like to offer up my completely biased opinion on how to build the foundational skills you need to start moving towards a career in data.

I’ve never even worked with data before

That’s OK! There’s no time like the present to get started. As much as it pains me to say it, you’re going to want to get comfortable in Excel first. Why Excel? Because it’s quick, relatively easy, widely used, and once you go through the pain of learning data analysis in Excel you will have a whole new level of appreciation for R and/or Python.

So where do you start? Pick up the book Data Smart by John Foreman, who is currently the VP of Product Management at MailChimp. The book is engaging, insightful, and has you working directly with data in meaningful ways throughout the text.

I’m OK with Excel, so should I learn R or Python next?

This is a topic of much debate within the data science community–just do a Google search for “R vs. Python” and watch your computer melt.

My personal stance is that there’s no good answer, because you should eventually learn both. So try out both of them, see which one you gravitate towards, and dive in!

I’m OK with Excel, but ready for R

Obviously you’re welcome to hang out here and read the tutorials I (slowly) publish, and you can always reach out to me on Twitter. Speaking of Twitter, make sure you’re following the #rstats hashtag–there’s lots of good information being published daily!

One of the best beginner books I’ve found for R is R in Action, by Robert Kabacoff. Sure, it’s a book, so certain parts of it are going to be outdated or even obsolete by the time you get it, but that’s OK. You’re focused on building a foundation–you’re going to learn and grow the more you work with R, so don’t get distracted by the latest package releases or people who tell you that “=” is better than “<-”.

Update: since publishing this article, two additional texts have been mentioned to me that I cannot recommend enough:

If books really aren’t your thing, check out STAT 545. Jenny Bryan has published her entire syllabus, notes, code, and supplementary articles. Following along with this course at your own pace will singlehandedly improve your data analysis game in a matter of weeks.

I’m OK with Excel, but ready for Python

Yeah, you and me both. I don’t have a lot of good resources for learning Python for Data Science, because it’s still an area that I need to explore. That being said, I would highly recommend connecting with the Data Science Learning Club, an amazing, supportive, and friendly community created by BecomingDataSci.

What was that about not waiting?

You really don’t need to wait to start working on meaningful, real-world data. You should really go do data–now! If you don’t know where to start, see if there’s a small analytics project you can take on at work.

You can also check out Data for Democracy–they have a lot of work going on, and are a great place to start building up your experience.

Local non-profits also provide a great opportunity to hone your data science skills. Even if they aren’t hiring analytics staff, try reaching out to their volunteer coordinator or analytics team to see if you can volunteer your data talents.